PeptideForest: Semisupervised machine learning integrating multiple search engines for peptide identification

Bibliographic Details
Main Authors: Ranff, Tristan (Author), Dennison, Matthew (Author), Bédorf, Jeroen (Author), Schulze, Stefan (Author), Zinn, Nico (Author), Bantscheff, Marcus (Author), van Heugten, Jasper J. R. M. (Author), Fufezan, Christian (Author)
Format: Article (Journal)
Language: English
Published: January 22, 2025
In: Journal of proteome research
Year: 2025, Volume: 24, Issue: 2, Pages: 929-939
ISSN: 1535-3907
DOI: 10.1021/acs.jproteome.4c00686
Online Access: Publisher, free of charge, full text: https://doi.org/10.1021/acs.jproteome.4c00686
Publisher, free of charge, full text: https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00686
Author Notes: Tristan Ranff, Matthew Dennison, Jeroen Bédorf, Stefan Schulze, Nico Zinn, Marcus Bantscheff, Jasper J.R.M. van Heugten, and Christian Fufezan
Description
Summary: The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches (PSMs). In recent years, novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses, and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier to alleviate that issue. Additionally, PeptideForest increases the number of PSMs that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and Escherichia coli proteomes. However, an increase in quantity does not necessarily reflect an increase in quality, which is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and, as such, PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.
Item Description: Viewed on July 28, 2025
Physical Description: Online Resource
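
Note on the 1% q-value threshold in the summary: the record does not say how q-values are computed, so the following is only a minimal sketch of the widely used target-decoy estimate, assuming each PSM carries a score (higher is better) and a decoy flag. It is not the PeptideForest or Ursgal implementation. The function names `q_values` and `count_accepted_targets`, the `(score, is_decoy)` input layout, and the example scores are hypothetical, and tied scores receive no special handling.

```python
from typing import List, Tuple

# Hypothetical input layout: one (score, is_decoy) tuple per PSM.
PSM = Tuple[float, bool]


def q_values(psms: List[PSM]) -> List[float]:
    """Return a target-decoy q-value per PSM, in the same order as the input."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the smallest FDR at this rank or at any worse-scoring rank.
    q = [0.0] * len(psms)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


def count_accepted_targets(psms: List[PSM], threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1
        for (_, is_decoy), q in zip(psms, q_values(psms))
        if not is_decoy and q <= threshold
    )


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    print(q_values(example))
    print(count_accepted_targets(example))
```

Under this estimate, a re-scoring method that separates targets from decoys more cleanly moves more target PSMs below the threshold, which is the kind of gain the summary reports relative to MS-GF+.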