PeptideForest: Semisupervised machine learning integrating multiple search engines for peptide identification
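| Main Authors: | Tristan Ranff, Matthew Dennison, Jeroen Bédorf, Stefan Schulze, Nico Zinn, Marcus Bantscheff, Jasper J.R.M. van Heugten, Christian Fufezan |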
|---|---|
| Format: | Article (Journal) |
| Language: | English |
| Published: | January 22, 2025 |
| In: | Journal of proteome research, Year: 2025, Volume: 24, Issue: 2, Pages: 929-939 |
| ISSN: | 1535-3907 |
| DOI: | 10.1021/acs.jproteome.4c00686 |
| Online Access: | Publisher, free of charge, full text: https://doi.org/10.1021/acs.jproteome.4c00686; Publisher, free of charge, full text: https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00686 |
| Author Notes: | Tristan Ranff, Matthew Dennison, Jeroen Bédorf, Stefan Schulze, Nico Zinn, Marcus Bantscheff, Jasper J.R.M. van Heugten, and Christian Fufezan |
| Summary: | The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches. In recent years novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier to alleviate that issue. Additionally, PeptideForest increases the number of peptide-to-spectrum matches that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and Escherichia coli proteomes. However, an increase in quantity does not necessarily reflect an increase in quality and this is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and as such PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms. |
| Item Description: | Viewed on 28.07.2025 |
| Physical Description: | Online Resource |