
AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information (2403.07013v2)

Published 9 Mar 2024 in q-bio.QM, cs.LG, and q-bio.BM

Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying the amino acid sequences (peptides) responsible for observed spectra, challenges persist in de novo peptide sequencing. First, prior methods struggle to identify amino acids with post-translational modifications (PTMs) because these occur less frequently in training data than canonical amino acids, which in turn lowers peptide-level identification precision. Second, diverse types of noise and missing peaks in mass spectra reduce the reliability of the training data (peptide-spectrum matches, PSMs). To address these challenges, we propose AdaNovo, a novel framework that calculates the conditional mutual information (CMI) between the spectrum and each amino acid/peptide and uses the CMI for adaptive model training. Extensive experiments demonstrate AdaNovo's state-of-the-art performance on a 9-species benchmark, where the peptides in the training set are almost completely disjoint from the peptides in the test sets. Moreover, AdaNovo excels at identifying amino acids with PTMs and is robust to data noise. The supplementary materials contain the official code.
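
The abstract does not spell out how the CMI is computed or how it enters training, so the following is only a minimal sketch of one way the idea could look, not AdaNovo's actual implementation. It assumes two sets of per-amino-acid log-probabilities are already available: one from a model that sees the spectrum and the preceding amino acids, and one from a model that sees only the preceding amino acids. Their difference is used as a pointwise CMI estimate. The function names, the softmax weighting, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def pointwise_cmi(logp_with_spectrum, logp_prefix_only):
    """Per-amino-acid CMI estimate (assumed form, not taken from the paper):
    log p(y_j | spectrum, y_<j) - log p(y_j | y_<j)."""
    return [a - b for a, b in zip(logp_with_spectrum, logp_prefix_only)]


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cmi_weights(logp_with_spectrum, logp_prefix_only, temperature=1.0):
    """Hypothetical weighting: softmax of the CMI, rescaled so the mean weight is 1."""
    cmi = pointwise_cmi(logp_with_spectrum, logp_prefix_only)
    n = len(cmi)
    return [n * w for w in softmax(cmi, temperature)]


def reweighted_nll(logp_with_spectrum, logp_prefix_only, temperature=1.0):
    """Mean negative log-likelihood with each amino acid weighted by its CMI weight."""
    weights = cmi_weights(logp_with_spectrum, logp_prefix_only, temperature)
    n = len(weights)
    return sum(-w * lp for w, lp in zip(weights, logp_with_spectrum)) / n


if __name__ == "__main__":
    # Toy log-probabilities for a three-residue peptide (made-up numbers).
    with_spectrum = [-0.2, -1.5, -0.4]
    prefix_only = [-0.3, -4.0, -0.5]
    print(pointwise_cmi(with_spectrum, prefix_only))
    print(cmi_weights(with_spectrum, prefix_only))
    print(reweighted_nll(with_spectrum, prefix_only))
```

In this sketch, an amino acid whose probability rises sharply once the spectrum is taken into account (the second residue in the toy data) receives a larger weight. That is one plausible reading of "adaptive training" for rare, spectrum-dependent residues such as those carrying PTMs. When all CMI values are equal, every weight is 1 and the loss reduces to the ordinary mean negative log-likelihood.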
