Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry (2402.11363v3)
Abstract: Tandem mass spectrometry (MS/MS) is the predominant high-throughput technique for comprehensively analyzing the protein content of biological samples, and it is a cornerstone of progress in proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, which enable unbiased, non-targeted fragmentation of precursor ions. However, DIA-generated MS/MS spectra are highly multiplexed: each spectrum contains fragment ions originating from multiple precursor peptides. This complexity is particularly challenging for de novo peptide/protein sequencing, where current methods are ill-equipped to handle multiplexed spectra. In this paper, we introduce DiaTrans, a deep-learning model based on the transformer architecture that deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art methods, including DeepNovo-DIA and PepNet. DiaTrans improves precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and improves precision by 59% to 81.36% at the peptide level. Combining DIA data with our DiaTrans model holds considerable promise for uncovering novel peptides and for profiling biological samples more comprehensively. DiaTrans is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DiaTrans.