
Prefix-Tree Decoding for Predicting Mass Spectra from Molecules (2303.06470v3)

Published 11 Mar 2023 in q-bio.QM and cs.LG

Abstract: Computational predictions of mass spectra from molecules have enabled the discovery of clinically relevant metabolites. However, such predictive tools are still limited as they occupy one of two extremes, either operating (a) by fragmenting molecules combinatorially with overly rigid constraints on potential rearrangements and poor time complexity or (b) by decoding lossy and nonphysical discretized spectra vectors. In this work, we use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms. After first encoding an input molecular graph, we decode a set of molecular subformulae, each of which specifies a predicted peak in the mass spectrum, with peak intensities predicted by a second model. Our key insight is to overcome the combinatorial possibilities for molecular subformulae by decoding the formula set using a prefix tree structure, atom-type by atom-type, representing a general method for ordered multiset decoding. We show promising empirical results on mass spectra prediction tasks.
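The prefix-tree idea in the abstract can be illustrated with a small data-structure sketch. This is not the authors' model: there is no neural network here, and the element ordering `ELEMENTS`, the function names, and the example subformulae are all assumptions made for illustration. The sketch only shows how a set of subformulae, each written as a tuple of atom counts in a fixed element order, can be stored as a prefix tree and read back one atom type at a time, so that formulae sharing leading counts share a branch.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed fixed atom-type ordering; the paper's actual ordering may differ.
ELEMENTS = ("C", "H", "N", "O")

Formula = Tuple[int, ...]  # one count per entry in ELEMENTS


def build_prefix_tree(formulae: Iterable[Formula]) -> Dict:
    """Store each formula as a root-to-leaf path; level i holds counts of ELEMENTS[i]."""
    root: Dict = {}
    for formula in formulae:
        node = root
        for count in formula:
            # Formulae with the same leading counts reuse the same child node.
            node = node.setdefault(count, {})
    return root


def decode(tree: Dict, depth: int = 0, prefix: Formula = ()) -> List[Formula]:
    """Walk the tree atom type by atom type and return every complete formula."""
    if depth == len(ELEMENTS):
        return [prefix]
    results: List[Formula] = []
    for count in sorted(tree):
        results.extend(decode(tree[count], depth + 1, prefix + (count,)))
    return results


def to_string(formula: Formula) -> str:
    """Render a count tuple as a formula string, skipping absent elements."""
    return "".join(f"{el}{n}" for el, n in zip(ELEMENTS, formula) if n > 0)


# Hypothetical subformulae, chosen only to show shared prefixes.
subformulae = {(2, 4, 0, 1), (2, 5, 1, 0), (1, 3, 0, 0)}
tree = build_prefix_tree(subformulae)
decoded = decode(tree)
```

In this toy example the two formulae with two carbons share the same first-level branch, which is the property that lets a decoder expand only the count prefixes it considers plausible rather than enumerating every combination of atom counts. In a learned version, the choice of child counts at each level would come from a model conditioned on the encoded molecule rather than from a stored set.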
