Chemical formula prediction from mass spectrometry remains partially unsolved

Determine accurate and generalizable methods for predicting molecular formulae from mass spectrometry data, including MS^1 and MS/MS spectra, with particular attention to cases such as fluorine-containing compounds, whose formulae cannot be derived from MS^1 data alone because fluorine has only one stable isotope.

Background

Within the de novo molecule generation task, the benchmark includes a bonus variant where the molecular formula is provided as input, reflecting common practice of deriving formulae from MS1 data. The authors note that, despite high accuracy in many scenarios, chemical formula prediction is not fully solved and remains challenging in specific cases.

They highlight fluorine as an example of an element with only one stable isotope, which prevents accurate derivation of the formula from MS1 alone, underscoring the need for improved or complementary approaches that incorporate MS/MS or other strategies.
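The single-isotope limitation can be illustrated with a minimal sketch. It assumes a first-order estimate of the (M+1)/M peak ratio as the sum, over elements, of the atom count times the heavy-to-light isotope abundance ratio. The abundance values are approximate textbook figures, and the function name and example formulae are hypothetical; none of this is taken from the MassSpecGym paper.

```python
# Approximate abundance ratio of the +1 isotope to the lightest isotope.
# Illustrative textbook values (assumption), not taken from the source.
M_PLUS_1_RATIO = {
    "C": 1.07 / 98.93,      # 13C / 12C
    "H": 0.0115 / 99.9885,  # 2H / 1H
    "N": 0.364 / 99.636,    # 15N / 14N
    "O": 0.038 / 99.757,    # 17O / 16O
    "F": 0.0,               # 19F is the only stable isotope
}


def m_plus_1_relative_intensity(formula):
    """First-order estimate of the (M+1)/M peak ratio.

    `formula` maps element symbols to atom counts, e.g. {"C": 6, "H": 5}.
    """
    return sum(count * M_PLUS_1_RATIO[element]
               for element, count in formula.items())


# Hypothetical element counts: the same C/H skeleton with and without fluorine.
with_fluorine = {"C": 6, "H": 5, "F": 1}
without_fluorine = {"C": 6, "H": 5}

ratio_with_f = m_plus_1_relative_intensity(with_fluorine)
ratio_without_f = m_plus_1_relative_intensity(without_fluorine)
print(ratio_with_f == ratio_without_f)
```

Under this approximation the two ratios are identical, so the MS1 isotope pattern by itself carries no information about how many fluorine atoms are present. Fluorine still changes the monoisotopic mass, which is why a formula assignment has to lean on accurate mass, MS/MS fragments, or other evidence.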

References

However, we present this scenario as a bonus challenge because chemical formula prediction remains a partially unsolved problem. For example, elements such as fluorine, which have only one stable isotope, cannot be derived from MS^1 data alone and still pose challenges with MS/MS data.

MassSpecGym: A benchmark for the discovery and identification of molecules (2410.23326 - Bushuiev et al., 30 Oct 2024) in Section “Definition of the challenges”, De novo molecule generation (Bonus challenge)