Generative artificial intelligence for computational chemistry: a roadmap to predicting emergent phenomena (2409.03118v1)
Abstract: The recent surge in Generative AI has introduced exciting possibilities for computational chemistry. Generative AI methods have made significant progress in sampling molecular structures across chemical species, developing force fields, and speeding up simulations. This Perspective offers a structured overview, beginning with the fundamental theoretical concepts in both Generative AI and computational chemistry. It then covers widely used Generative AI methods, including autoencoders, generative adversarial networks, reinforcement learning, flow models, and large language models (LLMs), and highlights selected applications in diverse areas, including force field development and protein/RNA structure prediction. A key focus is on the challenges these methods face before they become truly predictive, particularly in predicting emergent chemical phenomena. We believe that the ultimate goal of a simulation method or theory is to predict phenomena not seen before, and that Generative AI should be subject to these same standards before it is deemed useful for chemistry. We suggest that to overcome these challenges, future AI models need to integrate core chemical principles, especially from statistical mechanics.