
Representing Molecules as Random Walks Over Interpretable Grammars (2403.08147v3)

Published 13 Mar 2024 in cs.LG and q-bio.BM

Abstract: Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space, with motifs as the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method's chemical interpretability.
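
To make the idea of a random walk over a motif design space more concrete, here is a minimal toy sketch. It is not the paper's method: the motif names, the adjacency structure `MOTIF_GRAPH`, and the helpers `random_walk` and `visit_counts` are all hypothetical, and the uniform transition rule is an assumption made only for illustration.

```python
import random

# Hypothetical toy design space: keys are motif names, values list the
# motifs that may follow. Names and edges are illustrative assumptions.
MOTIF_GRAPH = {
    "benzene": ["ester", "amide"],
    "ester": ["benzene", "alkyl"],
    "amide": ["benzene", "alkyl"],
    "alkyl": ["ester", "amide"],
}


def random_walk(graph, start, steps, rng):
    """Return the motifs visited by a uniform random walk of `steps` moves."""
    walk = [start]
    for _ in range(steps):
        # Pick the next motif uniformly among neighbors of the current one.
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def visit_counts(graph, walk):
    """Summarize a walk as a fixed-length vector of per-motif visit counts."""
    return [walk.count(motif) for motif in graph]


# A seeded generator keeps the example reproducible.
walk = random_walk(MOTIF_GRAPH, "benzene", 5, random.Random(0))
features = visit_counts(MOTIF_GRAPH, walk)
```

In this sketch the same walk serves two roles: read as a sequence it describes how a structure could be assembled motif by motif, and summarized as `features` it gives a fixed-length vector that a downstream property predictor could consume.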

Authors (13)
  1. Michael Sun (21 papers)
  2. Minghao Guo (45 papers)
  3. Weize Yuan (2 papers)
  4. Veronika Thost (21 papers)
  5. Crystal Elaine Owens (3 papers)
  6. Aristotle Franklin Grosz (1 paper)
  7. Sharvaa Selvan (1 paper)
  8. Katelyn Zhou (2 papers)
  9. Hassan Mohiuddin (1 paper)
  10. Benjamin J Pedretti (1 paper)
  11. Zachary P Smith (1 paper)
  12. Jie Chen (602 papers)
  13. Wojciech Matusik (76 papers)