The power of motifs as inductive bias for learning molecular distributions (2306.17246v1)

Published 4 Apr 2023 in cs.LG, q-bio.BM, and stat.AP

Abstract: Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study aims to investigate the impact of subgraph structures and vocabulary design on distribution learning, using small drug molecules as a case study. To this end, we introduce Subcover, a new subgraph-based fragmentation scheme, and evaluate it through a two-step variational auto-encoder. Our results show that Subcover's improved identification of chemically meaningful subgraphs leads to a relative improvement of the FCD score by 30%, outperforming previous methods. Our findings highlight the potential of Subcover to enhance the performance and scalability of existing methods, contributing to the advancement of drug discovery.
