
Graph-based Molecular Representation Learning (2207.04869v3)

Published 8 Jul 2022 in q-bio.QM and cs.LG

Abstract: Molecular representation learning (MRL) is a key step in connecting machine learning and chemical science. In particular, it encodes molecules as numerical vectors that preserve molecular structures and features, on top of which downstream tasks (e.g., property prediction) can be performed. Recently, MRL has made considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, especially the methods incorporating chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. Then we summarize and categorize MRL methods into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions.
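
To make the idea of "encoding a molecule as a numerical vector" concrete, the following is a minimal sketch in plain Python. It is not taken from the paper: the toy molecule (the heavy atoms of ethanol), the two-element one-hot atom features, and the helper names `one_hot`, `message_passing_step`, and `readout` are all illustrative assumptions. It shows a 2D molecular graph (atoms as nodes, bonds as edges), one round of sum-based neighbour aggregation, and a sum readout that yields a single vector for the whole molecule.

```python
# Illustrative sketch only: a toy 2D molecular graph and one round of
# sum-aggregation message passing. Feature choices and function names are
# hypothetical, not the survey's method.

ATOM_TYPES = ["C", "O"]  # assumed tiny vocabulary for the one-hot features


def one_hot(symbol):
    """Encode an atom symbol as a one-hot vector over ATOM_TYPES."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]


def message_passing_step(features, edges):
    """Each atom adds the previous-round features of its bonded neighbours."""
    updated = [list(f) for f in features]  # start from each atom's own features
    for i, j in edges:  # bonds are undirected, so messages flow both ways
        for k in range(len(features[i])):
            updated[i][k] += features[j][k]
            updated[j][k] += features[i][k]
    return updated


def readout(features):
    """Sum-pool atom vectors into one molecule-level vector."""
    return [sum(column) for column in zip(*features)]


# Heavy-atom graph of ethanol: C - C - O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

h0 = [one_hot(a) for a in atoms]       # initial atom features
h1 = message_passing_step(h0, bonds)   # features after one round
embedding = readout(h1)                # fixed-size molecule vector
print(embedding)
```

Because both the aggregation and the readout are sums, the resulting vector does not depend on the order in which atoms are listed, which is one reason sum-style pooling is a common choice for graph-level representations. A downstream model (for example, a property predictor) would take `embedding` as its input.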

Authors (11)
  1. Zhichun Guo (28 papers)
  2. Kehan Guo (16 papers)
  3. Bozhao Nan (6 papers)
  4. Yijun Tian (29 papers)
  5. Roshni G. Iyer (7 papers)
  6. Yihong Ma (9 papers)
  7. Olaf Wiest (9 papers)
  8. Xiangliang Zhang (131 papers)
  9. Wei Wang (1793 papers)
  10. Chuxu Zhang (51 papers)
  11. Nitesh V. Chawla (111 papers)
Citations (49)