MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures (2311.16666v2)

Published 28 Nov 2023 in cs.LG, cs.AI, physics.chem-ph, and q-bio.BM

Abstract: The accurate prediction of drug molecule properties poses a fundamental challenge in Artificial Intelligence Drug Discovery (AIDD). An effective representation of drug molecules is a pivotal component in this pursuit. Contemporary leading-edge research predominantly resorts to self-supervised learning (SSL) techniques to extract meaningful structural representations from large-scale, unlabeled molecular data, subsequently fine-tuning these representations for an array of downstream tasks. However, an inherent shortcoming of these studies lies in their reliance on a single modality of molecular information, such as molecule images or SMILES strings, thus neglecting the potential complementarity of different molecular modalities. In response to this limitation, we propose MolIG, a novel multimodal molecular pre-training framework for predicting molecular properties based on Image and Graph structures. MolIG leverages the coherence and correlation between a molecule's graph and its image to define self-supervised tasks, effectively combining the strengths of both representation forms. This holistic approach captures pivotal molecular structural characteristics and high-level semantic information. Upon completion of pre-training, the Graph Neural Network (GNN) encoder is used for prediction on downstream tasks. Compared to advanced baseline models, MolIG exhibits enhanced performance on molecular property prediction tasks within benchmark suites such as the MoleculeNet Benchmark Group and the ADMET Benchmark Group.
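
The cross-modal pre-training idea above lends itself to a compact illustration. Below is a minimal, hypothetical sketch of graph-image contrastive pre-training in the spirit of MolIG: the toy encoders, embedding dimension, temperature, and the symmetric InfoNCE loss are illustrative assumptions, not the paper's exact architecture (the paper pairs a GNN encoder with a CNN image encoder and its own self-supervised objectives).

```python
# Hypothetical sketch of graph-image contrastive pre-training in the spirit
# of MolIG. Encoders, dimensions, and the InfoNCE loss form are illustrative
# assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGraphEncoder(nn.Module):
    """Stand-in GNN: one message-passing step followed by mean pooling."""

    def __init__(self, node_dim: int, hidden_dim: int):
        super().__init__()
        self.msg = nn.Linear(node_dim, hidden_dim)
        self.update = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, node_dim); adj: (num_nodes, num_nodes) dense adjacency
        h = F.relu(self.msg(x))
        h = F.relu(self.update(adj @ h))  # aggregate neighbor messages
        return h.mean(dim=0)              # graph-level embedding


class ToyImageEncoder(nn.Module):
    """Stand-in CNN for 2D molecule images (a ResNet in realistic setups)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, hidden_dim)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (batch, 3, H, W) -> (batch, hidden_dim)
        return self.proj(self.conv(img).flatten(1))


def info_nce(z_g: torch.Tensor, z_i: torch.Tensor, tau: float = 0.1):
    """Symmetric InfoNCE: matching graph/image pairs of the same molecule
    are positives; all other pairs in the batch are negatives."""
    z_g = F.normalize(z_g, dim=-1)
    z_i = F.normalize(z_i, dim=-1)
    logits = z_g @ z_i.t() / tau          # (batch, batch) cosine similarities
    labels = torch.arange(z_g.size(0))    # diagonal entries are positives
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2


if __name__ == "__main__":
    batch = 4
    graph_enc, image_enc = ToyGraphEncoder(9, 64), ToyImageEncoder(64)
    # One random toy graph per molecule; real code would batch graphs properly.
    z_g = torch.stack([graph_enc(torch.randn(5, 9), torch.eye(5))
                       for _ in range(batch)])
    z_i = image_enc(torch.randn(batch, 3, 64, 64))
    print(info_nce(z_g, z_i).item())
```

After pre-training with a loss of this kind, the graph encoder alone would be fine-tuned on downstream property-prediction tasks, consistent with the abstract's statement that only the GNN encoder is used after pre-training.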

Authors (4)
  1. Zhuoyuan Wang
  2. Jiacong Mi
  3. Shan Lu
  4. Jieyue He