Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry (2401.03369v2)
Abstract: Molecular property prediction is the task of labeling molecules with biochemical properties, and it plays a pivotal role in drug discovery and design. With recent advances in machine learning, deep learning-based molecular property prediction has attracted significant attention as an alternative to resource-intensive traditional methods. Molecular representation learning is the key factor in prediction performance, and many sequence-based, graph-based, and geometry-based methods have been proposed. However, most existing studies learn molecular representations from a single modality and therefore fail to capture molecular characteristics comprehensively. This paper proposes SGGRL, a novel multi-modal representation learning model for molecular property prediction that integrates sequence, graph, and geometry characteristics. Specifically, we design a fusion layer to fuse the representations of the different modalities. Furthermore, to ensure consistency across modalities, SGGRL is trained to maximize the similarity of representations of the same molecule while minimizing the similarity of representations of different molecules. To verify the effectiveness of SGGRL, we evaluate it on seven molecular datasets and compare it against several baselines. The experimental results show that SGGRL outperforms the baselines in most cases, which underscores its capability to capture molecular information comprehensively. Overall, SGGRL shows the potential of multi-modal representation learning to extract diverse and comprehensive molecular insights for molecular property prediction. Our code is released at https://github.com/Vencent-Won/SGGRL.
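The abstract states the cross-modal training objective only in words: representations of the same molecule should be similar, and representations of different molecules should be dissimilar. The following is a minimal sketch of one common way to express such an objective, an InfoNCE-style contrastive loss between two modalities. It is not taken from the SGGRL code; the function names, the use of cosine similarity, and the `temperature` value are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_modal_contrastive_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss between two modalities of one batch (hypothetical helper).

    view_a[i] and view_b[i] are embeddings of molecule i from two
    different modalities, e.g. sequence and graph. Row i of view_a is
    pulled toward row i of view_b and pushed away from all other rows.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matching molecule (j == i).
        total += log_denom - logits[i]
    return total / n

# Toy batch of two molecules with 2-dimensional embeddings (made-up numbers).
seq = [[1.0, 0.0], [0.0, 1.0]]
graph_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each row matches its molecule
graph_swapped = [[0.1, 0.9], [0.9, 0.1]]   # rows paired with the wrong molecule

aligned = cross_modal_contrastive_loss(seq, graph_aligned)
swapped = cross_modal_contrastive_loss(seq, graph_swapped)
print(aligned < swapped)
```

In this toy batch the loss is lower when each sequence embedding lines up with the graph embedding of the same molecule than when the pairing is swapped, which is the behavior the abstract describes. With three modalities, one plausible extension is to sum the loss over each pair of modalities, though the abstract does not specify how the pairs are combined.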
Authors: Zeyu Wang, Tianyi Jiang, Jinhuan Wang, Qi Xuan