MolFM: A Multimodal Molecular Foundation Model
The paper introduces MolFM, a multimodal molecular foundation model designed to integrate and learn from molecular structures, biomedical texts, and knowledge graphs. The primary objective is to address limitations in existing models that inadequately capture the complex relationships between these modalities and fail to fully utilize knowledge graphs.
Methodology and Model Architecture
MolFM employs a sophisticated architecture composed of:
- Molecular Graph Encoder: Utilizing a Graph Isomorphism Network (GIN) to extract structural information from molecular graphs.
- Text Encoder: Using a modified transformer initialized from KV-PLM for text representations.
- Knowledge Graph Encoder: Leveraging TransE to encode information from knowledge graphs.
A key innovation of MolFM is the multimodal encoder, which uses cross-modal attention to synthesize features from these diverse modalities, facilitating a more comprehensive understanding of molecular data.
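As a rough illustration of this fusion step, the sketch below implements single-head cross-modal attention in NumPy, where text-token features attend over molecular-graph node features. The function name, matrix shapes, and the single-head simplification are assumptions for clarity; the paper's multimodal encoder uses full transformer layers with multi-head cross-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_h, struct_h, Wq, Wk, Wv):
    """Single-head cross-attention sketch (hypothetical simplification):
    queries come from text tokens, keys/values from graph node features,
    so each text token gathers structural context.
    text_h: (T, d) text-token features; struct_h: (N, d) node features.
    """
    Q = text_h @ Wq                            # (T, d)
    K = struct_h @ Wk                          # (N, d)
    V = struct_h @ Wv                          # (N, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # (T, N) scaled dot products
    attn = softmax(scores, axis=-1)            # each text token's weights over nodes
    return attn @ V                            # (T, d) structure-enriched text features
```

In a full model this operation is stacked inside transformer blocks and applied symmetrically across modalities; here it only shows the mechanism by which one modality's representation is conditioned on another's.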
Pre-training Objectives
MolFM's pre-training involves several objectives aimed at enhancing multimodal representation learning:
- Structure-Text Contrastive Loss (STC): Aligns molecular structures with textual descriptions using contrastive learning.
- Cross-Modal Matching (CMM): Trains the model to predict whether a given structure-text pair refers to the same molecule.
- Masked Language Modeling (MLM): Improves text understanding by predicting masked tokens from their context.
- Knowledge Graph Embedding (KGE): Aligns structurally and functionally similar molecules using a max-margin loss.
Through theoretical analyses, the paper interprets CMM and KGE as deep metric learning tasks that reduce modality gaps and capture relevant molecular knowledge.
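To make the two metric-learning-style objectives concrete, here is a minimal NumPy sketch of a symmetric contrastive loss in the spirit of STC (matched structure-text pairs on the diagonal of a batch similarity matrix) and a TransE-style max-margin loss in the spirit of KGE. Function names, the temperature value, and the negative-sampling scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def stc_loss(z_struct, z_text, tau=0.07):
    """Symmetric InfoNCE sketch: row i of z_struct and row i of z_text
    come from the same molecule, so the diagonal holds positive pairs."""
    zs = z_struct / np.linalg.norm(z_struct, axis=1, keepdims=True)
    zt = z_text / np.linalg.norm(z_text, axis=1, keepdims=True)
    logits = zs @ zt.T / tau                           # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # structure -> text
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> structure
    return 0.5 * (loss_s2t + loss_t2s)

def kge_margin_loss(h, r, t, t_neg, margin=1.0):
    """TransE-style max-margin sketch: for triples (head, relation, tail),
    push ||h + r - t|| below ||h + r - t_neg|| by at least `margin`."""
    pos = np.linalg.norm(h + r - t, axis=-1)
    neg = np.linalg.norm(h + r - t_neg, axis=-1)
    return np.maximum(0.0, margin + pos - neg).mean()
```

Viewed this way, both losses shape a shared embedding space: the contrastive term pulls matched cross-modal pairs together, while the margin term places structurally or functionally related molecules near each other, which is the metric-learning interpretation the paper develops.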
Results and Findings
MolFM demonstrates significant improvements in various tasks:
- Cross-Modal Retrieval: Achieves substantial performance gains over previous models, with average improvements of 12.13% and 5.04% in the zero-shot and fine-tuned settings, respectively.
- Molecule Captioning: Excels in generating accurate descriptions, outperforming baselines in BLEU and Text2Mol scores.
- Text-Based Molecule Generation: Produces molecular structures that more faithfully match their textual descriptions.
- Molecular Property Prediction: Leverages multimodal data to enhance predictive accuracy, showing an average absolute gain of 1.55% across datasets.
Implications and Future Directions
The development and success of MolFM underscore the potential of integrating multiple data modalities in molecular modeling. The work highlights how incorporating knowledge graphs can provide a global contextual understanding, enhancing both generative and predictive tasks.
Looking forward, the implications for AI in drug discovery and biomedical research are significant. MolFM's ability to connect molecular structures with comprehensive text and knowledge-based contexts could lead to more sophisticated AI systems capable of biological reasoning and hypothesis generation.
One of the notable challenges remains in scaling and refining the quality of pre-training datasets to mitigate potential biases and inaccuracies. Furthermore, expanding the scope of the model to include other biological entities like proteins and genes could enrich its application in the broader biomedical landscape.
MolFM sets a benchmark for future multimodal approaches, offering a framework that could be adapted and built upon for more nuanced and effective AI-driven insights in the biomedical field.