
Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning (2307.12996v1)

Published 22 Jul 2023 in cs.LG, cs.AI, cs.CL, cs.IR, and q-bio.QM

Abstract: Deep learning in computational biochemistry has traditionally focused on neural representations of molecular graphs; however, recent advances in LLMs highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to the recently proposed MoMu model, which is contrastively trained on molecular graphs and text (Su et al. 2022).
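The abstract describes aligning graph and text representations with contrastive learning. A common way to do this is a CLIP-style symmetric InfoNCE objective over paired embeddings. The sketch below is an illustrative, pure-Python toy (the function names, the temperature value, and the 2-D toy embeddings are all assumptions for demonstration), not the paper's actual implementation:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def symmetric_info_nce(graph_embs, text_embs, tau=0.1):
    """CLIP-style contrastive loss: each graph should match its own
    text description (the diagonal) against in-batch negatives."""
    n = len(graph_embs)
    # Temperature-scaled similarity matrix: rows = graphs, cols = texts.
    sims = [[cosine(g, t) / tau for t in text_embs] for g in graph_embs]
    loss = 0.0
    for i in range(n):
        # Graph -> text direction: cross-entropy over row i.
        row = sims[i]
        loss += -row[i] + math.log(sum(math.exp(s) for s in row))
        # Text -> graph direction: cross-entropy over column i.
        col = [sims[j][i] for j in range(n)]
        loss += -col[i] + math.log(sum(math.exp(s) for s in col))
    return loss / (2 * n)

# Toy check: aligned graph/text pairs should score a lower loss
# than mismatched (swapped) pairs.
graphs = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
texts_swapped = [[0.0, 1.0], [1.0, 0.0]]
```

Minimizing this loss pulls each molecular graph embedding toward the embedding of its own description while pushing it away from the other descriptions in the batch, which is how property knowledge encoded in text can transfer into the graph encoder.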

Authors (5)
  1. Romain Lacombe (6 papers)
  2. Andrew Gaut (3 papers)
  3. Jeff He (3 papers)
  4. David Lüdeke (2 papers)
  5. Kateryna Pistunova (11 papers)
Citations (2)