Towards 3D Molecule-Text Interpretation in Language Models (2401.13923v2)

Published 25 Jan 2024 in cs.LG, cs.IR, and q-bio.BM

Abstract: Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation, and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder. This integration is achieved by a 3D molecule-text projector, bridging the 3D molecular encoder's representation space and the LM's input space. Moreover, to enhance 3D-MoLM's ability of cross-modal molecular understanding and instruction following, we meticulously curated a 3D molecule-centric instruction tuning dataset -- 3D-MoIT. Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of 3D molecular encoder and LM. It significantly surpasses existing baselines on downstream tasks, including molecule-text retrieval, molecule captioning, and more challenging open-text molecular QA tasks, especially focusing on 3D-dependent properties. We release our codes and datasets at https://github.com/lsh0520/3D-MoLM.
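To make the bridging idea concrete, below is a minimal sketch of how a 3D molecule-text projector could map a 3D encoder's per-atom embeddings into a language model's input-embedding space. The module names, dimensions, and the query-based cross-attention design are illustrative assumptions, not the paper's actual architecture; see the linked repository for the official implementation.

```python
# Minimal sketch (not the official 3D-MoLM code): a projector that turns
# per-atom embeddings from a 3D molecular encoder into a fixed number of
# "molecule tokens" living in the LM's input-embedding space.
import torch
import torch.nn as nn


class MoleculeTextProjector(nn.Module):
    """Maps 3D-encoder atom embeddings to LM-space tokens via learned queries
    and cross-attention (a simplified, hypothetical design)."""

    def __init__(self, enc_dim: int = 512, lm_dim: int = 4096,
                 num_queries: int = 8, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, enc_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(enc_dim, num_heads, batch_first=True)
        self.to_lm_space = nn.Linear(enc_dim, lm_dim)

    def forward(self, atom_embeddings: torch.Tensor) -> torch.Tensor:
        # atom_embeddings: (batch, num_atoms, enc_dim) from the 3D molecular encoder
        batch = atom_embeddings.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.cross_attn(q, atom_embeddings, atom_embeddings)
        return self.to_lm_space(fused)  # (batch, num_queries, lm_dim)


if __name__ == "__main__":
    # Stand-in for a frozen 3D encoder's output: 2 molecules, 30 atoms each.
    atom_emb = torch.randn(2, 30, 512)
    projector = MoleculeTextProjector()
    mol_tokens = projector(atom_emb)
    # These molecule tokens would be concatenated with text-token embeddings
    # before being fed to the language model.
    print(mol_tokens.shape)  # torch.Size([2, 8, 4096])
```

In this sketch, the projector compresses a variable number of atoms into a fixed number of tokens, which is one common way to interface a frozen encoder with an LM; the paper's projector and training stages (molecule-text alignment, then instruction tuning) may differ in detail.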

Authors (8)
  1. Sihang Li (32 papers)
  2. Zhiyuan Liu (433 papers)
  3. Yanchen Luo (6 papers)
  4. Xiang Wang (279 papers)
  5. Xiangnan He (200 papers)
  6. Kenji Kawaguchi (147 papers)
  7. Tat-Seng Chua (359 papers)
  8. Qi Tian (314 papers)
Citations (26)