MolBind: Multimodal Alignment of Language, Molecules, and Proteins (2403.08167v2)

Published 13 Mar 2024 in cs.LG, cs.CL, and q-bio.QM

Abstract: Recent advances in biology and chemistry have leveraged multi-modal learning, integrating molecules and their natural language descriptions to enhance drug discovery. However, current pre-training frameworks are limited to two modalities, and designing a unified network to process different modalities (e.g., natural language, 2D molecular graphs, 3D molecular conformations, and 3D proteins) remains challenging due to the inherent gaps among them. In this work, we propose MolBind, a framework that trains encoders for multiple modalities through contrastive learning, mapping all modalities into a shared feature space for multi-modal semantic alignment. To facilitate effective pre-training of MolBind on multiple modalities, we also construct a high-quality four-modality dataset, MolBind-M4, comprising graph-language, conformation-language, graph-conformation, and conformation-protein paired data. MolBind shows superior zero-shot learning performance across a wide range of tasks, demonstrating its strong capability to capture the underlying semantics of multiple modalities.
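
To make the contrastive alignment concrete, below is a minimal sketch of the symmetric InfoNCE objective commonly used for this kind of CLIP-style multi-modal alignment (assuming PyTorch; the encoder names are illustrative placeholders, and the paper's exact loss and hyperparameters may differ):

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    z_a, z_b: (batch, dim) embeddings of the same items in two modalities,
    e.g. 2D graphs and text descriptions projected into the shared space.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Matched pairs lie on the diagonal; every other item in the batch
    # serves as an in-batch negative.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Hypothetical usage on one paired subset of MolBind-M4
# (graph_encoder and text_encoder are illustrative, not from the paper):
# loss = info_nce(graph_encoder(graphs), text_encoder(texts))
```

Summing such pairwise losses over the available paired subsets (graph-language, conformation-language, graph-conformation, and conformation-protein) ties all four encoders to one shared embedding space, which is what enables the zero-shot cross-modal transfer the abstract describes.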
