
Benchmarking Large Language Models for Molecule Prediction Tasks

(2403.05075)
Published Mar 8, 2024 in cs.LG and q-bio.BM

Abstract

LLMs stand at the forefront of a number of NLP tasks. Despite the widespread adoption of LLMs in NLP, much of their potential in broader fields remains largely unexplored, and significant limitations persist in their design and implementation. Notably, LLMs struggle with structured data, such as graphs, and often falter when tasked with answering domain-specific questions requiring deep expertise, such as those in biology and chemistry. In this paper, we explore a fundamental question: Can LLMs effectively handle molecule prediction tasks? Rather than pursuing top-tier performance, our goal is to assess how LLMs can contribute to diverse molecule tasks. We identify several classification and regression prediction tasks across six standard molecule datasets. Subsequently, we carefully design a set of prompts to query LLMs on these tasks and compare their performance with existing Machine Learning (ML) models, which include text-based models and those specifically designed for analysing the geometric structure of molecules. Our investigation reveals several key insights: Firstly, LLMs generally lag behind ML models in achieving competitive performance on molecule tasks, particularly when compared to models adept at capturing the geometric structure of molecules, highlighting the constrained ability of LLMs to comprehend graph data. Secondly, LLMs show promise in enhancing the performance of ML models when used collaboratively. Lastly, we engage in a discourse regarding the challenges and promising avenues to harness LLMs for molecule prediction tasks. The code and models are available at https://github.com/zhiqiangzhongddu/LLMaMol.
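The querying setup described above can be illustrated with a minimal sketch: molecules are represented as SMILES strings and wrapped in a natural-language prompt for an LLM to answer a classification question. The function name, prompt wording, and task description below are illustrative assumptions, not the authors' exact templates.

```python
# Hypothetical sketch of a zero-shot prompt for a molecule classification
# task. The prompt template and helper name are assumptions for
# illustration, not the paper's exact design.

def build_molecule_prompt(smiles: str, task_description: str) -> str:
    """Format a zero-shot prompt asking an LLM to classify a molecule
    given its SMILES string."""
    return (
        "You are an expert chemist.\n"
        f"Task: {task_description}\n"
        f"Molecule (SMILES): {smiles}\n"
        "Answer with 'Yes' or 'No' only."
    )

prompt = build_molecule_prompt(
    smiles="CCO",  # ethanol
    task_description="Is this molecule likely to inhibit HIV replication?",
)
print(prompt)
```

The resulting string would then be sent to an LLM API and the "Yes"/"No" reply parsed into a binary label for comparison against ML baselines.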
