
Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science (2402.04119v1)

Published 6 Feb 2024 in cs.LG and cs.CE

Abstract: Efficient molecular modeling and design are crucial for the discovery and exploration of novel molecules, and the incorporation of deep learning methods has revolutionized this field. In particular, LLMs offer a fresh approach to tackling scientific problems from an NLP perspective, introducing a research paradigm called scientific language modeling (SLM). However, two key issues remain: how to quantify the match between model and data modalities, and how to identify the knowledge-learning preferences of models. To address these challenges, we propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess each model's compatibility with data modalities and its knowledge acquisition. Through the modal transition probability matrix, we provide insights into the most suitable modalities for tasks. Furthermore, we introduce a statistically interpretable approach to discover context-specific knowledge mapping by localized feature filtering. Our pioneering analysis offers an exploration of the learning mechanism and paves the way for advancing SLM in molecular science.
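The abstract describes the modal transition probability matrix only at a high level. As a rough illustration of the idea, the sketch below tallies, for each source modality, how often translating into each target modality gave the best result, and row-normalizes the counts into transition probabilities. The modality names, the `results` counts, and all identifiers are hypothetical assumptions for illustration, not the paper's actual data or code.

```python
import numpy as np

# Hypothetical sketch of tallying a modal transition probability matrix
# from benchmark outcomes. All names and counts below are illustrative.
MODALITIES = ["SMILES", "SELFIES", "InChI", "IUPAC name", "caption", "image"]
idx = {m: i for i, m in enumerate(MODALITIES)}

# results[(src, tgt)]: number of experiments in which a src -> tgt
# translation task achieved the best score (made-up counts).
results = {
    ("caption", "SMILES"): 12,
    ("SMILES", "caption"): 9,
    ("image", "SMILES"): 4,
    ("IUPAC name", "SMILES"): 7,
}

n = len(MODALITIES)
counts = np.zeros((n, n))
for (src, tgt), best_count in results.items():
    counts[idx[src], idx[tgt]] = best_count

# Row-normalize: entry (i, j) estimates P(target j is best | source i).
row_sums = counts.sum(axis=1, keepdims=True)
transition = np.divide(counts, row_sums,
                       out=np.zeros_like(counts), where=row_sums > 0)

# The most suitable target modality for each source is the row-wise argmax.
for m in MODALITIES:
    row = transition[idx[m]]
    if row.sum() > 0:
        best = MODALITIES[int(row.argmax())]
        print(f"{m} -> {best} (p = {row.max():.2f})")
```

Under this reading, the row-wise argmax is what surfaces "the most suitable modalities for tasks"; the paper's own construction may differ in how outcomes are scored and aggregated.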

Authors (3)
  1. Pengfei Liu (191 papers)
  2. Jun Tao (73 papers)
  3. Zhixiang Ren (23 papers)
Citations (3)