
Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model (2403.13244v4)

Published 20 Mar 2024 in cs.CL and cs.AI

Abstract: While various models and computational tools have been proposed for analyzing the structures and properties of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Here, we introduce TSMMG, a multi-constraint molecular generation LLM that, akin to a student, incorporates knowledge from various small models and tools, namely, the 'teachers'. To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers', enabling it to generate novel molecules that conform to descriptions given through various text prompts. We show experimentally that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratios of 82.58%, 68.03%, and 67.48%, respectively. The model also exhibits adaptability in zero-shot testing, creating molecules that satisfy combinations of properties not encountered during training. Empirical validation further confirms that it can comprehend text inputs written in various language styles, extending beyond the confines of the outlined prompts. Additionally, the knowledge distillation feature of TSMMG contributes to the continuous enhancement of small models, while the innovative approach to dataset construction effectively addresses data scarcity and quality issues, positioning TSMMG as a promising tool in drug discovery and materials science.
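The abstract names two mechanisms without spelling out their details: building text-molecule pairs from the 'teachers' annotations, and scoring generated molecules by validity and success ratio. The sketch below illustrates both under stated assumptions. Each teacher is modeled as a hypothetical callable that labels a SMILES string with one property, the prompt template is invented for illustration, and the success ratio is assumed to be the fraction of all generated molecules that are valid and meet every constraint. This is not TSMMG's actual pipeline, and the toy teachers and validity check are not chemically meaningful.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical teacher signature (assumption): takes a SMILES string and returns
# True/False for one property, or None when it cannot process the molecule.
Teacher = Callable[[str], Optional[bool]]


def build_pair(smiles: str, teachers: Dict[str, Teacher]) -> Optional[Tuple[str, str]]:
    """Describe a molecule with every property its teachers assign to it.

    Returns a (text prompt, SMILES) training pair, or None if any teacher
    rejects the molecule or no property applies. The template is illustrative.
    """
    labels = []
    for description, teacher in teachers.items():
        verdict = teacher(smiles)
        if verdict is None:  # molecule unusable for this teacher: drop it
            return None
        if verdict:
            labels.append(description)
    if not labels:
        return None
    return "Generate a molecule that " + " and ".join(labels) + ".", smiles


def evaluate(generated: List[str], required: Dict[str, Teacher],
             is_valid: Callable[[str], bool]) -> Tuple[float, float]:
    """Return (validity, success ratio) for one multi-constraint prompt.

    Assumption: success ratio = fraction of ALL generated strings that are
    valid and satisfy every required constraint (None counts as a failure).
    """
    if not generated:
        return 0.0, 0.0
    valid = [s for s in generated if is_valid(s)]
    hits = [s for s in valid if all(t(s) for t in required.values())]
    return len(valid) / len(generated), len(hits) / len(generated)


# Toy stand-ins for real property predictors (NOT chemically meaningful).
toy_teachers: Dict[str, Teacher] = {
    "contains oxygen": lambda s: "O" in s,
    "contains a ring": lambda s: any(ch.isdigit() for ch in s),
}


def toy_valid(smiles: str) -> bool:
    """Crude placeholder for a real SMILES parser: non-empty, balanced parentheses."""
    return bool(smiles) and smiles.count("(") == smiles.count(")")


print(build_pair("c1ccccc1O", toy_teachers))
print(evaluate(["c1ccccc1O", "CCO", "C(C"], toy_teachers, toy_valid))  # validity 2/3, success ratio 1/3
```

In practice, the validity check would be delegated to a cheminformatics toolkit's SMILES parser, and each teacher would be an existing property predictor or structure-matching tool; keeping them as plain callables lets the pair-building and scoring logic stay independent of any particular toolkit.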

Authors (14)
  1. Peng Zhou
  2. Jianmin Wang
  3. Chunyan Li
  4. Zixu Wang
  5. Yiping Liu
  6. Siqi Sun
  7. Jianxin Lin
  8. Longyue Wang
  9. Xiangxiang Zeng
  10. Leyi Wei
  11. Xibao Cai
  12. Houtim Lai
  13. Wei Liu
  14. Yuansheng Liu