SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration (2409.02231v3)

Published 3 Sep 2024 in physics.chem-ph and cs.LG

Abstract: Here we show that a general-purpose LLM chatbot, Llama-3.1-8B-Instruct, can be transformed via supervised fine-tuning of engineered prompts into a chemical LLM (CLM), SmileyLlama, for molecule generation. We benchmark SmileyLlama by comparing it to CLMs trained from scratch on large amounts of ChEMBL data for their ability to generate valid and novel drug-like molecules. We also use direct preference optimization both to improve SmileyLlama's adherence to a prompt and to generate molecules within the iMiner reinforcement learning framework to predict new drug molecules with optimized 3D conformations and high binding affinity to drug targets, illustrated with the SARS-CoV-2 Main Protease. This overall framework allows an LLM to speak directly as a CLM which can generate molecules with user-specified properties, rather than acting only as a chatbot with knowledge of chemistry or as a helpful virtual assistant. While our dataset and analyses are geared toward drug discovery, this general procedure can be extended to other chemical applications such as chemical synthesis.
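The supervised fine-tuning described in the abstract pairs property-conditioned prompts with SMILES completions. The sketch below illustrates one plausible shape for such training examples; the prompt template, property names, and chat-message format are illustrative assumptions, not the authors' actual data format.

```python
# Hypothetical SFT example builder: pairs a property-conditioned
# instruction with a target SMILES string, in the chat-message format
# commonly used to instruction-tune models such as Llama-3.1-8B-Instruct.
# The template and property fields are assumptions for illustration.

def build_sft_example(properties: dict, smiles: str) -> dict:
    """Format one (prompt, completion) pair for supervised fine-tuning."""
    spec = ", ".join(f"{k} = {v}" for k, v in sorted(properties.items()))
    prompt = (
        "Generate a drug-like molecule as a SMILES string with the "
        f"following properties: {spec}."
    )
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": smiles},
        ]
    }

# Example usage with aspirin as a placeholder target molecule.
example = build_sft_example(
    {"MolWt": "<500", "LogP": "2-4", "HBD": "<=5"},
    "CC(=O)Oc1ccccc1C(=O)O",
)
print(example["messages"][0]["content"])
```

At fine-tuning time, many such examples would be generated from a labeled molecule corpus (e.g. ChEMBL-derived property annotations), so the model learns to emit SMILES strings conditioned on the requested property ranges rather than conversational text.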
