SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration (2409.02231v3)

Published 3 Sep 2024 in physics.chem-ph and cs.LG

Abstract: Here we show that a general-purpose LLM chatbot, Llama-3.1-8B-Instruct, can be transformed via supervised fine-tuning of engineered prompts into a chemical LLM (CLM), SmileyLlama, for molecule generation. We benchmark SmileyLlama by comparing it to CLMs trained from scratch on large amounts of ChEMBL data for their ability to generate valid and novel drug-like molecules. We also use direct preference optimization both to improve SmileyLlama's adherence to a prompt and to generate molecules within the iMiner reinforcement learning framework to predict new drug molecules with optimized 3D conformations and high binding affinity to drug targets, illustrated with the SARS-CoV-2 Main Protease. This overall framework allows an LLM to speak directly as a CLM which can generate molecules with user-specified properties, rather than acting only as a chatbot with knowledge of chemistry or as a helpful virtual assistant. While our dataset and analyses are geared toward drug discovery, this general procedure can be extended to other chemical applications such as chemical synthesis.
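The supervised fine-tuning described in the abstract pairs property-conditioned prompts with SMILES completions. The sketch below illustrates one plausible shape for such training examples; the prompt template, property names, and chat-message format are illustrative assumptions, not the authors' actual data format.

```python
# Hypothetical SFT example builder: pairs a property-conditioned
# instruction with a target SMILES string, in the chat-message format
# commonly used to instruction-tune models such as Llama-3.1-8B-Instruct.
# The template and property fields are assumptions for illustration.

def build_sft_example(properties: dict, smiles: str) -> dict:
    """Format one (prompt, completion) pair for supervised fine-tuning."""
    spec = ", ".join(f"{k} = {v}" for k, v in sorted(properties.items()))
    prompt = (
        "Generate a drug-like molecule as a SMILES string with the "
        f"following properties: {spec}."
    )
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": smiles},
        ]
    }

# Example usage with aspirin as a placeholder target molecule.
example = build_sft_example(
    {"MolWt": "<500", "LogP": "2-4", "HBD": "<=5"},
    "CC(=O)Oc1ccccc1C(=O)O",
)
print(example["messages"][0]["content"])
```

At fine-tuning time, many such examples would be generated from a labeled molecule corpus (e.g. ChEMBL-derived property annotations), so the model learns to emit SMILES strings conditioned on the requested property ranges rather than conversational text.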
