
LICO: Large Language Models for In-Context Molecular Optimization (2406.18851v1)

Published 27 Jun 2024 in cs.LG, cs.AI, physics.chem-ph, q-bio.BM, and q-bio.QM

Abstract: Optimizing black-box functions is a fundamental problem in science and engineering. To solve this problem, many approaches learn a surrogate function that estimates the underlying objective from limited historical evaluations. LLMs, with their strong pattern-matching capabilities via pretraining on vast amounts of data, stand out as potential candidates for surrogate modeling. However, directly prompting a pretrained LLM to produce predictions is not feasible in many scientific domains due to the scarcity of domain-specific data in the pretraining corpora and the challenges of articulating complex problems in natural language. In this work, we introduce LICO, a general-purpose model that extends arbitrary base LLMs for black-box optimization, with a particular application to the molecular domain. To achieve this, we equip the LLM with a separate embedding layer and prediction layer, and train the model to perform in-context predictions on a diverse set of functions defined over the domain. Once trained, LICO can generalize to unseen molecule properties simply via in-context prompting. LICO achieves state-of-the-art performance on PMO, a challenging molecular optimization benchmark comprising over 20 objective functions.
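The abstract describes the core mechanism: a base LLM is extended with a separate embedding layer and prediction layer, then trained to make in-context predictions over (molecule, score) evaluations. The following is a minimal PyTorch sketch of what that interface could look like; the layer shapes, the interleaved (x, y) token layout, and the HuggingFace-style `inputs_embeds` call are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class LICOSurrogate(nn.Module):
    """Sketch of the LICO idea: a pretrained base LLM plus a separate
    embedding layer (mapping molecule features and observed scores into
    the LLM's hidden space) and a prediction head (mapping hidden states
    to scalar property estimates). Details here are assumptions."""

    def __init__(self, base_llm, x_dim, hidden_dim):
        super().__init__()
        self.base_llm = base_llm                     # pretrained transformer backbone
        self.embed_x = nn.Linear(x_dim, hidden_dim)  # molecule features -> hidden space
        self.embed_y = nn.Linear(1, hidden_dim)      # observed score -> hidden space
        self.predict = nn.Linear(hidden_dim, 1)      # hidden state -> predicted score

    def forward(self, xs, ys, x_query):
        # xs: (B, N, x_dim) context molecules; ys: (B, N, 1) their scores
        # x_query: (B, 1, x_dim) molecule whose property we want to predict
        ctx = torch.stack([self.embed_x(xs), self.embed_y(ys)], dim=2)
        ctx = ctx.flatten(1, 2)  # interleave tokens as x1, y1, x2, y2, ...
        tokens = torch.cat([ctx, self.embed_x(x_query)], dim=1)
        hidden = self.base_llm(inputs_embeds=tokens).last_hidden_state
        return self.predict(hidden[:, -1:])  # estimate for the query molecule
```

Under this reading, training would sample many different objective functions over the molecular domain and fit the new layers to predict held-out scores from the in-context history; at optimization time, a new property is handled purely by conditioning on its observed evaluations, with no gradient updates.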

Authors (2)
  1. Tung Nguyen (58 papers)
  2. Aditya Grover (82 papers)
Citations (3)