A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules? (2402.05015v2)

Published 7 Feb 2024 in cs.LG

Abstract: Automation is one of the cornerstones of contemporary material discovery. Bayesian optimization (BO) is an essential part of such workflows, enabling scientists to leverage prior domain knowledge into efficient exploration of a large molecular space. While such prior knowledge can take many forms, there has been significant fanfare around the ancillary scientific knowledge encapsulated in LLMs. However, existing work thus far has only explored LLMs for heuristic materials searches. Indeed, recent work obtains the uncertainty estimate -- an integral part of BO -- from point-estimated, non-Bayesian LLMs. In this work, we study the question of whether LLMs are actually useful to accelerate principled Bayesian optimization in the molecular space. We take a sober, dispassionate stance in answering this question. This is done by carefully (i) viewing LLMs as fixed feature extractors for standard but principled BO surrogate models and by (ii) leveraging parameter-efficient finetuning methods and Bayesian neural networks to obtain the posterior of the LLM surrogate. Our extensive experiments with real-world chemistry problems show that LLMs can be useful for BO over molecules, but only if they have been pretrained or finetuned with domain-specific data.
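As an illustration of setup (i) in the abstract, the sketch below runs a small Bayesian optimization loop over a finite pool of candidates whose features are held fixed. It is not the authors' implementation. The feature matrix stands in for LLM embeddings of molecules (here it is only synthetic random data), and the Gaussian process with an RBF kernel, the expected-improvement acquisition, the `sqrt(d)` lengthscale heuristic, and all function names are assumptions chosen for brevity.

```python
import numpy as np
from math import erf, sqrt, pi


def rbf_kernel(A, B, lengthscale):
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale ** 2)


def gp_posterior(X_train, y_train, X_cand, lengthscale, noise=1e-3):
    # Zero-mean GP regression; returns posterior mean and std at the candidates.
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v ** 2).sum(0)  # prior variance k(x, x) is 1 for the RBF kernel
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    # Closed-form expected improvement for maximization.
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (mean - best) * cdf + std * pdf


def bo_loop(features, objective, n_init=5, n_iter=10, seed=0):
    # features: (n, d) fixed embeddings (hypothetical stand-in for LLM features).
    # objective: (n,) property values, revealed only for the selected candidates.
    rng = np.random.default_rng(seed)
    lengthscale = np.sqrt(features.shape[1])  # heuristic, an assumption
    observed = [int(i) for i in rng.choice(len(features), size=n_init, replace=False)]
    for _ in range(n_iter):
        seen = set(observed)
        pool = np.array([i for i in range(len(features)) if i not in seen])
        y = objective[observed]
        y_std = (y - y.mean()) / (y.std() + 1e-9)  # standardize the targets
        mean, std = gp_posterior(features[observed], y_std, features[pool], lengthscale)
        ei = expected_improvement(mean, std, y_std.max())
        observed.append(int(pool[np.argmax(ei)]))  # evaluate the best candidate next
    return observed


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(200, 8))            # synthetic "embeddings"
    obj = -((feats - 0.5) ** 2).sum(axis=1)      # synthetic objective to maximize
    picked = bo_loop(feats, obj)
    print("best value found:", obj[picked].max())
```

In this sketch the surrogate is the only Bayesian component and the features never change, which mirrors the "fixed feature extractor" framing. Setup (ii), where the LLM itself is finetuned and given a posterior, would require a different surrogate and is not covered here.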

Authors (6)
  1. Agustinus Kristiadi (28 papers)
  2. Felix Strieth-Kalthoff (8 papers)
  3. Marta Skreta (12 papers)
  4. Pascal Poupart (80 papers)
  5. Alán Aspuru-Guzik (226 papers)
  6. Geoff Pleiss (41 papers)
Citations (9)