GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization (2504.06265v2)

Published 8 Apr 2025 in cs.LG and cs.AI

Abstract: LLMs can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We address this gap with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization via deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both - LLMs to provide a rich and flexible input space for Bayesian optimization and - GPs to model this space with predictive uncertainty for more efficient sampling. Applied to Buchwald-Hartwig reaction optimization, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% reactions in just 50 optimization iterations). We also observe a 14% improvement over domain-specific representations without requiring specialized features. Extensive empirical evaluation across 19 benchmarks - ranging from general chemistry to reaction and molecular property optimization - demonstrates our method's robustness, generality, and consistent improvements across: (1) tasks, (2) LLM architectures (encoder, decoder, encoder-decoder), (3) pretraining domains (chemistry-related or general-purpose) and (4) hyperparameter settings (tuned once on a single dataset). Finally, we explain these improvements: joint LLM-GP optimization through marginal likelihood implicitly performs contrastive learning, aligning representations to produce (1) better-structured embedding spaces, (2) improved uncertainty calibration, and (3) more efficient sampling - without requiring any external loss. This work provides both practical advances in sample-efficient optimization and insights into what makes effective Bayesian optimization.

Summary

Gaussian Process Optimized LLMs: A Bayesian Approach to Fine-tuning

The paper "GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization" by Bojana Ranković et al. presents a novel architecture that exploits the capabilities of LLMs by integrating them with Gaussian Processes (GPs) for improved finetuning. The research is innovative in its approach to unify LLM finetuning with GP marginal likelihood optimization, leveraging deep kernel methods to create a cohesive system that advances Bayesian optimization.

LLMs, known for capturing intricate relationships through extensive pretraining, are difficult to harness for optimization under uncertainty. The paper addresses this by using the LLM as the feature map of a deep kernel: its embeddings define the GP's input space, the GP supplies predictive uncertainty over that space, and both are trained jointly. The result is more strategic sampling, which is vital for practical applications such as chemical reaction optimization.
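
As a concrete illustration, the sketch below shows one way an LLM-based deep kernel GP could be set up and trained by maximizing the marginal log likelihood so that gradients flow into the LLM encoder. This is not the authors' implementation: the GPyTorch and Hugging Face Transformers APIs, the bert-base-uncased backbone, the mean-pooled embedding, the Matérn kernel, and the toy reaction strings are all illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of an LLM-based deep kernel GP,
# assuming GPyTorch and Hugging Face Transformers. The LLM embeds textual reaction
# descriptions; the GP models yields over that embedding space; both are trained
# jointly by maximizing the exact GP marginal log likelihood.
import torch
import gpytorch
from transformers import AutoModel, AutoTokenizer

BACKBONE = "bert-base-uncased"  # placeholder encoder; any LLM backbone could be swapped in
tokenizer = AutoTokenizer.from_pretrained(BACKBONE)
encoder = AutoModel.from_pretrained(BACKBONE)

def embed(texts):
    """Mean-pooled token embeddings used as the GP input space."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (B, T, d)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (B, d)

class DeepKernelGP(gpytorch.models.ExactGP):
    """Exact GP over LLM embeddings with a Matern kernel (illustrative choice)."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5)
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# Toy labelled data: textual reaction conditions with made-up yields.
texts = ["Pd2(dba)3, XPhos, K3PO4, dioxane", "Pd(OAc)2, BINAP, NaOtBu, toluene"]
yields = torch.tensor([0.72, 0.31])

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DeepKernelGP(embed(texts), yields, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

# One optimizer over GP hyperparameters and LLM weights: the same marginal
# likelihood objective also finetunes the encoder.
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(encoder.parameters()), lr=1e-4
)
model.train(); likelihood.train(); encoder.train()
for _ in range(50):
    optimizer.zero_grad()
    z = embed(texts)  # re-embed each step so gradients reach the LLM
    model.set_train_data(inputs=z, targets=yields, strict=False)
    loss = -mll(model(z), yields)
    loss.backward()
    optimizer.step()
```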

The researchers demonstrate the proposed methodology on Buchwald-Hartwig reaction optimization, markedly increasing the discovery rate of high-performing reactions. Compared to static LLM embeddings, the method nearly doubles coverage of the top 5% of reactions, from 24% to 43%, within just 50 optimization iterations. Across diverse chemical benchmarks, it consistently outperforms fixed-feature and domain-specific baselines.
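
The sampling loop behind these iteration counts can be sketched as a standard Bayesian optimization step over a pool of candidate reactions. The snippet below is our illustration, not the paper's pipeline: it reuses the model, likelihood, and embed function from the previous sketch, and the select_next helper, the upper confidence bound acquisition, the beta value, and the candidate pool are all assumptions.

```python
# Illustrative sketch of one Bayesian optimization step (not the paper's pipeline):
# score unlabeled candidates with the deep kernel GP posterior and pick the next
# experiment via an upper confidence bound (UCB) acquisition function.
import torch
import gpytorch

def select_next(model, likelihood, embed_fn, candidate_texts, beta=2.0):
    """Return the index of the candidate maximizing mean + beta * std."""
    model.eval(); likelihood.eval()
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        z = embed_fn(candidate_texts)                  # LLM embeddings of candidates
        pred = likelihood(model(z))                    # predictive posterior over yields
        ucb = pred.mean + beta * pred.variance.sqrt()
    return int(torch.argmax(ucb))

# Usage: propose the next reaction, measure its yield, append it to the training
# set, refit the GP (and LLM) on the enlarged data, and repeat for a fixed budget.
# next_idx = select_next(model, likelihood, embed, candidate_pool)
```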

A key strength of the method is its flexibility across tasks and LLM architectures. It adapts to encoder, decoder, and encoder-decoder models, and to pretraining domains both chemistry-specific and general-purpose. A 14% improvement over domain-specific representations is achieved without requiring specialized features, demonstrating the effectiveness of joint LLM-GP optimization.

The paper also examines which properties of a representation enable high-dimensional Bayesian optimization. It shows that alignment between the GP's inductive biases and the structure of the representation space drives successful optimization, and supports this empirically with a normalized smoothness metric that correlates with optimization success at r = 0.92.

The theoretical implications concern how latent spaces self-organize: joint LLM-GP optimization through the marginal likelihood implicitly performs contrastive learning, producing better-structured embedding spaces, improved uncertainty calibration, and more efficient sampling without any external loss. This reframes finetuning strategies, positioning the GP not only as an uncertainty quantifier but as an active partner in learning that guides LLM adaptation.

Practically, the research delivers advances in sample-efficient optimization, with tangible benefits in domains such as chemical synthesis and molecular property optimization. By reducing the experimental burden while still covering high-performing regions, the method shows clear potential to accelerate scientific discovery and industrial application.

Going forward, the work opens avenues in other high-impact fields that demand rigorous uncertainty quantification. Future developments might extend adaptive LLM-GP frameworks beyond text to other modalities, such as image processing or complex systems design, where massive datasets must be integrated with inherently uncertain characteristics.

In summary, the paper presents a well-rounded, empirically robust perspective on the utility of LLMs within the Bayesian optimization paradigm. By effectively bridging LLM representation learning with the statistical rigor of GPs, this research sets a new standard for optimization-related tasks across diverse domains.
