
Aligning Transformers with Continuous Feedback via Energy Rank Alignment (2405.12961v2)

Published 21 May 2024 in cs.LG, cs.AI, physics.chem-ph, and q-bio.QM

Abstract: Searching through chemical space is an exceptionally challenging problem because the number of possible molecules grows combinatorially with the number of atoms. Large, autoregressive models trained on databases of chemical compounds have yielded powerful generators, but we still lack robust strategies for generating molecules with desired properties. This molecular search problem closely resembles the "alignment" problem for LLMs, though for many chemical tasks we have a specific and easily evaluable reward function. Here, we introduce an algorithm called energy rank alignment (ERA) that leverages an explicit reward function to produce a gradient-based objective that we use to optimize autoregressive policies. We show theoretically that this algorithm is closely related to proximal policy optimization (PPO) and direct preference optimization (DPO), but has a minimizer that converges to an ideal Gibbs-Boltzmann distribution with the reward playing the role of an energy function. Furthermore, this algorithm is highly scalable, does not require reinforcement learning, and performs well relative to DPO when the number of preference observations per pairing is small. We deploy this approach to align molecular transformers and protein LLMs to generate molecules and protein sequences, respectively, with externally specified properties and find that it does so robustly, searching through diverse parts of chemical space.


Summary

  • The paper introduces Energy Rank Alignment, a gradient-based optimization algorithm for molecule discovery and LLM alignment without extensive human feedback.
  • It effectively shifts chemical property distributions in both single and multi-property optimization while preserving molecular diversity.
  • The algorithm also improves LLM outputs, cutting unsafe content generation by over 90% and steering generated sentiment toward targeted tones.

Exploring Energy Rank Alignment (ERA): A New Algorithm for Molecular Optimization and LLM Alignment

Let's dive into a fascinating piece of research that introduces Energy Rank Alignment (ERA), a novel algorithm for optimizing molecules and aligning LLMs. The paper splits its focus between improving chemical search and refining AI-generated responses, and both threads carry promising implications.

Background

Searching through the vast chemical space to find molecules with specific properties is a huge challenge. Imagine trying to find a needle in a haystack, but the haystack grows exponentially with each added atom. Large autoregressive models trained on chemical databases have made meaningful strides in molecule generation, but achieving desired properties remains difficult.

This is reminiscent of the "alignment" problem in LLMs: ensuring models produce outputs that match specific preferences or constraints. Traditional solutions, such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), either demand a computationally expensive reinforcement-learning loop or vast amounts of human-curated preference data.

The ERA Algorithm

ERA introduces a gradient-based approach that leverages an explicit reward function to optimize sampling policies. This is particularly useful because, unlike RL-based methods, ERA doesn't demand extensive human feedback. Here's a snapshot of what ERA brings to the table, with a minimal loss sketch after the list:

  • Gradient-Based Optimization: Directly optimizes the sampling policy using gradients, making it more straightforward than traditional RL-based algorithms.
  • Robustness and Scalability: Performs well with limited preference data, which is crucial for practical applications.
  • Versatility: Can be applied to both chemical molecule generation and LLM alignment tasks.
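
To make this concrete, recall from the abstract that the minimizer of the ERA objective is a Gibbs-Boltzmann distribution with the reward playing the role of an energy, i.e., roughly pi*(y|x) proportional to exp(r(x, y) / tau). Below is a minimal sketch of one plausible pairwise realization of that idea, assuming a soft Boltzmann target built from the explicit reward and a DPO-style implicit preference for the policy; the parameterization, temperatures, and reference-model ratios here are illustrative assumptions, not the paper's exact objective:

```python
# Hedged sketch of an ERA-style pairwise loss (PyTorch).
# Assumptions (not verbatim from the paper): the target preference is
# Boltzmann in an explicit reward, and the policy's implicit preference
# is parameterized DPO-style via log-prob ratios against a frozen
# reference model.
import torch
import torch.nn.functional as F

def era_pair_loss(
    logp_y: torch.Tensor,        # log pi_theta(y | x), shape (batch,)
    logp_y_prime: torch.Tensor,  # log pi_theta(y' | x)
    ref_logp_y: torch.Tensor,    # log pi_ref(y | x), frozen reference
    ref_logp_y_prime: torch.Tensor,
    reward_y: torch.Tensor,      # explicit reward r(x, y)
    reward_y_prime: torch.Tensor,
    beta: float = 0.1,           # scale on the implicit log-ratio reward
    tau: float = 1.0,            # Boltzmann temperature on the explicit reward
) -> torch.Tensor:
    # Target: soft Boltzmann preference induced by the explicit reward,
    # p*(y > y') = sigmoid((r(y) - r(y')) / tau).
    p_star = torch.sigmoid((reward_y - reward_y_prime) / tau)

    # Policy: DPO-style implicit preference from log-ratio differences
    # against the frozen reference model.
    logits = beta * ((logp_y - ref_logp_y) - (logp_y_prime - ref_logp_y_prime))

    # Cross-entropy between the soft Boltzmann target and the policy's
    # preference probability.
    return F.binary_cross_entropy_with_logits(logits, p_star.detach())
```

Because the target here is a soft probability rather than a hard 0/1 label, a single sampled pair already carries graded preference information, which squares with the paper's claim that ERA performs well when the number of preference observations per pairing is small.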

Numerical Results

The ERA algorithm was tested on both molecule generation and LLM alignment tasks, demonstrating impressive results.

Molecular Generation

Using ERA, the researchers were able to guide a transformer model to generate molecules with desired properties. Here's a breakdown of what was achieved:

  1. Single-Property Optimization: ERA was used to optimize properties such as molar refractivity (MR), ring count, and logP, effectively shifting the output distribution toward the preferred value for each property.
  2. Multi-Property Optimization: ERA also handled multi-property scenarios by adjusting the weights on different properties in the reward function, enabling generation of molecules that balance several desired characteristics at once (a toy reward of this form is sketched after this list).
  3. Sample Diversity: Despite strong optimization towards desired properties, the generated molecules maintained high diversity, crucial for discovering novel compounds.
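
Because the reward is an explicit, cheap-to-evaluate function of the molecule, multi-property optimization amounts to re-weighting terms in that function. Here is a hypothetical sketch using standard RDKit descriptors for the properties discussed above; the quadratic energy form, targets, and weights are illustrative assumptions, not the paper's values:

```python
# Hypothetical multi-property reward: negative weighted squared deviation
# from per-property targets, computed with standard RDKit descriptors.
# Targets and weights are illustrative, not taken from the paper.
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors

def multi_property_reward(smiles: str, weights=None, targets=None) -> float:
    weights = weights or {"logp": 1.0, "mr": 0.5, "rings": 0.5}
    targets = targets or {"logp": 2.5, "mr": 60.0, "rings": 2.0}
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return float("-inf")  # invalid SMILES gets the worst possible reward
    props = {
        "logp": Descriptors.MolLogP(mol),
        "mr": Descriptors.MolMR(mol),
        "rings": float(rdMolDescriptors.CalcNumRings(mol)),
    }
    # Energy = weighted squared distance to the targets; reward = -energy,
    # so molecules close to all targets simultaneously score highest.
    energy = sum(weights[k] * (props[k] - targets[k]) ** 2 for k in props)
    return -energy

print(multi_property_reward("c1ccccc1CCO"))  # e.g., 2-phenylethanol
```

Nudging an entry in `weights` up or down re-shapes the Gibbs-Boltzmann target that the aligned policy converges to, which is how the multi-property trade-offs get steered.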

A few figures from the paper highlight these findings:

  • Fig. 1: Shows different chemical property distributions for molecules sampled from both aligned and unaligned policies.
  • Fig. 2: Demonstrates multi-property optimization trade-offs, such as jointly high QED and logP.

LLM Alignment

Turning to LLM alignment, where supervision came from AI-generated feedback rather than human labels, ERA proved its versatility:

  1. Sentiment Alignment: When fine-tuning a GPT-2 model on movie-review data, ERA effectively skewed output sentiment toward more positive tones (a toy classifier-based reward is sketched after this list).
  2. Safety and Coherence: On the challenging task of generating safe chat responses, ERA aligned a larger LLaMA model using a weaker LLaMA model's responses as guidance; the aligned model reduced unsafe content generation by over 90%.
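
For the sentiment task, the explicit reward can come straight from an off-the-shelf classifier scoring each completion. A minimal sketch along those lines (the default Hugging Face sentiment pipeline and the signed-reward convention are assumptions, not details taken from the paper):

```python
# Hypothetical sentiment reward: score a completion with an off-the-shelf
# classifier and use the signed positive-class probability as the reward.
# The specific classifier checkpoint here is an assumption.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # default English sentiment model

def sentiment_reward(text: str) -> float:
    result = sentiment(text)[0]  # {"label": "POSITIVE"/"NEGATIVE", "score": p}
    prob = result["score"]
    # Signed reward: confidently positive completions approach +1,
    # confidently negative ones approach -1.
    return prob if result["label"] == "POSITIVE" else -prob

print(sentiment_reward("That movie was an absolute delight."))
```

Plugging a reward like this into the pairwise objective sketched earlier is one way to skew generated sentiment without any human-labeled preference data.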

Another figure of interest:

  • Fig. 3: Illustrates the average sentiment improvement and the proportion of unsafe content across different model alignments and β values.

Implications

The implications of ERA are quite exciting for both theoretical and practical applications:

  • Chemical Discovery: ERA provides a scalable method to navigate the complex chemical space efficiently. This could accelerate the discovery of new drugs or materials.
  • AI Alignment: ERA offers a more accessible approach to aligning LLMs, potentially reducing the need for extensive human feedback.

Looking Ahead

Though ERA has shown promising results, there are many intriguing avenues for future research:

  • Further Optimization: While ERA strikes a good balance, exploring its integration with other optimization frameworks could yield even better results.
  • Broader Applications: Extending ERA to other fields like image generation and beyond could open up new possibilities.

In summary, Energy Rank Alignment (ERA) offers a fresh and scalable approach to a couple of pretty tough nuts to crack: optimizing chemical molecules and aligning LLMs. Whether you're looking for next-gen materials or more coherent AI conversations, ERA might just be a step in the right direction.