Protein Design with Guided Discrete Diffusion (2305.20009v2)
Abstract: A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments.
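The abstract describes two mechanisms: gradient guidance applied to the hidden states of the denoising network (NOS), and saliency maps used to restrict edits to the most influential sequence positions (LaMBO-2). The sketch below illustrates both ideas on a toy PyTorch denoiser; `ToyDenoiser`, `value_head`, `guided_denoise`, `saliency_positions`, and all hyperparameters are assumptions made for illustration and do not reproduce the paper's architecture, objectives, or training setup.

```python
# Minimal sketch of hidden-state guidance for a discrete (masked) diffusion
# denoiser, in the spirit of NOS as summarized above. ToyDenoiser, value_head,
# guided_denoise, saliency_positions, and the hyperparameters are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, LENGTH = 21, 64, 32  # toy sizes: 20 amino acids + a mask/corruption token

class ToyDenoiser(nn.Module):
    """Tiny transformer-style denoiser: tokens -> hidden states -> token logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True), num_layers=2
        )
        self.lm_head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))  # hidden states the guidance acts on
        return h, self.lm_head(h)

# Hypothetical value head predicting a scalar fitness from pooled hidden states.
value_head = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, 1))

def guided_denoise(model, tokens, guidance_steps=5, step_size=0.5, kl_weight=1.0):
    """One guided reverse step: nudge hidden states toward higher predicted
    fitness while a KL penalty keeps the decoded distribution near the denoiser's."""
    h0, logits0 = model(tokens)
    log_p0 = F.log_softmax(logits0, dim=-1).detach()
    h = h0.detach().clone().requires_grad_(True)
    for _ in range(guidance_steps):
        fitness = value_head(h.mean(dim=1)).sum()
        kl = F.kl_div(F.log_softmax(model.lm_head(h), dim=-1), log_p0,
                      reduction="batchmean", log_target=True)
        loss = -fitness + kl_weight * kl
        grad, = torch.autograd.grad(loss, h)
        h = (h - step_size * grad).detach().requires_grad_(True)
    return model.lm_head(h).argmax(dim=-1)  # decode the guided tokens

def saliency_positions(model, tokens, k=3):
    """Pick the k positions whose input embeddings most influence predicted
    fitness, a rough analogue of saliency-based edit selection."""
    emb = model.embed(tokens).detach().requires_grad_(True)
    value_head(model.encoder(emb).mean(dim=1)).sum().backward()
    return emb.grad.norm(dim=-1).topk(k, dim=-1).indices  # (batch, k) candidate edit sites

if __name__ == "__main__":
    model = ToyDenoiser()
    noisy = torch.randint(0, VOCAB, (1, LENGTH))  # a corrupted sequence
    print(guided_denoise(model, noisy).shape)     # torch.Size([1, 32])
    print(saliency_positions(model, noisy))       # indices of top-3 candidate edit positions
```

In this sketch the guidance objective trades predicted fitness against a KL term that keeps the guided logits close to the unguided denoiser's output, mirroring the usual balance between optimizing a discriminative objective and staying on the generative model's data manifold during guided sampling.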