Protein structure generation via folding diffusion (2209.15611v2)

Published 30 Sep 2022 in q-bio.BM and cs.AI

Abstract: The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a new diffusion-based generative model that designs protein backbone structures via a procedure that mirrors the native folding process. We describe protein backbone structure as a series of consecutive angles capturing the relative orientation of the constituent amino acid residues, and generate new structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins biologically twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release the first open-source codebase and trained models for protein structure diffusion.

Authors (6)

Kevin E. Wu (1 paper)
Kevin K. Yang (11 papers)
Rianne van den Berg (22 papers)
James Y. Zou (7 papers)
Alex X. Lu (9 papers)
Ava P. Amini (3 papers)

Citations (153)

Summary

Overview of "Protein structure generation via folding diffusion"

The paper "Protein structure generation via folding diffusion" by Kevin E. Wu et al. introduces a diffusion-based generative model for producing physically plausible protein backbone structures. The method mirrors the native folding process: it starts from a random, unfolded state and denoises toward a stable folded structure, offering a route to designing novel proteins.

Technical Contributions

Internal Angle Representation: The authors describe a protein backbone as a sequence of consecutive internal angles that capture the relative orientation of neighboring residues, rather than as Cartesian coordinates. Because these angles do not change when the whole structure is shifted or rotated, the representation is inherently translation- and rotation-invariant, which removes the need for the complex equivariant networks that coordinate-based models typically require.
Diffusion Model: The paper trains a denoising diffusion probabilistic model (DDPM) with a simple transformer backbone. Starting from random angles, the model iteratively denoises toward a plausible folded conformation, in analogy with biological protein folding. This allows backbone structures to be generated unconditionally, from scratch, without relying on additional algorithms for post-processing.
Training and Performance: The model is trained on the CATH dataset, which covers a broad range of protein domain structures. The authors report that generated backbones reproduce the natural distribution of inter-residue angles and contain structural motifs seen in naturally occurring proteins.
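
The forward (noising) half of a DDPM over angles can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear variance schedule, the step count, the six-angles-per-residue toy input, and the helper names `wrap_angle`, `linear_beta_schedule`, and `forward_noise` are all hypothetical choices made for the example. It shows the standard closed-form sample from q(x_t | x_0), followed by wrapping so that noisy values stay on the circle.

```python
import numpy as np

def wrap_angle(x):
    """Wrap angles (radians) into the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule; the paper's schedule may differ."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(angles, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form, then wrap onto the circle.

    Returns the noisy angles and the Gaussian noise that was added; a
    denoising network would typically be trained to predict that noise.
    """
    noise = rng.standard_normal(angles.shape)
    noisy = np.sqrt(alpha_bar[t]) * angles + np.sqrt(1.0 - alpha_bar[t]) * noise
    return wrap_angle(noisy), noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

# Toy input (assumption): 64 residues, 6 angles per residue.
x0 = rng.uniform(-np.pi, np.pi, size=(64, 6))
x_early, _ = forward_noise(x0, 10, alpha_bar, rng)    # still close to x0
x_late, _ = forward_noise(x0, 999, alpha_bar, rng)    # almost pure noise
```

Sampling would run this in reverse: start from random wrapped angles and repeatedly apply the learned denoiser, which is the "folding" direction the summary describes.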

Numerical Results and Validation

The evaluation examines the quality of the generated structures along three lines:

The distributions of generated angles closely match those of natural proteins, indicating that the model captures the statistics of real backbone geometry.
Ramachandran plots show that the generated structures have realistic dihedral angle distributions, including the regions corresponding to right-handed α-helices and β-sheets, which are crucial for function and stability.
A significant proportion of generated structures are judged designable by self-consistency TM scores, which measure whether plausible amino acid sequences proposed for a generated backbone are predicted to fold back into that same structure.
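
The dihedral angles behind a Ramachandran plot can be computed from four consecutive atom positions. The sketch below is illustrative only and is not taken from the paper's codebase; the function names `dihedral` and `random_rotation` and the random test points are assumptions made for the example. It also checks numerically that a dihedral is unchanged when the whole structure is rotated and translated, which is the invariance property the internal-angle representation relies on.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) about the p1-p2 bond.

    For backbone phi, the four points would be C(i-1), N(i), CA(i), C(i);
    for psi, they would be N(i), CA(i), C(i), N(i+1).
    """
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def random_rotation(rng):
    """Random proper rotation matrix (determinant +1) via QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column to turn a reflection into a rotation
    return q

rng = np.random.default_rng(0)
pts = rng.standard_normal((4, 3))            # four arbitrary atom positions
rot = random_rotation(rng)
shift = rng.standard_normal(3)
moved = pts @ rot.T + shift                  # rigidly transformed copy

angle_before = dihedral(*pts)
angle_after = dihedral(*moved)               # same value up to rounding error
```

Plotting (phi, psi) pairs computed this way for every residue gives a Ramachandran plot, so comparing generated and natural structures reduces to comparing two 2D angle distributions.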

Moreover, compared with baselines and prior models, the authors' model, FoldingDiff, generates more diverse protein structures containing typical secondary-structure motifs, without collapsing into simpler or overrepresented forms such as overly repetitive helices.

Implications and Future Directions

The implications of this research are both practical and theoretical. Practically, the model could accelerate the design of new proteins with potential therapeutic applications, such as those targeting currently incurable diseases. Theoretically, it questions whether complex equivariant network architectures are necessary, since an internal-angle representation provides translation and rotation invariance by construction.

Future avenues to explore could include scaling the model to handle multi-chain complexes, incorporating dynamic aspects of protein structures, and further enhancing the model's ability to generate functionally novel proteins. Additionally, integrating sequence generation directly within the structure generation process could significantly enhance the practical applicability of this approach for designing proteins with desired functionalities.

In summary, Wu et al. provide a framework for protein backbone generation that applies diffusion models to an internal-angle representation. The authors also release an open-source codebase and trained models, and the approach is a promising step toward more efficient protein design, with potential impact on fields such as drug development and synthetic biology.
