Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem (2206.04119v2)

Published 8 Jun 2022 in q-bio.BM, cs.LG, and stat.ML

Abstract: Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.

Summary

The paper introduces ProtDiff, a diffusion model over protein backbones, and SMCDiff, a sampling procedure that draws scaffolds from ProtDiff conditioned on a given functional motif.
ProtDiff is built on an E(3)-equivariant graph neural network; combined with SMCDiff, it samples scaffolds of up to 80 residues, whereas the machine-learning methods the authors compare against were limited to about 20.
The authors assess designed backbones by how well they align with AlphaFold2-predicted structures and report structurally diverse scaffolds for a fixed motif.

Diffusion Probabilistic Modeling of Protein Backbones in 3D for the Motif-Scaffolding Problem

The paper addresses a central challenge in computational protein design: constructing a scaffold structure that supports a desired functional motif, a capability with promise for vaccine and enzyme design. Earlier approaches demand extensive expert intervention, and existing machine-learning techniques are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. This work introduces two components: ProtDiff, a diffusion probabilistic model of protein backbones, and SMCDiff, a procedure for sampling scaffolds from that model conditioned on a motif. Together they target longer and more structurally varied scaffold backbones.

ProtDiff uses an E(3)-equivariant graph neural network to learn a distribution over protein backbone structures. SMCDiff then samples scaffolds from this distribution conditioned on a given motif; according to the authors, it is the first algorithm to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. SMCDiff combines sequential Monte Carlo with the diffusion model, which reduces the approximation error incurred by existing conditional-sampling techniques.
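The particle-filter idea behind this kind of conditional sampling can be illustrated on a toy problem. The sketch below is not the paper's algorithm or code: it replaces the learned ProtDiff denoiser with a two-dimensional correlated Gaussian whose reverse diffusion kernel is known in closed form (dimension 0 plays the "motif", dimension 1 the "scaffold"), and it proposes the scaffold conditioned on the noised motif, which is a simplification that is only available in this Gaussian setting. The function name `smc_conditional_toy`, the noise schedule, and all parameters are assumptions made for illustration.

```python
import numpy as np

def smc_conditional_toy(motif, rho=0.8, T=50, K=2000, seed=0):
    """Toy SMC conditional sampler; returns K samples of the scaffold coordinate."""
    rng = np.random.default_rng(seed)
    Sigma = np.array([[1.0, rho], [rho, 1.0]])          # data covariance (assumed)
    betas = np.linspace(1e-3, 0.2, T)                   # betas[t-1] is beta_t (assumed schedule)
    alphas = 1.0 - betas
    abar = np.concatenate(([1.0], np.cumprod(alphas)))  # abar[t], with abar[0] = 1

    def marginal_cov(t):
        # Covariance of x^t when x^0 ~ N(0, Sigma) under the forward process.
        return abar[t] * Sigma + (1.0 - abar[t]) * np.eye(2)

    # Step 1: forward-diffuse the motif to obtain a noised trajectory m[0..T].
    m = np.empty(T + 1)
    m[0] = motif
    for t in range(1, T + 1):
        m[t] = np.sqrt(alphas[t - 1]) * m[t - 1] + np.sqrt(betas[t - 1]) * rng.standard_normal()

    # Step 2: initialise K scaffold particles at time T given the noised motif m[T].
    S_T = marginal_cov(T)
    init_var = S_T[1, 1] - S_T[1, 0] ** 2 / S_T[0, 0]
    s = S_T[1, 0] / S_T[0, 0] * m[T] + np.sqrt(init_var) * rng.standard_normal(K)

    # Step 3: run the reverse process with reweighting and resampling.
    for t in range(T, 0, -1):
        S_prev, S_t = marginal_cov(t - 1), marginal_cov(t)
        G = S_prev @ np.linalg.inv(S_t)
        A = np.sqrt(alphas[t - 1]) * G                  # reverse-kernel mean is A @ x^t
        C = S_prev - alphas[t - 1] * G @ S_prev         # reverse-kernel covariance
        x_t = np.stack([np.full(K, m[t]), s], axis=1)   # particles with the motif clamped
        mu = x_t @ A.T

        # Weight each particle by how well it predicts the next noised motif value.
        logw = -0.5 * (m[t - 1] - mu[:, 0]) ** 2 / C[0, 0]
        w = np.exp(logw - logw.max())
        w /= w.sum()

        # Multinomial resampling, then propose the scaffold at time t-1.
        mu = mu[rng.choice(K, size=K, p=w)]
        cond_mean = mu[:, 1] + C[1, 0] / C[0, 0] * (m[t - 1] - mu[:, 0])
        cond_var = C[1, 1] - C[1, 0] ** 2 / C[0, 0]
        s = cond_mean + np.sqrt(cond_var) * rng.standard_normal(K)
    return s
```

For this toy model the exact conditional is Gaussian with mean `rho * motif` and variance `1 - rho**2`, so the sample mean and variance of `smc_conditional_toy(1.0)` should land near 0.8 and 0.36 up to Monte Carlo error. With a learned model, no such closed form exists, and the large-particle limit is where the theoretical guarantee would apply.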

The reported results indicate that the method can sample scaffolds of up to 80 residues, compared with the roughly 20-residue limit of earlier machine-learning techniques. It also produces structurally diverse scaffolds for a fixed motif, which matters for practical design, both when generating novel backbones and when incorporating a motif. Scaffold quality is assessed with a self-consistency evaluation that measures how well designed backbones align with AlphaFold2-predicted structures.
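As a rough illustration of what "aligning" a designed backbone with a predicted structure involves, the sketch below computes the RMSD between two sets of C-alpha coordinates after optimal rigid superposition (the Kabsch algorithm). This is a swapped-in, simpler metric chosen for brevity, not necessarily the score used in the paper, and obtaining the AlphaFold2 prediction itself is out of scope. The function name `kabsch_rmsd` and the assumption that both inputs are `(N, 3)` arrays with residues in matching order are illustrative.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q of shape (N, 3) after optimal rigid alignment."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)

    # Remove translation by centring both structures.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Optimal rotation from the SVD of the cross-covariance matrix.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Force a proper rotation (determinant +1) so mirror images are not matched.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    P_aligned = Pc @ R.T
    return float(np.sqrt(((P_aligned - Qc) ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, while a mirror image does not, because reflections are excluded from the superposition.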

Challenges remain despite these results, such as generalizing beyond the motif instances present in the training data and handling chirality: an E(3)-equivariant network is also equivariant to reflections, so it cannot distinguish a backbone from its mirror image. Moreover, although the method extends scaffold size and diversity, applying it beyond monomeric proteins to multi-domain settings is left for further development.
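The chirality issue follows from what E(3) equivariance means: the group contains reflections as well as rotations and translations. The sketch below shows this with a minimal coordinate update in the style of an equivariant graph network layer. It is not ProtDiff's architecture: the function `egnn_coord_update`, the fixed `tanh` gate standing in for a learned network, and the parameter `w` are all assumptions made for illustration.

```python
import numpy as np

def egnn_coord_update(x, w=0.1):
    """One toy equivariant update on coordinates x of shape (N, 3)."""
    diff = x[:, None, :] - x[None, :, :]          # pairwise displacement vectors, (N, N, 3)
    d2 = (diff ** 2).sum(axis=-1, keepdims=True)  # squared distances, invariant under E(3)
    gate = np.tanh(w * d2) / (1.0 + d2)           # invariant scalar weight (stand-in for an MLP)
    # Each node moves along a weighted sum of displacement vectors.
    return x + (diff * gate).sum(axis=1) / (len(x) - 1)

def is_equivariant(x, R, t):
    """Check f(x R^T + t) == f(x) R^T + t for an orthogonal matrix R and shift t."""
    return bool(np.allclose(egnn_coord_update(x @ R.T + t),
                            egnn_coord_update(x) @ R.T + t))
```

Because the update depends on geometry only through distances and displacement vectors, `is_equivariant` holds for any orthogonal `R`, including a reflection such as `np.diag([-1.0, 1.0, 1.0])`. A model built only from such layers therefore treats a structure and its mirror image symmetrically.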

Future research could focus on improving sequence prediction and the modeling of domain interactions. Comprehensive benchmarks for motif-scaffolding and models that integrate side-chain information alongside backbone structure would also be valuable. As a foundation for motif-scaffolding design, ProtDiff and SMCDiff could support later work in computational protein engineering, including functional proteins for therapeutic and industrial applications.
