
Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds (2301.12485v3)

Published 29 Jan 2023 in q-bio.BM and cs.LG

Abstract: Proteins power a vast array of functional processes in living cells. The capability to create new proteins with designed structures and functions would thus enable the engineering of cellular behavior and development of protein-based therapeutics and materials. Structure-based protein design aims to find structures that are designable (can be realized by a protein sequence), novel (have dissimilar geometry from natural proteins), and diverse (span a wide range of geometries). While advances in protein structure prediction have made it possible to predict structures of novel protein sequences, the combinatorially large space of sequences and structures limits the practicality of search-based methods. Generative models provide a compelling alternative, by implicitly learning the low-dimensional structure of complex data distributions. Here, we leverage recent advances in denoising diffusion probabilistic models and equivariant neural networks to develop Genie, a generative model of protein structures that performs discrete-time diffusion using a cloud of oriented reference frames in 3D space. Through in silico evaluations, we demonstrate that Genie generates protein backbones that are more designable, novel, and diverse than existing models. This indicates that Genie is capturing key aspects of the distribution of protein structure space and facilitates protein design with high success rates. Code for generating new proteins and training new versions of Genie is available at https://github.com/aqlaboratory/genie.

Authors (2)
  1. Yeqing Lin (7 papers)
  2. Mohammed AlQuraishi (9 papers)
Citations (61)

Summary

An Overview of Generating Novel Protein Structures with Genie

The paper "Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds" presents a method for de novo protein design that learns the distribution of protein structures and samples new, designable backbones from it. The method, named Genie, pairs a denoising diffusion probabilistic model (DDPM) with an SE(3)-equivariant neural network. Its key idea is to run diffusion directly on backbone coordinates in 3D Cartesian space while denoising with a geometric representation built from oriented reference frames.

Methods and Approach

Genie is a denoising diffusion probabilistic model: it iteratively refines a protein backbone from pure Gaussian noise into a coherent structure. The forward process progressively adds Gaussian noise to the backbone coordinates, while the reverse process removes that noise step by step using an SE(3)-equivariant denoiser, so that its predictions transform consistently under rotations and translations of the input.
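The forward process described above can be sketched in a few lines. This is a minimal illustration of the standard DDPM closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and not Genie's actual code. The cosine schedule, the offset `s`, the step count `T`, and the function names are all assumptions chosen for the example.

```python
import math
import random


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal fraction alpha_bar_t under a cosine schedule.

    The schedule and the offset s are assumptions for this sketch.
    """
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2

    return f(t) / f(0)


def forward_noise(coords, t, T, rng):
    """Sample x_t from q(x_t | x_0) for a list of (x, y, z) coordinates."""
    a_bar = cosine_alpha_bar(t, T)
    signal = math.sqrt(a_bar)        # weight on the clean structure
    noise = math.sqrt(1.0 - a_bar)   # weight on the Gaussian noise
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in xyz)
        for xyz in coords
    ]


# Hypothetical three-residue backbone trace, for illustration only.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
rng = random.Random(0)
noised = forward_noise(backbone, t=500, T=1000, rng=rng)
```

At `t = 0` the sample equals the input exactly, and as `t` approaches `T` the signal weight falls toward zero, so the sample approaches pure Gaussian noise. Because the closed form jumps to any step directly, training never has to simulate the chain step by step.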

Genie uses two representations of the same protein. In the forward process the protein is a point cloud of backbone coordinates; in the reverse (denoising) process it is a cloud of oriented reference frames, one per residue. Adding noise to a point cloud keeps the forward process within the Gaussian assumptions of DDPMs, which keeps training simple and efficient, while the frames give the denoiser orientation information for reasoning about local geometry.
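To make the second representation concrete, here is a minimal sketch of one common way to derive per-residue frames from coordinates alone, the Frenet-Serret construction (tangent, binormal, normal). The conventions below, including the axis ordering and the decision to return frames only for interior residues, are assumptions for illustration and may differ from Genie's implementation.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def unit(v):
    # Raises ZeroDivisionError for a zero vector (e.g. collinear residues).
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def frenet_frames(coords):
    """Return one (tangent, binormal, normal) triple per interior residue.

    For N coordinates this yields N - 2 frames; end residues are skipped.
    """
    tangents = [unit(sub(coords[i + 1], coords[i])) for i in range(len(coords) - 1)]
    frames = []
    for i in range(1, len(tangents)):
        t = tangents[i]
        b = unit(cross(tangents[i - 1], t))  # perpendicular to the local bend
        n = cross(b, t)                      # completes the orthonormal triple
        frames.append((t, b, n))
    return frames


# Hypothetical four-point trace, for illustration only.
trace = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
frames = frenet_frames(trace)
```

Each frame is built only from differences between neighboring coordinates, so translating the input leaves the frames unchanged and rotating the input rotates every frame by the same rotation. That behavior is what an SE(3)-equivariant denoiser needs from its input features.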

Key Results

In the paper's in silico evaluation, Genie outperforms ProtDiff and FoldingDiff on designability, diversity, and novelty. Designability is measured with the self-consistency TM-score (scTM): 81.5% of Genie's generated structures score above 0.5, compared with only 5.1% for ProtDiff and 19.6% for FoldingDiff.
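The headline designability figure is simply the share of generated structures whose scTM exceeds a cutoff. A minimal sketch, assuming the per-structure scTM scores have already been computed elsewhere; the function name and the sample scores are hypothetical and are not taken from the paper.

```python
def designable_fraction(sctm_scores, threshold=0.5):
    """Fraction of structures whose scTM is strictly above the threshold."""
    if not sctm_scores:
        raise ValueError("need at least one scTM score")
    return sum(score > threshold for score in sctm_scores) / len(sctm_scores)


# Made-up scores for four generated structures, for illustration only.
example_scores = [0.2, 0.6, 0.9, 0.5]
fraction = designable_fraction(example_scores)
```

A score exactly at the cutoff does not count as designable here, matching the "exceeding 0.5" wording above.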

On diversity, the generated structures span a wide range of secondary structure content, from mostly alpha-helical to mostly beta-strand compositions. On novelty, 21.5% of the generated structures have no close analog in the training dataset.

Implications and Future Directions

This work has both theoretical and practical implications for protein design. Theoretically, Genie shows that diffusion models can capture protein structure with enough geometric fidelity to explore folds beyond naturally occurring protein domains. Practically, it adds to the toolkit for engineering proteins with targeted functions, which matters for protein-based therapeutics and materials.

Future research could focus on scaling Genie's architecture, integrating sequence co-design capabilities, and experimenting with conditional generative approaches for functional proteins. These areas could enable more precise design and application of proteins in diverse domains, such as enzyme engineering and therapeutic developments.

In summary, the paper applies diffusion models to protein structure generation and reports gains over competing methods in designability, diversity, and novelty. Genie is a promising step toward designing protein structures for a range of scientific applications.
