Ophiuchus: Scalable Modeling of Protein Structures through Hierarchical Coarse-graining SO(3)-Equivariant Autoencoders (2310.02508v2)
Abstract: Three-dimensional native states of natural proteins display recurring and hierarchical patterns. Yet traditional graph-based modeling of protein structures is often limited to a single fine-grained resolution and lacks hourglass neural architectures to learn those high-level building blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant coarse-graining model that operates efficiently on all-atom protein structures. Our model departs from current approaches that employ graph modeling, focusing instead on local convolutional coarsening to model sequence-motif interactions with efficient time complexity in protein length. We measure the reconstruction capabilities of Ophiuchus across different compression rates and compare it to existing models. We examine the learned latent space and demonstrate its utility through conformational interpolation. Finally, we leverage denoising diffusion probabilistic models (DDPM) in the latent space to efficiently sample protein structures. Our experiments demonstrate that Ophiuchus is a scalable basis for efficient protein modeling and generation.