De novo protein design using geometric vector field networks (2310.11802v1)
Abstract: Innovations such as protein diffusion have enabled significant progress in de novo protein design, a vital topic in the life sciences. These methods typically depend on protein structure encoders to model residue backbone frames, where atoms do not exist. Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context. So far, only a few simple encoders, such as IPA, have been proposed for this scenario, making frame modeling a bottleneck. In this work, we propose the Vector Field Network (VFN), which enables network layers to perform learnable vector computations between the coordinates of frame-anchored virtual atoms, thus achieving a higher capability for modeling frames. The vector computation operates in a manner similar to a linear layer, with each input channel receiving 3D virtual atom coordinates instead of scalar values. The multiple feature vectors output by the vector computation are then used to update the residue representations and virtual atom coordinates via attention aggregation. Notably, VFN can also model frames and atoms jointly, since real atoms can be treated as virtual atoms, positioning VFN as a potential universal encoder. In protein diffusion (frame modeling), VFN outperforms IPA in both designability (67.04% vs. 53.58%) and diversity (66.54% vs. 51.98%). In inverse folding (frame and atom modeling), VFN outperforms the previous SoTA model, PiFold, on sequence recovery rate (54.7% vs. 51.66%). We also propose a method of equipping VFN with the ESM model, which substantially surpasses the previous ESM-based SoTA, LM-Design (62.67% vs. 55.65%).
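The vector computation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`random_rotation`, `to_global`, `vector_linear`), the tensor shapes, and the choice to express neighbor atoms in the query residue's local frame are assumptions made for illustration, and the attention aggregation step is omitted. The sketch only shows how a weight matrix can mix 3D virtual-atom coordinates the way a linear layer mixes scalars.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix (illustrative helper)."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:      # flip one axis to force det = +1
        Q[:, 0] *= -1
    return Q

def to_global(R, t, q):
    """Anchor local virtual atoms q (K, 3) to every residue frame.

    R: (N, 3, 3) rotations, t: (N, 3) translations.
    Returns global coordinates of shape (N, K, 3).
    """
    return np.einsum("nij,kj->nki", R, q) + t[:, None, :]

def vector_linear(R, t, x, W):
    """Linear-layer-like mixing of 3D points (hypothetical form).

    For each residue pair (i, j), residue j's virtual atoms x[j] are
    expressed in frame i, then combined with weights W (C, K): each
    of the K input channels carries a 3D point instead of a scalar.
    Returns C output vectors per pair, shape (N, N, C, 3).
    """
    diff = x[None, :, :, :] - t[:, None, None, :]   # x[j, k] - t_i
    rel = np.einsum("iba,ijkb->ijka", R, diff)      # R_i^T (x[j, k] - t_i)
    return np.einsum("ck,ijka->ijca", W, rel)
```

Because every point is expressed in a residue-local frame before mixing, the output vectors are unchanged when the whole structure is rotated and translated, so they can be fed to later layers as invariant features. In a trained model `q` and `W` would be learnable parameters; here they are plain arrays.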
- De novo protein design by deep network hallucination. Nature, 600(7889):547–552, 2021.
- Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557):871–876, 2021.
- GraphQA: protein model quality assessment using graph convolutional networks. Bioinformatics, 37(3):360–366, 2021.
- The Protein Data Bank. Nucleic Acids Research, 28(1):235–242, 2000.
- Protein data bank (pdb): the single global macromolecular structure archive. Protein Crystallography: Methods and Protocols, pp. 627–641, 2017.
- Fold2seq: A joint sequence (1d)-fold (3d) embedding-based generative model for protein design. In International Conference on Machine Learning, pp. 1261–1271. PMLR, 2021.
- Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615):49–56, 2022.
- Deep convolutional networks for quality assessment of protein folds. Bioinformatics, 34(23):4046–4053, 2018.
- Petribert: Augmenting bert with tridimensional encoding for inverse protein folding and design. BioRxiv, pp. 2022–08, 2022.
- A latent diffusion model for protein structure generation, 2023.
- SE(3)-transformers: 3d roto-translation equivariant attention networks. Advances in Neural Information Processing Systems, 33:1970–1981, 2020.
- PiFold: Toward effective and efficient protein inverse folding. International Conference on Learning Representations, 2022a.
- Alphadesign: A graph protein design method and benchmark on alphafolddb. arXiv preprint arXiv:2202.01079, 2022b.
- Diffsds: A language diffusion model for protein backbone inpainting under geometric conditions and constraints, 2023a.
- Knowledge-design: Pushing the limit of protein design via knowledge refinement. arXiv preprint arXiv:2305.15151, 2023b.
- Maxcluster: a tool for protein structure comparison and clustering, 2008.
- Contrastive representation learning for 3d protein structures. arXiv preprint arXiv:2205.15675, 2022.
- Intrinsic-extrinsic convolution and pooling for learning on 3d protein structures. In International Conference on Learning Representations, 2020.
- A high-level programming language for generative protein design. bioRxiv, 2022.
- Deepsf: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics, 34(8):1295–1303, 2018.
- Learning inverse folding from millions of predicted structures. In International Conference on Machine Learning, pp. 8946–8970. PMLR, 2022.
- LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- A backbone-centred energy function of neural networks for protein design. Nature, 602(7897):523–528, 2022.
- The coming of age of de novo protein design. Nature, 537(7620):320–327, 2016.
- Generative models for graph-based protein design. Advances in Neural Information Processing Systems, 32, 2019.
- Alphadesign: A de novo protein design framework based on AlphaFold. BioRxiv, pp. 2021–10, 2021.
- Antibody-antigen docking and design via hierarchical equivariant refinement. arXiv preprint arXiv:2207.06616, 2022.
- Learning from protein structure with geometric vector perceptrons. International Conference on Learning Representations, 2020.
- Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
- De novo protein design for novel folds using guided conditional wasserstein generative adversarial networks. Journal of Chemical Information and Modeling, 60(12):5667–5681, 2020.
- Directed weight neural networks for protein structure representation learning. arXiv preprint arXiv:2201.13299, 2022.
- Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles. Proteins: Structure, Function, and Bioinformatics, 82(10):2565–2573, 2014.
- Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 2023.
- Yi Liu and Brian Kuhlman. Rosettadesign server for protein design. Nucleic acids research, 34(suppl_2):W235–W238, 2006.
- A deep SE(3)-equivariant model for learning inverse protein folding. BioRxiv, pp. 2022–04, 2022.
- CATH – a hierarchic classification of protein domain structures. Structure, 5(8):1093–1109, 1997.
- Structure-based protein design with deep learning. Current Opinion in Structural Biology, 65:136–144, 2021.
- A structure-based deep learning framework for protein engineering. BioRxiv, 2019.
- Generative de novo protein design with global context. arXiv preprint arXiv:2204.10673, 2022.
- Atom3d: Tasks on molecules in three dimensions. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021.
- Fast and accurate protein structure search with foldseek. bioRxiv, 2023.
- Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
- Learning hierarchical protein representations via complete 3d graph networks. In The Eleventh International Conference on Learning Representations, 2022a.
- Comenet: Towards complete and efficient message passing for 3d molecular graphs. Advances in Neural Information Processing Systems, 35:650–664, 2022b.
- De novo design of protein structure and function with RFdiffusion. Nature, pp. 1–3, 2023.
- Protein sequence design with deep generative models. Current Opinion in Structural Biology, 65:18–27, 2021.
- SE(3) diffusion model with application to protein backbone generation. arXiv preprint arXiv:2302.02277, 2023.
- Prodconn: Protein design using a convolutional neural network. Proteins: Structure, Function, and Bioinformatics, 88(7):819–829, 2020.
- Protein representation learning by geometric structure pretraining. In The Eleventh International Conference on Learning Representations, 2022.
- Structure-informed language models are protein designers. In International Conference on Machine Learning, 2023.
- Uni-mol: A universal 3d molecular representation learning framework. 2023.