Fast non-autoregressive inverse folding with discrete diffusion (2312.02447v1)
Abstract: Generating protein sequences that fold into an intended 3D structure is a fundamental step in de novo protein design. The de facto approach is autoregressive generation, but it forgoes higher-order interactions that could be exploited to improve inference speed. We describe a non-autoregressive alternative that performs inference with a constant number of model calls, yielding a 23-fold speedup without a loss in performance on the CATH benchmark. We fine-tune ProteinMPNN to perform structure-conditioned discrete diffusion, using a purity prior over the order in which sequence positions are sampled. Our approach offers flexibility in trading off inference speed against accuracy by modulating the diffusion speed. Code: https://github.com/johnyang101/pmpnndiff
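The idea of decoding in a constant number of calls with a purity-ordered schedule can be illustrated with a small sketch. The abstract does not spell out the exact schedule, so this sketch makes several assumptions. It uses an absorbing "mask" state. It takes "purity" to mean the model's top predicted probability at a still-masked position. It spreads the remaining positions evenly over the remaining steps. `toy_denoiser` and `purity_sample` are hypothetical names. The denoiser is a random stand-in, not ProteinMPNN, and it ignores the backbone structure that the real model conditions on.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
MASK = None  # absorbing "mask" state: position not yet decoded


def toy_denoiser(seq, rng):
    """Hypothetical stand-in for a structure-conditioned denoiser.

    Returns one probability vector over AMINO_ACIDS per position. A real
    model would condition on the backbone and on already-decoded residues;
    this toy just returns random normalized weights.
    """
    probs = []
    for _ in seq:
        w = [rng.random() for _ in AMINO_ACIDS]
        total = sum(w)
        probs.append([x / total for x in w])
    return probs


def purity_sample(length, num_steps, denoiser, rng):
    """Decode a sequence using at most `num_steps` denoiser calls.

    Assumption: at each step, the masked positions with the highest purity
    (max predicted probability) are unmasked first.
    """
    seq = [MASK] * length
    calls = 0
    for step in range(num_steps):
        masked = [i for i, a in enumerate(seq) if a is MASK]
        if not masked:
            break
        probs = denoiser(seq, rng)  # one model call per step, for all positions
        calls += 1
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(masked) / (num_steps - step))
        # Purity of a position: confidence of the model's top prediction there.
        purity = {i: max(probs[i]) for i in masked}
        chosen = sorted(masked, key=lambda i: purity[i], reverse=True)[:k]
        for i in chosen:
            # Sample the residue from the predicted distribution at position i.
            seq[i] = rng.choices(AMINO_ACIDS, weights=probs[i])[0]
    return "".join(seq), calls


rng = random.Random(0)
designed, n_calls = purity_sample(length=120, num_steps=10, denoiser=toy_denoiser, rng=rng)
print(len(designed), n_calls)  # 120 10
```

In this sketch the number of calls is fixed by `num_steps` rather than by sequence length. A standard autoregressive decoder would make one call per residue, which is 120 calls here. Lowering `num_steps` forces more positions to be committed per call. Under these assumptions, that is the speed/accuracy knob the abstract refers to as modulating the diffusion speed.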