Fast non-autoregressive inverse folding with discrete diffusion (2312.02447v1)

Published 5 Dec 2023 in q-bio.BM and stat.ML

Abstract: Generating protein sequences that fold into an intended 3D structure is a fundamental step in de novo protein design. De facto methods utilize autoregressive generation, but this eschews higher-order interactions that could be exploited to improve inference speed. We describe a non-autoregressive alternative that performs inference using a constant number of calls, resulting in a 23-fold speedup without a loss in performance on the CATH benchmark. Conditioned on the 3D structure, we fine-tune ProteinMPNN to perform discrete diffusion with a purity prior over the index sampling order. Our approach gives the flexibility to trade off inference speed and accuracy by modulating the diffusion speed. Code: https://github.com/johnyang101/pmpnndiff
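
To make the idea of decoding with a constant number of calls concrete, here is a minimal Python sketch of purity-ordered unmasking. It is an illustration under stated assumptions, not the authors' implementation: `toy_denoiser` is a hypothetical stand-in that returns random per-position distributions rather than a fine-tuned ProteinMPNN, "purity" is taken to mean the largest predicted probability at a position, and the names `purity_decode`, `num_steps`, and `MASK` are invented for this example.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MASK = "-"                         # hypothetical absorbing (masked) token


def toy_denoiser(seq, rng):
    """Hypothetical stand-in for a structure-conditioned network.

    Returns one probability distribution over ALPHABET per position.
    A real model would condition on the 3D backbone and on `seq`.
    """
    probs = []
    for _ in seq:
        weights = [rng.random() ** 4 for _ in ALPHABET]  # peaked, random
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def purity_decode(length, num_steps, denoiser, seed=0):
    """Unmask a sequence in at most `num_steps` denoiser calls.

    At every step the masked positions are ranked by purity (assumed
    here to be the maximum predicted probability) and the most
    confident ones are sampled and fixed.
    """
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = math.ceil(length / num_steps)  # positions revealed per call
    calls = 0
    while MASK in seq:
        probs = denoiser(seq, rng)
        calls += 1
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Highest-purity positions are decoded first.
        masked.sort(key=lambda i: max(probs[i]), reverse=True)
        for i in masked[:per_step]:
            seq[i] = rng.choices(ALPHABET, weights=probs[i], k=1)[0]
    return "".join(seq), calls


if __name__ == "__main__":
    sequence, calls = purity_decode(length=100, num_steps=10, denoiser=toy_denoiser)
    print(sequence)
    print("denoiser calls:", calls)
```

In this sketch the number of denoiser calls is bounded by `num_steps` regardless of sequence length, whereas autoregressive decoding needs one call per residue. Lowering `num_steps` reveals more positions per call, which is the kind of speed versus accuracy trade-off the abstract describes.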
