Full-Atom Peptide Design with Geometric Latent Diffusion (2402.13555v4)
Abstract: Peptide design plays a pivotal role in therapeutics, opening up new possibilities for targeting binding sites that were previously undruggable. Most existing methods are either inefficient or only concerned with the target-agnostic design of 1D sequences. In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD) given the binding site. We first establish a benchmark consisting of both 1D sequences and 3D structures from the Protein Data Bank (PDB) and the literature for systematic evaluation. We then identify two major challenges of leveraging current diffusion-based models for peptide design: the full-atom geometry and the variable binding geometry. To tackle the first challenge, PepGLAD derives a variational autoencoder that first encodes full-atom residues of variable size into fixed-dimensional latent representations, and then decodes back to the residue space after conducting the diffusion process in the latent space. For the second challenge, PepGLAD explores a receptor-specific affine transformation to convert the 3D coordinates into a shared standard space, enabling better generalization across different binding shapes. Experimental results show that our method not only significantly improves diversity and binding affinity in the task of sequence-structure co-design, but also excels at recovering reference structures for binding conformation generation.
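To make the idea of a receptor-specific affine transformation concrete, here is a minimal sketch in Python with NumPy. The abstract does not say how the transformation is derived, so the construction below is an assumption made for illustration: it takes the mean and covariance of the binding-site coordinates, uses the Cholesky factor of the covariance, and maps coordinates so that the binding site has zero mean and identity covariance. The function names `fit_affine`, `to_standard`, and `from_standard` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def fit_affine(site_coords):
    """Estimate an affine map from binding-site coordinates of shape (N, 3).

    Assumption for illustration: the map is defined by the sample mean and
    the Cholesky factor of the sample covariance of the binding site.
    """
    mu = site_coords.mean(axis=0)
    cov = np.cov(site_coords, rowvar=False)  # 3x3 sample covariance
    chol = np.linalg.cholesky(cov)           # lower-triangular, cov = chol @ chol.T
    return mu, chol


def to_standard(coords, mu, chol):
    """Map coordinates (M, 3) into the shared standard space."""
    # Solve chol @ z = (x - mu) for every point instead of inverting chol.
    return np.linalg.solve(chol, (coords - mu).T).T


def from_standard(z, mu, chol):
    """Inverse map: standard space back to the original coordinate frame."""
    return z @ chol.T + mu


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, strongly anisotropic "binding site" (not real data).
    site = rng.normal(size=(50, 3)) * np.array([8.0, 2.0, 0.5]) + np.array([12.0, -3.0, 7.0])
    mu, chol = fit_affine(site)
    z = to_standard(site, mu, chol)
    print("mean in standard space:", np.round(z.mean(axis=0), 6))
    print("covariance in standard space:\n", np.round(np.cov(z, rowvar=False), 6))
```

Under this assumption, binding sites of very different size and elongation all have the same first- and second-order statistics after the mapping. A generative model could then operate in the standard space, and `from_standard` would return generated coordinates to the receptor's frame.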