
Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization (2410.15040v1)

Published 19 Oct 2024 in cs.AI

Abstract: Antibodies are essential proteins responsible for immune responses in organisms, capable of specifically recognizing antigen molecules of pathogens. Recent advances in generative models have significantly enhanced rational antibody design. However, existing methods mainly create antibodies from scratch without template constraints, leading to model optimization challenges and unnatural sequences. To address these issues, we propose a retrieval-augmented diffusion framework, termed RADAb, for efficient antibody design. Our method leverages a set of structural homologous motifs that align with query structural constraints to guide the generative model in inversely optimizing antibodies according to desired design criteria. Specifically, we introduce a structure-informed retrieval mechanism that integrates these exemplar motifs with the input backbone through a novel dual-branch denoising module, utilizing both structural and evolutionary information. Additionally, we develop a conditional diffusion model that iteratively refines the optimization process by incorporating both global context and local evolutionary conditions. Our approach is agnostic to the choice of generative models. Empirical experiments demonstrate that our method achieves state-of-the-art performance in multiple antibody inverse folding and optimization tasks, offering a new perspective on biomolecular generative models.
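The retrieval idea in the abstract can be illustrated with a toy sketch. This is not the RADAb implementation: the fixed-length structural descriptor, the in-memory database, and the function names `retrieve_motifs` and `position_profile` are all hypothetical, chosen only to show how retrieved motifs with similar structure might be turned into a per-position residue profile that a conditional generative model could consume as evolutionary context.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve_motifs(query_descriptor, database, k=2):
    """Return the k database entries whose (hypothetical) structural
    descriptor is most similar to the query descriptor."""
    ranked = sorted(
        database,
        key=lambda entry: cosine(query_descriptor, entry["descriptor"]),
        reverse=True,
    )
    return ranked[:k]

def position_profile(sequences):
    """Per-position amino-acid frequencies over equal-length sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all retrieved motifs must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        profile.append({aa: counts[aa] / len(sequences) for aa in AMINO_ACIDS})
    return profile

# Toy database: descriptors and sequences are made up for illustration.
database = [
    {"descriptor": [0.9, 0.1], "sequence": "ARDY"},
    {"descriptor": [0.8, 0.2], "sequence": "ARGY"},
    {"descriptor": [0.0, 1.0], "sequence": "WWWW"},
]

hits = retrieve_motifs([1.0, 0.0], database, k=2)
profile = position_profile([m["sequence"] for m in hits])
```

In a full system of the kind the abstract describes, a profile like this would be one conditioning input to the denoising network alongside the backbone structure. Here it is simply computed and left for inspection.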

