AntiFold: Improved antibody structure-based design using inverse folding (2405.03370v1)

Published 6 May 2024 in q-bio.BM and q-bio.QM

Abstract: The design and optimization of antibodies require an intricate balance across multiple properties. Protein inverse folding models, capable of generating diverse sequences folding into the same structure, are promising tools for maintaining structural integrity during antibody design. Here, we present AntiFold, an antibody-specific inverse folding model, fine-tuned from ESM-IF1 on solved and predicted antibody structures. AntiFold outperforms existing inverse folding tools on sequence recovery across complementarity-determining regions, with designed sequences showing high structural similarity to their solved counterparts. It additionally achieves stronger correlations when predicting antibody-antigen binding affinity in a zero-shot manner, while performance is augmented further when including antigen information. AntiFold assigns low probabilities to mutations that disrupt antigen binding, synergizing with protein LLM residue probabilities, and demonstrates promise for guiding antibody optimization while retaining structure-related properties. AntiFold is freely available under the BSD 3-Clause license as a web server at https://opig.stats.ox.ac.uk/webapps/antifold/ and as a pip-installable package at https://github.com/oxpig/AntiFold


Summary

  • The paper introduces AntiFold, a specialized inverse folding model that substantially improves antibody sequence recovery, particularly in the highly variable CDRH3 region.
  • It reaches up to 84% sequence recovery in CDRs, and refolded designs have a mean RMSD of 0.95 Å to the native structures, indicating high structural fidelity.
  • The model also predicts binding affinity in a zero-shot manner and is available as a web server and a pip-installable package for integration into design workflows.

AntiFold: A Specialized Inverse Folding Model for Antibody Design

Overview

The paper introduces AntiFold, an inverse folding model specialized for antibody design. AntiFold is fine-tuned from ESM-IF1, a general protein inverse folding model, and outperforms existing tools in sequence recovery and binding affinity prediction, particularly in the complementarity-determining regions (CDRs) of antibodies.
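Inverse folding models such as ESM-IF1 produce, for each backbone position, a probability distribution over the 20 amino acids, and diverse designs are obtained by sampling from these distributions. The Python sketch below illustrates temperature-scaled sampling under a simplifying assumption: each position is sampled independently from a precomputed logits matrix, whereas ESM-IF1 decodes autoregressively, so this is not AntiFold's actual sampler. The `AMINO_ACIDS` column order and the `sample_sequence` function are hypothetical.

```python
import math
import random

# Assumed residue order for the 20 columns of each logits row (hypothetical).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_sequence(logits, temperature=1.0, rng=None):
    """Sample one sequence from per-position logits (a list of 20-value rows).

    Each position is sampled independently (a simplification, see above).
    Lower temperatures concentrate probability on the top-scoring residue;
    higher temperatures flatten the distribution and increase diversity.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    residues = []
    for row in logits:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each logits row needs one value per amino acid")
        scaled = [value / temperature for value in row]
        top = max(scaled)  # subtract the max before exp() for numerical stability
        weights = [math.exp(value - top) for value in scaled]
        # random.choices normalizes the weights, i.e. applies the softmax denominator
        residues.append(rng.choices(AMINO_ACIDS, weights=weights, k=1)[0])
    return "".join(residues)
```

Calling it repeatedly on the same backbone logits at a moderate temperature yields a set of distinct candidate sequences; as the temperature approaches zero, it converges to picking the highest-scoring residue at every position.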

Key Contributions

AntiFold advances protein design for antibodies, an important class of therapeutic agents, in the following areas:

  1. Inverse Folding Application: AntiFold is fine-tuned from ESM-IF1 on both experimentally solved and predicted antibody structures, with the goal of preserving the structural integrity on which antibody properties such as stability and antigen binding depend.
  2. Performance Metrics: The model substantially improves the recovery of native sequences, especially in CDRH3, the most variable CDR loop and a key determinant of antigen binding. It achieves 60% amino acid recovery in CDRH3, higher than previous models such as AbMPNN and ESM-IF1.
  3. Structural Fidelity: Sequences designed by AntiFold show high structural fidelity when refolded, with lower RMSD values to the native structures than those of existing methods.
  4. Binding Affinity Prediction: AntiFold predicts antibody-antigen binding affinity in a zero-shot manner from its inverse folding probabilities. Performance improves further when antigen information is included, since the antigen context shapes which residues are favored at the binding interface.
  5. Integration and Accessibility: AntiFold is available as a web server and a pip-installable package, making it straightforward to integrate into antibody design workflows.
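The two design metrics above, sequence recovery and RMSD of refolded structures, can be sketched as follows. Assumptions: native and designed sequences are aligned position by position, each position carries a region label (for example, derived from an antibody numbering scheme), and the native and refolded coordinates have already been superimposed in a common frame, since the exact alignment protocol is not described here. The function names are hypothetical.

```python
import math
from collections import defaultdict


def recovery_by_region(native, designed, regions):
    """Fraction of positions in each region where the designed residue equals the native one."""
    if not (len(native) == len(designed) == len(regions)):
        raise ValueError("native, designed and regions must have the same length")
    matches = defaultdict(int)
    totals = defaultdict(int)
    for nat, des, region in zip(native, designed, regions):
        totals[region] += 1
        matches[region] += int(nat == des)
    return {region: matches[region] / totals[region] for region in totals}


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) points in the same frame.

    No superposition is performed here; align the structures first
    (e.g. on the framework) if they are not already in a common frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

For example, `recovery_by_region("AAAACCCC", "AAAGCCTT", ["FR"] * 4 + ["CDRH3"] * 4)` returns `{'FR': 0.75, 'CDRH3': 0.5}`. Reporting recovery per region rather than over the whole chain keeps the conserved framework from masking performance on the variable loops.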

Numerical Results

  • Sequence recovery reaches 75-84% in the other CDR regions, compared with 60% in the more variable CDRH3.
  • For structural fidelity, refolded CDR loops of AntiFold designs have a mean RMSD of 0.95 Å to the native conformations, indicating high structural similarity.
  • On a benchmark of deep mutational scanning data, AntiFold achieves a Spearman correlation of 0.418 between its zero-shot scores and measured binding affinity, outperforming the other models it was compared against.
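The zero-shot affinity benchmark can be sketched in two steps: score each mutant by how much more (or less) likely the inverse folding model finds the mutant residues than the wild-type residues, then compute the Spearman correlation between those scores and the measured binding values. The log-likelihood-ratio scoring is a common zero-shot scheme assumed here, not necessarily AntiFold's exact procedure; the per-position `log_probs` input and all function names are hypothetical. Tied values receive average ranks, matching the standard definition of Spearman's rho.

```python
import math


def mutation_score(log_probs, wild_type, mutations):
    """Sum of log p(mutant) - log p(wild type) over the mutated positions.

    log_probs: one dict per position mapping residue -> log-probability
    mutations: list of (0-based position, mutant residue) pairs
    """
    score = 0.0
    for pos, mutant in mutations:
        score += log_probs[pos][mutant] - log_probs[pos][wild_type[pos]]
    return score


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Because a higher score means the model prefers the mutant, a positive correlation is expected when the measured values increase with binding strength; for measurements where lower means tighter binding (such as KD), the sign flips.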

Implications and Future Prospects

The implications of AntiFold are both practical and theoretical. Practically, AntiFold aids in the efficient design of antibody therapeutics by optimizing for desired traits while preserving structural characteristics essential for function. Theoretically, the model's success underscores the potential of specialized inverse folding models in protein design tasks. As machine learning techniques evolve, future developments may involve integrating more complex features, such as full antibody-antigen complex structures, or employing more advanced neural network architectures to further improve predictive accuracy. Additionally, the synergy of AntiFold with protein LLMs suggests a promising pathway for combining sequence-based and structure-based deep learning approaches.
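One simple way to combine structure-based and sequence-based signals is to blend, for each variant, the log-odds score from an inverse folding model with the log-odds score from a protein language model. The sketch below assumes such per-variant scores are already computed and uses a weighted sum, which corresponds to a weighted geometric mean of the two models' probability ratios; this ensembling rule, the `weight` parameter, the function names, and the variant labels are all hypothetical and are not taken from the paper.

```python
def combined_scores(if_scores, lm_scores, weight=0.5):
    """Weighted sum of per-variant log-odds from two models.

    if_scores: scores from a structure-based inverse folding model
    lm_scores: scores from a sequence-based protein language model
    weight: share given to the inverse folding scores (0 to 1)
    """
    if len(if_scores) != len(lm_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(if_scores, lm_scores)]


def rank_variants(names, scores):
    """Variant names ordered from most to least favored (stable for ties)."""
    ranked = sorted(zip(scores, names), key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked]
```

For instance, with hypothetical variants `["Y100A", "S101T"]`, inverse folding scores `[-1.0, 0.5]` and language model scores `[-3.0, 0.5]`, the blended scores are `[-2.0, 0.5]`, so `S101T` ranks first. A variant is penalized whenever either model finds it unlikely, which is one way the two views can complement each other.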

Conclusion

AntiFold represents a significant step forward in the application of artificial intelligence to protein engineering, particularly in the niche of antibody design. By leveraging a focused inverse folding approach, the model successfully addresses key challenges in antibody sequence optimization and binding affinity prediction. With continued refinement and integration into broader antibody development pipelines, AntiFold and similar specialized models could greatly enhance the toolkit available for biologics design and development.