Impact of receptor–antigen pretraining on structural and biophysical predictions

Determine how pre-training protein language models on combined receptor and antigen sequences affects structural reconstruction accuracy and prediction of biophysical properties—including specificity, affinity, and neutralization—relative to models trained on antibodies alone.

Background

The paper contrasts models that can represent antibodies together with their antigens (e.g., AlphaFold-Multimer, ESMFold) with IgFold, which currently handles antibody sequences only. While these tools show promise, the authors state that the benefits of pre-training on receptor–antigen data for improving structural and biophysical predictions are not yet known.

Clarifying this effect would inform how to design training corpora and model objectives for tasks where antibody–antigen interactions are central.
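To make the corpus-design question concrete, the sketch below illustrates one way a receptor–antigen pre-training corpus could differ from an antibody-only one: paired examples join a receptor and its antigen with a separator token before a standard masked-language-modeling objective. This is a minimal illustration in plain PyTorch, not the cited paper's method; the toy sequences, special tokens, and tiny encoder are all assumptions made for clarity.

```python
# Minimal sketch (illustrative assumptions throughout, not the cited paper's setup):
# contrast a receptor-antigen paired corpus with an antibody-only corpus under a
# masked-language-modeling objective.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIALS = ["<pad>", "<mask>", "<sep>"]          # <sep> joins receptor and antigen
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + list(AMINO_ACIDS))}

def encode(seq):
    """Map an amino-acid string to token ids."""
    return [VOCAB[a] for a in seq]

def make_example(receptor, antigen=None):
    """Antibody-only example if antigen is None; otherwise a paired example."""
    ids = encode(receptor)
    if antigen is not None:
        ids += [VOCAB["<sep>"]] + encode(antigen)
    return ids

def mask_tokens(ids, p=0.15):
    """Standard masked-LM corruption: hide a fraction of residues."""
    labels = ids.clone()
    mask = torch.rand(ids.shape) < p
    labels[~mask] = -100                          # ignore unmasked positions in the loss
    corrupted = ids.clone()
    corrupted[mask] = VOCAB["<mask>"]
    return corrupted, labels

class TinyProteinLM(nn.Module):
    """A deliberately small transformer encoder standing in for a protein language model."""
    def __init__(self, vocab_size, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

# Hypothetical toy sequences for illustration only.
receptor = "EVQLVESGGGLVQPGGSLRLSCAAS"        # antibody fragment (toy)
antigen = "KVFGRCELAAAMKRHGLDNYRGYSL"         # antigen fragment (toy)

paired = torch.tensor([make_example(receptor, antigen)])   # receptor-antigen corpus
ab_only = torch.tensor([make_example(receptor)])            # antibody-only corpus

model = TinyProteinLM(len(VOCAB))
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
for batch in (paired, ab_only):
    corrupted, labels = mask_tokens(batch)
    logits = model(corrupted)
    loss = loss_fn(logits.view(-1, len(VOCAB)), labels.view(-1))
    print(loss.item())
```

In practice the toy encoder would be replaced by an established protein language model, and the two pre-training regimes would be compared on downstream structural reconstruction and biophysical benchmarks (specificity, affinity, neutralization); the sketch only shows how the corpora themselves would differ.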

References

It therefore remains unknown how future iterations of protein-LLMs pre-trained using receptors and antigens can improve the structural reconstruction and prediction of biophysical features such as specificity, affinity, and neutralization.

Learning immune receptor representations with protein language models (2402.03823 - Dounas et al., 6 Feb 2024) in Further applications of PLMs in adaptive immunity