Best pre-training strategy for protein language models in TCR specificity prediction

Determine the most effective pre-training strategy for protein language models for accurate prediction of αβ T cell receptor (TCR)–peptide–MHC (pMHC) specificity, specifying the training objective(s) and procedures that yield the strongest transfer performance on this task.

Background

Predicting which αβ T cell receptors (TCRs) bind which peptide–MHC (pMHC) ligands is a central challenge in immunology, yet specificity-labeled TCR data remain limited. In other domains, unsupervised pre-training of transformer-based protein language models (PLMs) has enabled effective transfer learning to downstream tasks.

Despite this precedent, the most suitable pre-training approach for PLMs tailored to TCR specificity prediction has not been established. The paper highlights this uncertainty and proposes SCEPTR, a PLM jointly trained with autocontrastive learning and masked-language modeling, as a step toward closing the gap; the broader question of which pre-training strategy is optimal remains open.
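
To make the joint objective concrete, below is a minimal, hypothetical PyTorch sketch of one pre-training step that sums a masked-language-modeling loss with a SimCSE-style autocontrastive loss (positive pairs formed from two dropout-noised passes over the same sequence). The toy encoder, token vocabulary, masking scheme, and the way positive pairs are constructed are illustrative assumptions, not SCEPTR's actual architecture or hyperparameters.

```python
# Hypothetical sketch only: a toy encoder and a joint MLM + autocontrastive
# pre-training step; this is NOT SCEPTR's implementation.
import torch
import torch.nn.functional as F


class TinyTCREncoder(torch.nn.Module):
    """Toy transformer encoder over tokenized TCR amino-acid sequences."""

    def __init__(self, vocab_size=25, d_model=64, n_layers=2):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, d_model)
        layer = torch.nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=128, dropout=0.1, batch_first=True
        )
        self.encoder = torch.nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = torch.nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))       # (batch, seq_len, d_model)
        return h.mean(dim=1), self.mlm_head(h)     # pooled embedding, per-token logits


def joint_pretraining_loss(model, tokens, mask_id=1, mask_prob=0.15, temperature=0.05):
    """One step combining masked-language modeling with a SimCSE-style
    autocontrastive term (dropout noise provides the two 'views')."""
    # MLM term: randomly mask positions and predict the original tokens.
    mask = torch.rand(tokens.shape) < mask_prob
    _, logits = model(tokens.masked_fill(mask, mask_id))
    mlm_loss = F.cross_entropy(logits[mask], tokens[mask])

    # Autocontrastive term: two stochastic passes over the same sequences;
    # matching rows are positives, all other rows in the batch are negatives.
    z1, _ = model(tokens)
    z2, _ = model(tokens)
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature                  # (batch, batch) similarity matrix
    targets = torch.arange(tokens.size(0))         # positives lie on the diagonal
    contrastive_loss = F.cross_entropy(sim, targets)

    return mlm_loss + contrastive_loss


model = TinyTCREncoder()
fake_batch = torch.randint(2, 25, (8, 30))         # 8 toy "TCR" sequences, length 30
loss = joint_pretraining_loss(model, fake_batch)
loss.backward()
```

The point of the sketch is only that both objectives share one encoder and their losses are simply added; whether this combination (or some other objective, weighting, or positive-pair construction) transfers best to specificity prediction is exactly the open question posed above.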

References

"However, it is unclear how to best pre-train protein LLMs for TCR specificity prediction."

— Contrastive learning of T cell receptor representations (Nagano et al., arXiv:2406.06397, 10 Jun 2024), Abstract.