Effects of immune receptor corpus design on learned representations and performance
Determine the effects of pre-training corpus design choices—including general proteins versus exclusively adaptive immune receptor sequences, full-length receptor sequences versus CDR3-only segments, inclusion of receptor–antigen interactions, and inclusion of single versus multiple species or individuals—on learned representations and downstream performance of protein language models applied to adaptive immune receptor tasks.
References
Many open questions remain regarding how the nature of the immune receptor corpus influences learned representations and model performance. For example, whether pre-training is performed on general proteins or exclusively on immune receptors, on full-length sequences or on CDR3s alone, with or without receptor-antigen interactions, and on a single species or individual or on several will influence downstream conclusions and predictions.