Attribution of performance gains in ProtLM.TCR to HLA information versus data size
Determine whether the marginal (~2–4%) improvement in TCR–HLA class I epitope-binding predictions, reported for the ProtLM.TCR model when HLA information is added as a categorical variable, is attributable to the HLA features themselves or is confounded by the accompanying reduction in total training data size.
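One way to separate the two explanations is a matched ablation: train without HLA on the full data, without HLA on the reduced (HLA-annotated) subset, and with HLA on that same subset, then score all three on one shared held-out test set. The sketch below only shows the arithmetic of splitting the observed gain into a data-size part and an HLA part. The class name, field names, and example scores are hypothetical illustrations, not values or methods from the ProtLM.TCR paper.

```python
from dataclasses import dataclass


@dataclass
class AblationScores:
    """Scores (e.g. AUC) of three models on the SAME held-out test set.

    All three fields are hypothetical inputs the reader would measure.
    """
    full_no_hla: float      # trained on all data, no HLA feature
    subset_no_hla: float    # trained on the HLA-annotated subset only, no HLA feature
    subset_with_hla: float  # trained on the same subset, HLA added as a categorical


def decompose_gain(s: AblationScores) -> dict:
    """Split the total change into a data-size effect and an HLA effect."""
    data_size_effect = s.subset_no_hla - s.full_no_hla  # cost or benefit of shrinking the data
    hla_effect = s.subset_with_hla - s.subset_no_hla    # gain at fixed data size
    total = s.subset_with_hla - s.full_no_hla           # what a naive comparison reports
    return {
        "data_size_effect": data_size_effect,
        "hla_effect": hla_effect,
        "total": total,
    }


# Made-up numbers, purely to illustrate the bookkeeping.
example = decompose_gain(AblationScores(0.80, 0.79, 0.83))
```

By construction the two effects sum to the total, so a naive "with HLA versus without HLA" comparison can under- or overstate the contribution of the HLA feature whenever the data-size effect is nonzero.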
References
Interestingly, the authors also provided additional HLA information as a categorical variable that marginally improved binding predictions (~2-4%), although it remains unclear if this increase is related to the corresponding decrease in total data size [60].
— Learning immune receptor representations with protein language models
(2402.03823 - Dounas et al., 6 Feb 2024) in TCR-specific protein language models