Limits on Inferring T-cell Specificity from Partial Information (2404.12565v1)
Abstract: A key challenge in molecular biology is to decipher the mapping from protein sequence to function. Performing this mapping requires identifying the sequence features that are most informative about function. Here, we quantify the amount of information (in bits) that T-cell receptor (TCR) sequence features provide about antigen specificity. We identify informative features by their degree of conservation among antigen-specific receptors relative to null expectations. We find that TCR specificity depends synergistically on the hypervariable regions of both receptor chains, with a degree of synergy that depends strongly on the ligand. A coincidence-based approach to measuring information enables us to directly bound the accuracy with which TCR specificity can be predicted from partial matches to reference sequences. We anticipate that our statistical framework will be of use for developing machine learning models for TCR specificity prediction and for optimizing TCRs for cell therapies. The proposed coincidence-based information measures might find further applications in bounding the performance of pairwise classifiers in other fields.
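The abstract does not spell out the estimator, so the following is only a minimal sketch of one way a coincidence-based information measure could be computed. It assumes that the information a feature carries is the base-2 logarithm of the ratio between the coincidence probability among antigen-specific receptors and the coincidence probability in a background (null) set, which equals a difference of Rényi entropies of order 2. The function names and the toy feature values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def coincidence_probability(items):
    """Fraction of distinct unordered pairs of items that are identical.

    This is the unbiased estimate of the probability that two draws
    (without replacement) from `items` coincide, i.e. Simpson's index.
    """
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    counts = Counter(items)
    matching_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return matching_pairs / total_pairs


def coincidence_information_bits(specific, background):
    """Assumed measure: log2 ratio of coincidence probabilities.

    Positive values mean the feature is more conserved among the
    antigen-specific receptors than expected under the background.
    """
    p_specific = coincidence_probability(specific)
    p_background = coincidence_probability(background)
    if p_specific == 0 or p_background == 0:
        raise ValueError("no coincidences observed; estimate is undefined")
    return math.log2(p_specific / p_background)


# Hypothetical toy data: one categorical feature value per receptor.
specific = ["A", "A", "A", "B"]
background = ["A", "B", "C", "D", "A", "B", "C", "D"]

info_bits = coincidence_information_bits(specific, background)
print(f"{info_bits:.3f} bits")
```

In this toy case 3 of the 6 antigen-specific pairs coincide, against 4 of the 28 background pairs, so the feature is more conserved than the null expectation and the estimate is positive. Counting pairs rather than estimating full distributions is what makes this kind of measure usable on small samples, though the exact definition used in the paper may differ.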