MATE-Pred: Multimodal Attention-based TCR-Epitope Interaction Predictor (2401.08619v1)
Abstract: Accurate prediction of the binding affinity between T-cell receptors (TCRs) and epitopes contributes decisively to the development of successful immunotherapy strategies. Some state-of-the-art computational methods apply deep learning to TCR and epitope sequences after converting their amino acid residues into numerical values using evolutionary features. Others employ pre-trained large language models (LLMs) to compute residue-level embedding vectors and summarize them into sequence-level representations. Here, we propose MATE-Pred, a novel method that performs multimodal attention-based prediction of TCR-epitope binding affinity. In MATE-Pred, the textual representation of each protein sequence is embedded with a pre-trained bidirectional encoder model and combined with two additional modalities: a) a comprehensive set of selected physicochemical properties; b) predicted contact maps that estimate the spatial proximity of amino acid residues in 3D. We benchmark MATE-Pred against other deep learning models that leverage multimodal representations of TCRs and epitopes. MATE-Pred achieves state-of-the-art performance (+8.4% Matthews correlation coefficient (MCC) and +5.5% AUC compared with the baselines), demonstrating that a multimodal model can efficiently capture contextual, physicochemical, and structural information from amino acid residues. These results suggest that MATE-Pred has potential applications in a range of drug discovery settings.
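The abstract describes three per-residue modalities (encoder embeddings, physicochemical properties, and predicted contact maps) that are combined through attention. The sketch below illustrates that data flow only, under loudly stated assumptions. The embeddings and contact maps are toy stand-ins rather than outputs of a pre-trained encoder or a contact predictor. The Kyte-Doolittle hydropathy scale substitutes for the paper's larger property set. Fusion by concatenation, the parameter-free cross-attention, and the sigmoid scoring head (`fuse_modalities`, `cross_attention`, `binding_score`) are hypothetical choices for illustration and are not MATE-Pred's architecture.

```python
import math

# Kyte-Doolittle hydropathy: one real physicochemical scale, used here as a
# stand-in (assumption) for the larger property set described in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_modalities(seq, embeddings, contact_map):
    """Hypothetical per-residue fusion by concatenation.

    embeddings:  one vector per residue (stand-in for encoder output)
    contact_map: L x L matrix of predicted contact probabilities
    """
    fused = []
    for i, aa in enumerate(seq):
        # Expected number of contacts for residue i, excluding itself.
        contact_degree = sum(contact_map[i]) - contact_map[i][i]
        fused.append(list(embeddings[i]) + [KYTE_DOOLITTLE[aa], contact_degree])
    return fused


def cross_attention(queries, keys):
    """Scaled dot-product attention with no learned projections (values = keys)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a convex combination of the key rows.
        out.append([sum(w * k[c] for w, k in zip(weights, keys)) for c in range(d)])
    return out


def mean_pool(vectors):
    """Average residue-level vectors into one sequence-level vector."""
    n = len(vectors)
    return [sum(v[c] for v in vectors) / n for c in range(len(vectors[0]))]


def binding_score(tcr_feats, epi_feats):
    """Hypothetical, untrained head: TCR residues attend to epitope residues,
    both sides are pooled, and a sigmoid of their dot product gives a score."""
    attended = mean_pool(cross_attention(tcr_feats, epi_feats))
    epi_pooled = mean_pool(epi_feats)
    return sigmoid(sum(a * b for a, b in zip(attended, epi_pooled)))


def toy_contact_map(n):
    """Hypothetical contacts: 1 on the diagonal, 0.5 for sequence neighbours."""
    return [[1.0 if i == j else (0.5 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def toy_embeddings(seq):
    """Hypothetical 2-d vectors in place of pre-trained encoder embeddings."""
    return [[0.1 * i, 0.2] for i in range(len(seq))]


tcr, epitope = "CASSL", "GILGFVFTL"
tcr_feats = fuse_modalities(tcr, toy_embeddings(tcr), toy_contact_map(len(tcr)))
epi_feats = fuse_modalities(epitope, toy_embeddings(epitope), toy_contact_map(len(epitope)))
score = binding_score(tcr_feats, epi_feats)
print(f"untrained toy score: {score:.3f}")
```

Because nothing here is trained, the printed score carries no biological meaning. The example only shows how residue-level features from three sources can be aligned, attended over, and reduced to a single pair-level output.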