Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information (2403.05602v1)
Abstract: Because protein-protein interactions (PPIs) are crucial to understanding living systems, harvesting these data is essential for probing disease development and discerning gene/protein functions and biological processes. Some curated datasets (e.g., IntAct, BioGrid, DIP, and HPRD) contain PPI data derived from the literature and other sources. However, they are far from exhaustive, and maintaining them is labor-intensive. Meanwhile, machine learning methods for automating PPI knowledge extraction from the scientific literature have been limited by a shortage of suitable annotated data. This work presents a unified, multi-source PPI corpus with vetted interaction definitions, augmented with binary interaction type labels. It also presents a Transformer-based deep learning method that exploits the entities' relational context information in the relation representation to improve relation classification performance. The model is evaluated on four widely studied biomedical relation extraction datasets, as well as on this work's target PPI datasets, to assess how effective the representation is for relation extraction across different data. Results show that the model outperforms prior state-of-the-art models. The code and data are available at: https://github.com/BNLNLP/PPI-Relation-Extraction
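The abstract describes the model only at a high level: a Transformer encodes the sentence, and the relation representation combines the two protein entities with their relational context. The snippet below is a minimal sketch, assuming that context is summarized by scaled dot-product attention over the non-entity tokens, using the summed entity vectors as the query. This is an illustrative choice, not necessarily the paper's formulation. `relation_representation`, `interaction_probability`, and the toy vectors are hypothetical. The authors' actual implementation is in the linked repository.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_representation(tokens: List[Vector], e1: int, e2: int) -> Vector:
    """Hypothetical relation vector [h_e1; h_e2; context].

    tokens: contextual token vectors, e.g. a Transformer's last hidden layer.
    e1, e2: positions of the two protein mentions (e.g. entity-marker tokens).
    """
    h1, h2 = tokens[e1], tokens[e2]
    dim = len(h1)
    # Assumption: the entity pair is queried with the sum of both entity vectors.
    query = [a + b for a, b in zip(h1, h2)]
    context_idx = [i for i in range(len(tokens)) if i not in (e1, e2)]
    if not context_idx:
        # No other tokens: fall back to a zero context vector.
        return h1 + h2 + [0.0] * dim
    # Scaled dot-product attention scores for every non-entity token.
    scores = [dot(query, tokens[i]) / math.sqrt(dim) for i in context_idx]
    weights = softmax(scores)
    # Attention-weighted sum of context tokens = relational context vector.
    context = [
        sum(w * tokens[i][d] for w, i in zip(weights, context_idx))
        for d in range(dim)
    ]
    return h1 + h2 + context  # list concatenation, length 3 * dim


def interaction_probability(rep: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical linear + sigmoid head for a binary interaction label."""
    return 1.0 / (1.0 + math.exp(-(dot(rep, weights) + bias)))


# Toy example: 4 tokens with 2-dimensional vectors; proteins at positions 1 and 3.
tokens = [[0.1, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
rep = relation_representation(tokens, e1=1, e2=3)
print(len(rep))  # 6
print(interaction_probability(rep, [0.0] * len(rep)))  # 0.5
```

Concatenating the entity vectors with an attention-pooled context vector lets the classifier see both who interacts and the surrounding wording that signals the interaction. In a real model, the token vectors would come from a pretrained biomedical encoder, and the classification head would be trained jointly with it.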
Authors:
- Gilchan Park
- Sean McCorkle
- Carlos Soto
- Ian Blaby
- Shinjae Yoo