Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production (2410.18475v2)
Abstract: In the rapidly evolving field of metabolic engineering, identifying gene targets efficiently and precisely in order to enhance metabolite production remains a significant challenge. Traditional approaches, whether knowledge-based or model-based, are time-consuming and labor-intensive because of the vast scale of the research literature and the approximate nature of genome-scale metabolic model (GEM) simulations. We therefore propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs, to automate candidate gene discovery given a metabolite and its candidate associated genes, and we present the first benchmark for this task, containing 2474 metabolites and 1947 genes from two commonly used microorganisms, Saccharomyces cerevisiae (SC) and Issatchenkia orientalis (IO). The task is challenging because the metabolic graphs are incomplete and distinct metabolisms are heterogeneous. To overcome these limitations, we propose an Interactive Knowledge Transfer mechanism based on Metabolism Graph (IKT4Meta), which improves association prediction accuracy by integrating knowledge from different metabolism graphs. First, to build a bridge between two graphs for knowledge transfer, we use Pretrained Language Models (PLMs) with external knowledge of genes and metabolites to help generate inter-graph links, significantly alleviating the impact of heterogeneity. Second, we propagate intra-graph links between the metabolic graphs, using the inter-graph links as anchors. Finally, we perform gene-metabolite association prediction on the enriched metabolism graphs, which integrate knowledge from multiple microorganisms. Experiments on both organisms demonstrate that the proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
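The first two steps described in the abstract (generating inter-graph links, then propagating intra-graph links through those anchors) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings stand in for PLM outputs, and the nearest-neighbor rule, the `threshold` value, and the names `inter_graph_links` and `propagate_edges` are all assumptions made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def inter_graph_links(emb_a, emb_b, threshold=0.9):
    """Link each node of graph A to its most similar node of graph B.

    Assumed rule: keep the nearest neighbor only if its cosine
    similarity reaches `threshold` (a hypothetical cutoff).
    """
    links = {}
    if not emb_b:
        return links
    for a, va in emb_a.items():
        best, score = max(
            ((b, cosine(va, vb)) for b, vb in emb_b.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            links[a] = best
    return links


def propagate_edges(edges_src, edges_tgt, anchors):
    """Copy a source edge into the target graph when both endpoints are anchored."""
    enriched = set(edges_tgt)
    for u, v in edges_src:
        if u in anchors and v in anchors:
            enriched.add((anchors[u], anchors[v]))
    return enriched


# Toy stand-ins for PLM embeddings of genes and metabolites (hypothetical values).
emb_sc = {"sc_gene1": [1.0, 0.0], "sc_met1": [0.0, 1.0], "sc_gene2": [1.0, 1.0]}
emb_io = {"io_gene1": [0.9, 0.1], "io_met1": [0.1, 0.9]}

# Known gene-metabolite edges in the SC graph; the IO graph starts empty here.
edges_sc = {("sc_gene1", "sc_met1"), ("sc_gene2", "sc_met1")}
edges_io = set()

anchors = inter_graph_links(emb_sc, emb_io)
enriched_io = propagate_edges(edges_sc, edges_io, anchors)
```

In this toy run, `sc_gene2` has no sufficiently similar counterpart in the IO graph, so only the edge between the two anchored nodes is transferred. A link prediction model would then be trained on the enriched graph, which the sketch leaves out.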