An Evaluation of Large Language Models in Bioinformatics Research

Published Feb 21, 2024 in q-bio.QM, cs.AI, and cs.LG


Large language models (LLMs) such as ChatGPT have gained considerable interest across diverse research communities. Their notable ability for text completion and generation has inaugurated a novel paradigm for language-interfaced problem solving. However, the potential and efficacy of these models in bioinformatics remain incompletely explored. In this work, we study the performance of LLMs on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like the GPT variants can successfully handle most of these tasks. In addition, we provide a thorough analysis of their limitations in the context of complicated bioinformatics tasks. In conclusion, we believe that this work can provide new perspectives and motivate future research in the field of LLM applications, AI for Science, and bioinformatics.
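To make the first task concrete: "identification of potential coding regions" is classically approached by scanning a DNA sequence for open reading frames (ORFs). The sketch below is a minimal, toy ORF scanner illustrating what that task asks for; it is a conventional rule-based baseline, not the paper's prompt-based LLM approach, and the function name and parameters are illustrative choices.

```python
def find_orfs(seq, min_len=6):
    """Toy ORF scanner: return (start, end, frame) tuples for regions
    that begin with a start codon (ATG) and end with an in-frame stop
    codon, on the forward strand only. `min_len` is the minimum ORF
    length in nucleotides (an assumed illustrative parameter)."""
    stops = {"TAA", "TAG", "TGA"}
    orfs = []
    for frame in range(3):          # check all three reading frames
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # walk codon by codon until an in-frame stop codon
                j = i + 3
                while j + 3 <= len(seq):
                    if seq[j:j + 3] in stops:
                        if j + 3 - i >= min_len:
                            orfs.append((i, j + 3, frame))
                        break
                    j += 3
                i = j               # resume scanning after this ORF
            else:
                i += 3
    return orfs

# Example: one ORF spanning the whole sequence, in frame 0.
print(find_orfs("ATGAAATAG"))   # → [(0, 9, 0)]
```

In the paper's setting, the same question is instead posed to a GPT model in natural language, which makes a deterministic baseline like this a useful point of comparison.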

