Predicting Anti-microbial Resistance using Large Language Models (2401.00642v1)

Published 1 Jan 2024 in cs.CL

Abstract: During times of increasing antibiotic resistance and the spread of infectious diseases like COVID-19, it is important to classify genes related to antibiotic resistance. As natural language processing has advanced with transformer-based LLMs, many LLMs that learn characteristics of nucleotide sequences have also emerged. These models show good performance in classifying various features of nucleotide sequences. When classifying nucleotide sequences, not only the sequence itself, but also various background knowledge is utilized. In this study, we use not only a nucleotide sequence-based LLM but also a text LLM based on PubMed articles to reflect more biological background knowledge in the model. We propose a method to fine-tune the nucleotide sequence LLM and the text LLM based on various databases of antibiotic resistance genes. We also propose an LLM-based augmentation technique to supplement the data and an ensemble method to effectively combine the two models. We also propose a benchmark for evaluating the model. Our method achieved better performance than the nucleotide sequence LLM in the drug resistance class prediction.
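
The abstract does not say how the nucleotide sequence LLM and the text LLM are combined. One common way to ensemble two classifiers is soft voting, where the per-class probabilities from each model are averaged with a weight. The sketch below illustrates that idea only and is not the paper's implementation. The class labels, the probability values, and the names `soft_vote`, `predict_class`, and `w_seq` are hypothetical.

```python
from typing import List, Sequence


def soft_vote(p_seq: Sequence[float], p_text: Sequence[float], w_seq: float = 0.5) -> List[float]:
    """Weighted average of two probability vectors defined over the same classes.

    p_seq  : class probabilities from a (hypothetical) nucleotide sequence model
    p_text : class probabilities from a (hypothetical) text model
    w_seq  : weight given to the sequence model; the text model gets 1 - w_seq
    """
    if len(p_seq) != len(p_text):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= w_seq <= 1.0:
        raise ValueError("w_seq must lie in [0, 1]")
    return [w_seq * a + (1.0 - w_seq) * b for a, b in zip(p_seq, p_text)]


def predict_class(probs: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest combined probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


# Hypothetical drug resistance classes and made-up model outputs.
labels = ["beta-lactam", "tetracycline", "aminoglycoside"]
p_seq = [0.50, 0.40, 0.10]   # sequence model slightly prefers beta-lactam
p_text = [0.20, 0.70, 0.10]  # text model prefers tetracycline

combined = soft_vote(p_seq, p_text, w_seq=0.5)
print(predict_class(combined, labels))
```

With equal weights the combined vector favors the class on which the two models agree most strongly overall. In practice `w_seq` would be tuned on a validation set rather than fixed at 0.5.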

