Recent advances in deep learning and language models for studying the microbiome (2409.10579v1)
Abstract: Recent advancements in deep learning, particularly large language models (LLMs), have made a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a language of life, enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learning and language models in analyzing microbiome and metagenomics data. We focus on problem formulations, necessary datasets, and the integration of language modeling techniques. We provide an extensive overview of protein/genomic language modeling and its contributions to microbiome studies. We also discuss applications such as novel viromics language modeling, biosynthetic gene cluster prediction, and knowledge integration for metagenomics studies.