
A Bayesian Nonparametric Approach for Identifying Differentially Abundant Taxa in Multigroup Microbiome Data with Covariates (2206.10108v3)

Published 21 Jun 2022 in stat.ME

Abstract: Scientific studies in the last two decades have established the central role of the microbiome in disease and health. Differential abundance analysis seeks to identify microbial taxa associated with sample groups defined by a factor such as disease subtype, geographical region, or environmental condition. The results, in turn, help clinical practitioners and researchers diagnose disease and develop treatments more effectively. However, microbiome data analysis is uniquely challenging due to high dimensionality, sparsity, compositionality, and collinearity. There is a critical need for unified statistical approaches for differential analysis in the presence of covariates. We develop a zero-inflated Bayesian nonparametric (ZIBNP) methodology that meets these multipronged challenges. The proposed technique flexibly adapts to the unique data characteristics, casts the high proportion of zeros in a censoring framework, and mitigates high dimensionality and collinearity by utilizing the dimension-reducing property of the semiparametric Chinese restaurant process. Additionally, the ZIBNP approach relates the microbiome sampling depths to inferential precision while accommodating the compositional nature of microbiome data. Through simulation studies and analyses of the CAnine Microbiome during Parasitism (CAMP) and Global Gut microbiome datasets, we demonstrate the accuracy of ZIBNP compared to established methods for differential abundance analysis in the presence of covariates.
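The abstract credits the Chinese restaurant process (CRP) with a dimension-reducing property. The sketch below illustrates that property with the standard, textbook CRP only; it is not the semiparametric variant or the ZIBNP model from the paper. The function name `sample_crp` and the concentration parameter `alpha` are illustrative assumptions. Each item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`, so many items typically collapse into far fewer clusters.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a standard Chinese restaurant process.

    Hypothetical helper for illustration; alpha is the concentration parameter.
    Returns (assignments, counts): a label per item and the size of each cluster.
    """
    rng = random.Random(seed)
    assignments = []  # cluster label of each item, in arrival order
    counts = []       # current size of each cluster
    for _ in range(n_items):
        # Existing clusters are weighted by their size; a new cluster by alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    labels, sizes = sample_crp(n_items=500, alpha=1.0)
    print(f"{len(labels)} items grouped into {len(sizes)} clusters")
```

Under this prior the number of clusters grows roughly logarithmically with the number of items, which is the sense in which a CRP-based model can summarize many features with a much smaller set of shared parameters.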

