A Unified Combination Framework for Dependent Tests with Applications to Microbiome Association Studies (2404.09353v1)
Abstract: We introduce a novel meta-analysis framework to combine dependent tests under a general setting, and utilize it to synthesize various microbiome association tests that are calculated from the same dataset. Our development builds upon the classical meta-analysis methods of aggregating $p$-values and also a more recent general method of combining confidence distributions, but generalizes them to handle dependent tests. The proposed framework ensures rigorous statistical guarantees, and we provide a comprehensive study comparing it with various existing dependent combination methods. Notably, we demonstrate that the widely used Cauchy combination method for dependent tests, referred to as the vanilla Cauchy combination in this article, can be viewed as a special case within our framework. Moreover, the proposed framework provides a way to address the problem that arises when the distributional assumptions underlying the vanilla Cauchy combination are violated. Our numerical results demonstrate that ignoring the dependence among the to-be-combined components may lead to severe size distortion. Compared to the existing $p$-value combination methods, including the vanilla Cauchy combination method, the proposed combination framework handles the dependence accurately and uses the information efficiently to construct tests with accurate size and enhanced power. The development is applied to microbiome association studies, where we aggregate information from multiple existing tests using the same dataset. The combined tests harness the strengths of each individual test across a wide range of alternative spaces, enabling more efficient and meaningful discoveries of vital microbiome associations.
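For readers unfamiliar with the vanilla Cauchy combination that the abstract takes as its reference point, the following is a minimal sketch of the rule as it is commonly stated in the literature: each $p$-value is mapped to a standard Cauchy quantile, the quantiles are averaged with nonnegative weights summing to one, and the average is converted back to a $p$-value through the standard Cauchy upper tail. This is not the framework proposed in the paper. The function name `cauchy_combination` and the equal default weights are illustrative assumptions, and the resulting $p$-value relies on the distributional assumptions that the abstract notes may be violated.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the vanilla Cauchy combination rule (sketch).

    `weights` is a hypothetical optional argument; it should be
    nonnegative and sum to 1. Equal weights are used when omitted.
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    # Map each p-value to a standard Cauchy quantile: tan((0.5 - p) * pi).
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    # Upper-tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi


# Example: three p-values computed from the same dataset.
combined = cauchy_combination([0.01, 0.20, 0.03])
print(combined)
```

One property worth noting: when all inputs equal the same value, the combined $p$-value returns that value, so the rule does not sharpen evidence merely because the same test is repeated.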