Optimal Covariate Weighting Increases Discoveries in High-throughput Biology

Published 11 Mar 2022 in stat.ME | (2203.05926v1)

Abstract: The large-scale multiple testing inherent to high-throughput biological data necessitates very high statistical stringency, so true effects are difficult to detect unless their effect sizes are large. One promising approach for reducing the multiple testing burden is to use independent information to prioritize the features most likely to be true effects. However, using the independent data effectively is challenging and often does not lead to substantial gains in power. Current state-of-the-art methods sort features into groups by the independent information and calculate a weight for each group. However, when true effects are weak and rare (the typical situation for high-throughput biological studies), every group contains many null tests, so the group weights are diluted and performance suffers. We introduce Covariate Rank Weighting (CRW), a method for calculating approximately optimal weights conditioned on the ranking of tests by an external covariate. This approach uses the probabilistic relationship between covariate ranking and test effect size to calculate individual weights for each test that are more informative than group weights and are not diluted by null effects. We show how this relationship can be calculated theoretically for normally distributed covariates; in other cases it can be estimated empirically. We show via simulations and applications to data that this method outperforms existing methods by as much as 10-fold in the rare/low effect size scenario common to biological data and has at least comparable performance in all scenarios.