Optimal allocation of Monte Carlo simulations to multiple hypothesis tests

Published 27 Feb 2015 in stat.CO | (1502.07864v5)

Abstract: Multiple hypothesis tests are often carried out in practice using p-value estimates obtained with bootstrap or permutation tests, since the analytical p-values underlying the hypotheses are usually unknown. This article considers the allocation of a pre-specified total number of Monte Carlo simulations $K \in \mathbb{N}$ (i.e., permutations or draws from a bootstrap distribution) to a given number $m \in \mathbb{N}$ of hypotheses in order to approximate their p-values $p \in [0,1]^m$ in an optimal way, in the sense that the allocation minimises the total expected number of misclassified hypotheses. A misclassification occurs if the decision on a single hypothesis, obtained with an approximated p-value, differs from the one that would be obtained if its p-value were known analytically. The contribution of this article is threefold. First, under the assumption that $p$ is known and $K \in \mathbb{R}$, and using a normal approximation of the Binomial distribution, the optimal real-valued allocation of $K$ simulations to $m$ hypotheses is derived when correcting for multiplicity with the Bonferroni correction, for p-value estimates computed both with and without a pseudo-count. Computational subtleties arising in the former case are discussed. Second, with the help of an algorithm based on simulated annealing, empirical evidence is given that the optimal integer allocation is likely of the same form as the optimal real-valued allocation, and that both seem to coincide asymptotically. Third, an empirical study on simulated and real data demonstrates that a recently proposed sampling algorithm based on Thompson sampling asymptotically mimics the optimal (real-valued) allocation when the p-values are unknown and thus estimated at runtime.
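
The objective described in the abstract can be made concrete with a minimal sketch. It is not the article's derivation or code. It assumes the p-values are known, the estimate for hypothesis $i$ is the plain proportion of exceedances among $k_i$ simulations (no pseudo-count), hypothesis $i$ is rejected when its p-value is at most the Bonferroni threshold $\alpha/m$, and the Binomial distribution is replaced by a normal approximation. The function names and the example p-values and allocations are hypothetical, chosen only for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def misclassification_prob(p, k, threshold):
    """Approximate probability that the estimate of p lands on the wrong side
    of the threshold, for an estimate based on k simulations.
    Assumption: estimate ~ Normal(p, p(1-p)/k), no pseudo-count."""
    if k <= 0:
        raise ValueError("each hypothesis needs a positive number of simulations")
    if p in (0.0, 1.0):
        # Zero variance: the estimate equals p, so the decision cannot flip.
        return 0.0
    sd = math.sqrt(p * (1.0 - p) / k)
    return normal_cdf(-abs(p - threshold) / sd)


def expected_misclassifications(p_values, allocation, alpha=0.05):
    """Total expected number of misclassified hypotheses under Bonferroni."""
    threshold = alpha / len(p_values)
    return sum(
        misclassification_prob(p, k, threshold)
        for p, k in zip(p_values, allocation)
    )


# Illustrative (made-up) p-values and two allocations of K = 3000 simulations.
p_values = [0.001, 0.02, 0.5]
uniform = [1000, 1000, 1000]
skewed = [500, 2400, 100]  # more simulations for the p-value nearest alpha/m

print(expected_misclassifications(p_values, uniform))
print(expected_misclassifications(p_values, skewed))
```

Under these assumptions, the allocation that concentrates simulations on the hypothesis whose p-value lies closest to the threshold yields a smaller expected number of misclassifications than the uniform one, which is the kind of trade-off an optimal allocation has to balance.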