Bayesian Online Multiple Testing: A Resource Allocation Approach (2402.11425v4)
Abstract: We consider the problem of sequentially conducting multiple experiments where each experiment corresponds to a hypothesis testing task. At each time point, the experimenter must make an irrevocable decision of whether to reject the null hypothesis (or equivalently claim a discovery) before the next experimental result arrives. The goal is to maximize the number of discoveries while maintaining a low error rate at all time points, as measured by the Local False Discovery Rate (LFDR). We formulate the problem as an online knapsack problem with exogenous random budget replenishment. We start with general arrival distributions and show that a simple policy achieves $O(\sqrt{T})$ regret. We complement the result by showing that this regret rate cannot be improved in general. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, although they achieve bounded loss in canonical settings, may incur $\Omega(\sqrt{T})$ or even $\Omega(T)$ regret. Observing that canonical policies tend to be too optimistic and over-claim discoveries, we propose a novel policy that incorporates budget safety buffers. It turns out that a little more safety can greatly enhance efficiency: small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$. From a practical perspective, we extend the policy to scenarios with continuous arrival distributions, time-dependent information structures, and unknown $T$. We conduct both synthetic experiments and empirical applications on time series data from New York City taxi passengers to validate the performance of our proposed policies. Our results emphasize how effective policies should be designed in online resource allocation problems with exogenous budget replenishment.
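To make the knapsack view concrete, here is a minimal Python sketch. It is not the policy proposed in the paper. It assumes one reading of the LFDR constraint: the average LFDR of all rejected hypotheses must stay at or below a target `alpha` at every time point. Under that assumption, rejecting a hypothesis with LFDR `w` costs `w - alpha` units of budget, so a rejection with `w < alpha` replenishes the budget and one with `w > alpha` consumes it. The function name `run_greedy_policy`, the constant `buffer` parameter, and the greedy rule itself are hypothetical choices made for illustration.

```python
def run_greedy_policy(lfdr_values, alpha, buffer=0.0):
    """Greedy online rejection under a running LFDR budget (illustrative only).

    Assumption: the constraint is sum of (w - alpha) over rejected
    hypotheses <= 0 at all times, tracked here as budget >= 0.
    `buffer` is a hypothetical constant safety margin: a budget-consuming
    rejection is allowed only if the budget stays at or above it.
    """
    budget = 0.0
    decisions = []
    for w in lfdr_values:
        cost = w - alpha
        if cost <= 0:
            # Low-LFDR arrival: rejecting adds a discovery and replenishes budget.
            budget -= cost
            decisions.append(True)
        elif budget - cost >= buffer:
            # High-LFDR arrival: reject only if the remaining budget keeps the margin.
            budget -= cost
            decisions.append(True)
        else:
            # Not enough budget: the irrevocable decision is to not reject.
            decisions.append(False)
    return decisions, budget


if __name__ == "__main__":
    arrivals = [0.02, 0.3, 0.15, 0.01, 0.4]  # made-up LFDR values
    print(run_greedy_policy(arrivals, alpha=0.1))
    print(run_greedy_policy(arrivals, alpha=0.1, buffer=0.05))
```

With a positive `buffer`, the sketch passes on some budget-consuming rejections that the unbuffered version would take. This only mirrors the abstract's idea of trading a little optimism for safety; the logarithmic buffers and re-solving steps of the actual policy are not reproduced here.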