The Sample Complexity of Simple Binary Hypothesis Testing (2403.16981v2)
Abstract: The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either: (i) the prior-free setting, with type-I error at most $\alpha$ and type-II error at most $\beta$; or (ii) the Bayesian setting, with Bayes error at most $\delta$ and prior distribution $(\pi, 1-\pi)$. This problem has only been studied when $\alpha = \beta$ (prior-free) or $\pi = 1/2$ (Bayesian), and the sample complexity is known to be characterized by the Hellinger divergence between $p$ and $q$, up to multiplicative constants. In this paper, we derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for: (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting; and (ii) all $\delta \le \pi/4$ in the Bayesian setting. In particular, the formula admits equivalent expressions in terms of certain divergences from the Jensen–Shannon and Hellinger families. The main technical result concerns an $f$-divergence inequality between members of the Jensen–Shannon and Hellinger families, which is proved by a combination of information-theoretic tools and case-by-case analyses. We explore applications of our results to (i) robust hypothesis testing, (ii) distributed (locally-private and communication-constrained) hypothesis testing, (iii) sequential hypothesis testing, and (iv) hypothesis testing with erasures.
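As a concrete illustration of the quantities appearing in the abstract, the sketch below (not from the paper; the Bernoulli biases, sample-size grid, and trial count are illustrative assumptions) computes the squared Hellinger divergence $H^2(p, q)$ between two close Bernoulli distributions and Monte Carlo-simulates the likelihood-ratio test under a uniform prior, showing how the Bayes error falls once the sample size $n$ reaches roughly $1/H^2(p, q)$, the constant-error Hellinger scaling referred to in the abstract.

```python
# A minimal numerical sketch, not from the paper: all parameter choices below
# (the Bernoulli biases r and s, the sample-size grid, the trial count) are
# illustrative assumptions. It computes the squared Hellinger divergence
# H^2(p, q) between two close Bernoulli laws and simulates the likelihood-ratio
# test under a uniform prior, so the Bayes error can be read off as the sample
# size n grows past roughly 1/H^2(p, q).
import numpy as np

rng = np.random.default_rng(0)

def hellinger_sq(p, q):
    """Squared Hellinger divergence between two discrete distributions."""
    return 1.0 - float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

def bayes_error_lrt(r, s, n, trials=20_000):
    """Monte Carlo estimate of the Bayes error (uniform prior) of the
    likelihood-ratio test between Ber(r) and Ber(s) on n i.i.d. samples."""
    # Per-sample log-likelihood ratio log(p(x)/q(x)) for x in {0, 1}.
    llr = np.array([np.log((1 - r) / (1 - s)), np.log(r / s)])
    errors = 0
    for truth_is_p in (True, False):
        bias = r if truth_is_p else s
        x = (rng.random((trials, n)) < bias).astype(int)  # samples from the true law
        stat = llr[x].sum(axis=1)                         # total log-likelihood ratio
        decide_p = stat > 0                               # LRT threshold 0 for a uniform prior
        errors += int(np.sum(decide_p != truth_is_p))
    return errors / (2 * trials)

r, s = 0.5, 0.55                                          # two close Bernoulli laws (assumed)
h2 = hellinger_sq([r, 1 - r], [s, 1 - s])
print(f"H^2(p, q) = {h2:.5f}, so 1/H^2 is about {1 / h2:.0f}")
for n in (50, 200, 800, 3200):
    print(f"n = {n:5d}: empirical Bayes error ~ {bayes_error_lrt(r, s, n):.3f}")
```

For these parameters the empirical Bayes error stays near $1/2$ when $n$ is well below $1/H^2(p, q)$ and decays quickly once $n$ exceeds it; the paper's contribution is the analogous characterization when the two error probabilities (or the prior) are highly asymmetric.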