
The Sample Complexity of Simple Binary Hypothesis Testing (2403.16981v2)

Published 25 Mar 2024 in math.ST, cs.IT, math.IT, stat.ML, and stat.TH

Abstract: The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d.\ samples required to distinguish between two distributions $p$ and $q$ in either: (i) the prior-free setting, with type-I error at most $\alpha$ and type-II error at most $\beta$; or (ii) the Bayesian setting, with Bayes error at most $\delta$ and prior distribution $(\pi, 1-\pi)$. This problem has only been studied when $\alpha = \beta$ (prior-free) or $\pi = 1/2$ (Bayesian), and the sample complexity is known to be characterized by the Hellinger divergence between $p$ and $q$, up to multiplicative constants. In this paper, we derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for: (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting; and (ii) all $\delta \le \pi/4$ in the Bayesian setting. In particular, the formula admits equivalent expressions in terms of certain divergences from the Jensen--Shannon and Hellinger families. The main technical result concerns an $f$-divergence inequality between members of the Jensen--Shannon and Hellinger families, which is proved by a combination of information-theoretic tools and case-by-case analyses. We explore applications of our results to (i) robust hypothesis testing, (ii) distributed (locally-private and communication-constrained) hypothesis testing, (iii) sequential hypothesis testing, and (iv) hypothesis testing with erasures.


Summary

  • The paper introduces a unified sample complexity formula that quantifies the minimum number of samples needed for binary hypothesis testing in both the Bayesian and prior-free settings.
  • It details a methodology that partitions the analysis into linear, sublinear, and polynomial error-probability regimes, employing quantities such as mutual information and f-divergences.
  • The analysis extends to distributed and robust testing, presenting efficient algorithms under communication constraints and exploring the weak detection regime.

Sample Complexity of Simple Binary Hypothesis Testing

Introduction

Statistical hypothesis testing, a cornerstone of statistical inference, concerns deciding between two competing hypotheses based on observed data. Its simplest and most fundamental instance is simple binary hypothesis testing: distinguishing between two specific distributions, $p$ and $q$, from a set of observations. While the classical Neyman-Pearson lemma provides the optimal test for this task, understanding the non-asymptotic sample complexity, i.e., the minimum number of samples required to reach a given error rate, remains challenging.
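The Neyman-Pearson procedure mentioned above reduces to thresholding a log-likelihood ratio. The following minimal sketch (not code from the paper) illustrates it for two discrete distributions represented as probability dictionaries:

```python
import math

def likelihood_ratio_test(samples, p, q, threshold=1.0):
    """Neyman-Pearson likelihood-ratio test for discrete p vs q.

    Returns True if the samples favour hypothesis p, i.e. the
    log-likelihood ratio is at least log(threshold).
    p and q are dicts mapping outcomes to probabilities.
    """
    log_lr = sum(math.log(p[x]) - math.log(q[x]) for x in samples)
    return log_lr >= math.log(threshold)

# Two biased coins: under p heads has probability 0.6, under q only 0.4.
p = {"H": 0.6, "T": 0.4}
q = {"H": 0.4, "T": 0.6}
# Three heads against one tail: log-LR = 2*ln(1.5) > 0, so the test accepts p.
print(likelihood_ratio_test(["H", "H", "T", "H"], p, q))  # → True
```

Varying `threshold` trades type-I error against type-II error, which is exactly the trade-off the sample complexity results below quantify.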

Sample Complexity in Bayesian and Prior-free Settings

For a rigorous exploration, we consider both the Bayesian and prior-free settings under the assumption that the Hellinger divergence between $p$ and $q$ is no greater than $0.125$. We derive a comprehensive formula characterizing the sample complexity up to multiplicative constants for:

  1. All $0 \le \alpha, \beta \le 1/8$ in the prior-free setting.
  2. All $\delta \le \alpha/4$ in the Bayesian setting.

Surprisingly, this formula admits equivalent expressions in terms of certain divergences from both the Jensen-Shannon and Hellinger families.
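As a concrete illustration of the divergences involved, this sketch computes the squared Hellinger distance and the Jensen-Shannon divergence for two discrete distributions, together with the classical heuristic that roughly $1/H^2$ samples suffice at constant error (the paper's formula refines this; the example distributions here are arbitrary):

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance H^2(p, q) between two discrete distributions."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetrised KL to the midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.6, 0.4]
q = [0.4, 0.6]
h2 = hellinger_sq(p, q)
print(f"H^2 = {h2:.4f}, JS = {jensen_shannon(p, q):.4f}")
print(f"samples ~ 1/H^2 = {1 / h2:.0f}")  # classical Theta(1/H^2) scaling at constant error
```

For these two close distributions both divergences come out near $0.02$, so on the order of fifty samples are needed even at constant error probability.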

Results in Bayesian Hypothesis Testing

The Bayesian sample complexity, denoted $n_B(p, q, \alpha, \delta)$, is the smallest number of i.i.d. samples needed to distinguish between $p$ and $q$ with Bayes error at most $\delta$ under the prior distribution $(\alpha, 1-\alpha)$. Our analysis segments the sample complexity question into three main regimes based on the ratio of $\delta$ to $\alpha$:

  • For a linear error probability, where $\delta$ is a small but constant fraction of $\alpha$, the sample complexity is inversely proportional to both the mutual information $I(\Theta; X_1)$ and an $f$-divergence $H_\lambda(p, q)$.
  • When $\delta$ is sublinear in $\alpha$, a reduction-based perspective shows that solving $B_B(p, q, \alpha, \delta)$ equates to solving multiple instances of $B_B(p, q, \alpha', \delta')$ with modified error probabilities, employing a median-based boosting of the outcome.
  • The third regime addresses the polynomial error probability setting where $\delta \leq \alpha^2$, thereby extending known results on asymptotic sample complexity.
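The median-based boosting in the sublinear regime follows the standard amplification pattern: repeat a constant-error tester on fresh samples and take a majority vote, driving the failure probability down exponentially in the number of repetitions. This generic sketch (not the paper's construction; the base tester here is a stand-in) illustrates the idea:

```python
import random

def amplify(base_test, runs):
    """Boost a base tester's confidence by majority vote over independent runs.

    If base_test() is correct with probability at least 2/3, the majority
    vote over `runs` independent repetitions errs with probability
    exponentially small in `runs` (by a Chernoff bound).
    """
    votes = sum(base_test() for _ in range(runs))
    return votes > runs / 2

# Hypothetical base tester: returns the correct answer (True) w.p. 2/3.
rng = random.Random(0)
base = lambda: rng.random() < 2 / 3
print(amplify(base, 101))  # majority vote is correct with overwhelming probability
```

Each repetition consumes its own batch of samples, which is why amplification multiplies the constant-error sample complexity by roughly $\log(1/\delta)$.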

Distributed and Robust Hypothesis Testing

Leveraging the derived sample complexity formula, we also investigate hypothesis testing under communication constraints and local differential privacy. We present algorithms exhibiting statistical and computational efficiency, demonstrating the profound impact of our foundational sample complexity analysis on distributed statistical inference paradigms.
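The paper's locally-private testers are not reproduced here; as a generic illustration of the constraint they operate under, the following sketch uses standard randomized response, where each sample is reduced to one privatized bit and the aggregator debiases the average (the epsilon value, sample sizes, and function names are illustrative):

```python
import math
import random

def randomized_response(bit, epsilon, rng):
    """epsilon-locally-private release of one bit: keep it w.p. e^eps/(1+e^eps)."""
    keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if rng.random() < keep else 1 - bit

def debiased_mean(private_bits, epsilon):
    """Unbiased estimate of the true fraction of ones from privatized bits."""
    keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    raw = sum(private_bits) / len(private_bits)
    return (raw - (1 - keep)) / (2 * keep - 1)

rng = random.Random(1)
true_bits = [1] * 700 + [0] * 300  # true fraction of ones: 0.7
noisy = [randomized_response(b, 1.0, rng) for b in true_bits]
print(f"debiased estimate ~ {debiased_mean(noisy, 1.0):.2f}")
```

The privatization channel contracts the divergence between the two hypotheses, which is precisely why sample complexity under local privacy grows relative to the unconstrained setting.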

Weak Detection Regime

A closer look at the weak detection regime, where the error probabilities approach the priors, reveals that standard divergences such as the Hellinger divergence no longer fully characterize the sample complexity. This analysis yields the intriguing observation that the sample complexity can be insensitive to small perturbations in the error probabilities.

Conclusion

The meticulous characterization of sample complexity for simple binary hypothesis testing presents profound theoretical and practical implications, offering insights into the minimal sample requirements across various settings. Our findings unlock new pathways for future explorations in statistical hypothesis testing, encouraging in-depth examinations of weak detection regimes and further application of our formula in distributed hypothesis testing scenarios.