Exponentially Consistent Statistical Classification of Continuous Sequences with Distribution Uncertainty
Abstract: In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences. Unlike most existing studies, which focus on discrete-valued sequences with perfect distribution match, we study multiple classification for continuous sequences with distribution uncertainty, where the generating distributions of the testing and training sequences deviate even under the true hypothesis. In particular, we propose distribution-free tests and prove that the error probabilities of our tests decay exponentially fast for three different test designs: fixed-length, sequential, and two-phase tests. We first consider the simple case without the null hypothesis, where the testing sequence is known to be generated from a distribution close to the generating distribution of one of the training sequences. Subsequently, we extend our results to the more general case with the null hypothesis, where the testing sequence may be generated from a distribution that is vastly different from the generating distributions of all training sequences.
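To make the problem setup concrete, the following is a minimal sketch of a fixed-length, distribution-free classification rule. It is an illustration under stated assumptions, not the tests proposed in the paper: the abstract does not name a test statistic, so the sketch assumes a biased maximum mean discrepancy (MMD) estimate with a Gaussian kernel, a nearest-training-sequence decision rule, and an optional rejection threshold for the null hypothesis. The names `mmd_squared`, `classify`, `bandwidth`, and `null_threshold`, as well as the Gaussian data and the threshold value 0.5, are hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D sample arrays."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy


def classify(test_seq, training_seqs, null_threshold=None):
    """Return the index of the closest training sequence, or None for the null.

    Assumption: "closest" means smallest estimated squared MMD. If
    null_threshold is given and even the smallest score exceeds it, the
    test sequence is declared to match none of the training sequences.
    """
    scores = np.array([mmd_squared(test_seq, t) for t in training_seqs])
    best = int(np.argmin(scores))
    if null_threshold is not None and scores[best] > null_threshold:
        return None, scores
    return best, scores


# Synthetic data (hypothetical): M = 3 training sequences with different means.
rng = np.random.default_rng(0)
training = [rng.normal(loc=m, scale=1.0, size=500) for m in (-3.0, 0.0, 3.0)]

# Distribution uncertainty: the test distribution is close to, but not equal
# to, the distribution that generated training[1].
test_close = rng.normal(loc=0.2, scale=1.0, size=500)

# Null hypothesis: the test distribution is far from every training distribution.
test_far = rng.normal(loc=20.0, scale=1.0, size=500)

decision_close, _ = classify(test_close, training, null_threshold=0.5)
decision_far, _ = classify(test_far, training, null_threshold=0.5)
print(decision_close, decision_far)
```

Calling `classify` without `null_threshold` corresponds to the simpler case without the null hypothesis, where the rule always returns one of the M indices. The sequential and two-phase designs mentioned in the abstract are not covered by this sketch.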