Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets (2403.14822v1)
Abstract: We present a new framework for the non-convex robust hypothesis testing problem, in which the goal is to find the optimal detector that minimizes the maximum of the worst-case type-I and type-II risk functions. The distributional uncertainty sets are centered at the empirical distributions derived from samples, with their size measured by the Sinkhorn discrepancy. Because the objective involves non-convex, non-smooth probabilistic functions that are often intractable to optimize, existing methods resort to approximations rather than exact solutions. To tackle this challenge, we introduce an exact mixed-integer exponential conic reformulation of the problem, which can be solved to global optimality with a moderate amount of input data. We then propose a convex approximation and demonstrate its superiority over current state-of-the-art methods in the literature. Furthermore, we establish connections between robust hypothesis testing and regularized formulations of non-robust risk functions, offering insightful interpretations. Our numerical study highlights the satisfactory testing performance and computational efficiency of the proposed framework.
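The Sinkhorn discrepancy that defines the uncertainty sets is the entropic-regularized optimal transport cost between two distributions. As an illustration only (not the paper's code), the sketch below computes it between two one-dimensional empirical samples with uniform weights via the standard Sinkhorn fixed-point iteration; the sample sizes, regularization level `eps`, and squared-Euclidean ground cost are assumptions made for the example.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.1, n_iters=200):
    """Entropic-regularized OT cost between two 1-D empirical samples
    with uniform weights, via Sinkhorn fixed-point iterations."""
    n, m = len(x), len(y)
    C = (x[:, None] - y[None, :]) ** 2   # squared-Euclidean ground cost
    K = np.exp(-C / eps)                 # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):             # alternate marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]      # entropic transport plan
    return float(np.sum(P * C))          # transport cost under the plan

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50)
y = rng.normal(2.0, 1.0, 50)
# The cost grows with the mean shift between the two samples.
print(sinkhorn_cost(x, y))
```

In the paper's setting, an uncertainty set would collect all distributions whose Sinkhorn discrepancy to the empirical distribution is below a chosen radius; larger `eps` yields a smoother, more diffuse transport plan.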
- L. Xie and Y. Xie, “Sequential change detection by optimal weighted ℓ₂ divergence,” IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 2, pp. 747–761, Apr. 2021.
- L. Xie, S. Zou, Y. Xie, and V. V. Veeravalli, “Sequential (quickest) change detection: Classical results and new directions,” IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 2, pp. 494–514, Apr. 2021.
- L. Xie, “Minimax robust quickest change detection using Wasserstein ambiguity sets,” arXiv preprint arXiv:2204.13034, Apr. 2022.
- L. Xie, Y. Liang, and V. V. Veeravalli, “Distributionally robust quickest change detection using Wasserstein uncertainty sets,” arXiv preprint arXiv:2309.16171, 2023.
- J. R. Lloyd and Z. Ghahramani, “Statistical model criticism using kernel two sample tests,” Advances in neural information processing systems, vol. 28, 2015.
- K. Chwialkowski, H. Strathmann, and A. Gretton, “A kernel test of goodness of fit,” in International conference on machine learning. PMLR, 2016, pp. 2606–2615.
- M. Bińkowski, D. J. Sutherland, M. Arbel, and A. Gretton, “Demystifying MMD GANs,” arXiv preprint arXiv:1801.01401, 2018.
- P. Schober and T. R. Vetter, “Two-sample unpaired t tests in medical research,” Anesthesia & Analgesia, vol. 129, no. 4, p. 911, 2019.
- P. J. Huber, “A Robust Version of the Probability Ratio Test,” The Annals of Mathematical Statistics, vol. 36, no. 6, pp. 1753 – 1758, Dec. 1965.
- A. Magesh, Z. Sun, V. V. Veeravalli, and S. Zou, “Robust hypothesis testing with moment constrained uncertainty sets,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- B. C. Levy, “Robust hypothesis testing with a relative entropy tolerance,” IEEE Transactions on Information Theory, vol. 55, no. 1, pp. 413–421, Jan. 2009.
- G. Gül and A. M. Zoubir, “Minimax robust hypothesis testing,” IEEE Transactions on Information Theory, vol. 63, no. 9, pp. 5572–5587, Apr. 2017.
- R. Gao, L. Xie, Y. Xie, and H. Xu, “Robust hypothesis testing using Wasserstein uncertainty sets,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, Dec. 2018, pp. 7913–7923.
- L. Xie, R. Gao, and Y. Xie, “Robust hypothesis testing with Wasserstein uncertainty sets,” arXiv preprint arXiv:2105.14348, May 2021.
- L. Xie, “Minimax robust quickest change detection using Wasserstein ambiguity sets,” in 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022, pp. 1909–1914.
- J. Wang and Y. Xie, “A data-driven approach to robust hypothesis testing using Sinkhorn uncertainty sets,” in 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022, pp. 3315–3320.
- Z. Sun and S. Zou, “Kernel robust hypothesis testing,” IEEE Transactions on Information Theory, 2023.
- J. Wang, R. Gao, and Y. Xie, “Sinkhorn distributionally robust optimization,” arXiv preprint arXiv:2109.11926, Sep. 2021.
- W. Azizian, F. Iutzeler, and J. Malick, “Regularization for Wasserstein distributionally robust optimization,” ESAIM: Control, Optimisation and Calculus of Variations, vol. 29, p. 33, 2023.
- J. Wang, R. Moore, Y. Xie, and R. Kamaleswaran, “Improving sepsis prediction model generalization with optimal transport,” in Machine Learning for Health. PMLR, 2022, pp. 474–488.
- S.-B. Yang and Z. Li, “Distributionally robust chance-constrained optimization with Sinkhorn ambiguity set,” AIChE Journal, vol. 69, no. 10, p. e18177, 2023.
- C. Dapogny, F. Iutzeler, A. Meda, and B. Thibert, “Entropy-regularized Wasserstein distributionally robust shape and topology optimization,” Structural and Multidisciplinary Optimization, vol. 66, no. 3, p. 42, 2023.
- J. Song, N. He, L. Ding, and C. Zhao, “Provably convergent policy optimization via metric-aware trust region methods,” arXiv preprint arXiv:2306.14133, 2023.
- C. Ma, S. Wojtowytsch, L. Wu et al., “Towards a mathematical understanding of neural network-based machine learning: what we know and what we don’t,” arXiv preprint arXiv:2009.10713, 2020.
- J. Dahl and E. D. Andersen, “A primal-dual interior-point algorithm for nonsymmetric exponential-cone optimization,” Mathematical Programming, vol. 194, no. 1-2, pp. 341–370, 2022.
- C. Coey, M. Lubin, and J. P. Vielma, “Outer approximation with conic certificates for mixed-integer convex problems,” Mathematical Programming Computation, vol. 12, no. 2, pp. 249–293, 2020.
- Q. Ye and W. Xie, “Second-order conic and polyhedral approximations of the exponential cone: application to mixed-integer exponential conic programs,” arXiv preprint arXiv:2106.09123, 2021.
- MOSEK ApS, “MOSEK optimization suite,” 2019.
- A. Nemirovski and A. Shapiro, “Convex approximations of chance constrained programs,” SIAM Journal on Optimization, vol. 17, no. 4, pp. 969–996, 2007.
- M. Sion, “On general minimax theorems.” Pacific J. Math., vol. 8, no. 4, pp. 171–176, 1958.
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, “Robust stochastic approximation approach to stochastic programming,” SIAM Journal on optimization, vol. 19, no. 4, pp. 1574–1609, 2009.
- Y. Hu, X. Chen, and N. He, “On the bias-variance-cost tradeoff of stochastic optimization,” Advances in Neural Information Processing Systems, vol. 34, pp. 22119–22131, 2021.
- Y. Hu, S. Zhang, X. Chen, and N. He, “Biased stochastic first-order methods for conditional stochastic optimization and applications in meta learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 2759–2770, 2020.
- Y. Hu, J. Wang, Y. Xie, A. Krause, and D. Kuhn, “Contextual stochastic bilevel optimization,” arXiv preprint arXiv:2310.18535, 2023.
- J. Blanchet and A. Shapiro, “Statistical limit theorems in distributionally robust optimization,” arXiv preprint arXiv:2303.14867, 2023.
- Z. Yang and R. Gao, “Wasserstein regularization for 0-1 loss,” Optimization Online Preprint, 2022.
- X. Cheng and Y. Xie, “Neural tangent kernel maximum mean discrepancy,” Advances in Neural Information Processing Systems, vol. 34, pp. 6658–6670, 2021.
- scikit-learn contributors, “scikit-learn: Make moons,” 2024, version 1.4.0. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html
- A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Tech. Rep., 2009. [Online]. Available: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
- J. Johnson and S. Jernigan, “Lung cancer data set,” https://archive.ics.uci.edu/ml/datasets/lung+cancer, 1998, UCI Machine Learning Repository.
- N. Jiang and W. Xie, “ALSO-X#: Better convex approximations for distributionally robust chance constrained programs,” arXiv preprint arXiv:2302.01737, 2023.
- A. Shapiro, D. Dentcheva, and A. Ruszczynski, Lectures on Stochastic Programming: Modeling and Theory. Society for Industrial and Applied Mathematics (SIAM), 2021.
- X. Cheng and A. Cloninger, “Classification logit two-sample testing by neural networks for differentiating near manifold densities,” IEEE Transactions on Information Theory, vol. 68, no. 10, pp. 6631–6662, 2022.