
Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets (2403.14822v1)

Published 21 Mar 2024 in stat.ML, cs.LG, and math.OC

Abstract: We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of the worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples, based on the Sinkhorn discrepancy. Given that the objective involves non-convex, non-smooth probabilistic functions that are often intractable to optimize, existing methods resort to approximations rather than exact solutions. To tackle this challenge, we introduce an exact mixed-integer exponential conic reformulation of the problem, which can be solved to global optimality with a moderate amount of input data. Subsequently, we propose a convex approximation and demonstrate its superiority over current state-of-the-art methodologies in the literature. Furthermore, we establish connections between robust hypothesis testing and regularized formulations of non-robust risk functions, offering insightful interpretations. Our numerical study highlights the satisfactory testing performance and computational efficiency of the proposed framework.
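The Sinkhorn discrepancy around which the uncertainty sets are centered is an entropy-regularized optimal-transport cost between empirical distributions. The following is a minimal, self-contained sketch of the classical Sinkhorn iteration on two one-dimensional samples; the sample sizes, the squared-distance cost, and the regularization value `eps` are illustrative choices, not taken from the paper:

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.1, n_iter=300):
    """Entropy-regularized OT cost between two empirical samples.

    A minimal sketch of the Sinkhorn fixed-point iteration; `eps` is the
    entropic regularization parameter (an illustrative value).
    """
    a = np.full(len(x), 1.0 / len(x))          # uniform empirical weights on x
    b = np.full(len(y), 1.0 / len(y))          # uniform empirical weights on y
    C = (x[:, None] - y[None, :]) ** 2         # squared-distance ground cost
    K = np.exp(-C / eps)                       # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):                    # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]            # entropic transport plan
    return float(np.sum(P * C))                # transport cost under the plan

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50)   # samples under one hypothesis
y = rng.normal(1.0, 1.0, 50)   # samples under the other
print(sinkhorn_cost(x, y))
```

Centering an uncertainty set on the empirical distribution then amounts to admitting all distributions whose Sinkhorn discrepancy to it stays below a chosen radius; the worst-case risks in the minimax objective are taken over such sets.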
