Taking a Moment for Distributional Robustness (2405.05461v1)
Abstract: A rich line of recent work has studied distributionally robust learning approaches that seek to learn a hypothesis that performs well, in the worst case, on many different distributions over a population. We argue that although the most common approaches seek to minimize the worst-case loss over distributions, a more reasonable goal is to minimize the worst-case distance to the true conditional expectation of labels given each covariate. Minimizing the worst-case loss can dramatically fail to produce a solution that minimizes the distance to the true conditional expectation when certain distributions contain high levels of label noise. We introduce a new min-max objective based on what is known as the adversarial moment violation, and show that minimizing this objective is equivalent to minimizing the worst-case $\ell_2$-distance to the true conditional expectation if the adversary's strategy space is sufficiently rich. Previous work has suggested minimizing the maximum regret over the worst-case distribution as a way to circumvent issues arising from differential noise levels. We show that in the case of square loss, minimizing the worst-case regret is also equivalent to minimizing the worst-case $\ell_2$-distance to the true conditional expectation. Although both objectives minimize the worst-case distance to the true conditional expectation, we show that our approach provides large empirical savings in computational cost as a function of the number of groups, while providing the same noise-oblivious worst-distribution guarantee as the minimax-regret approach, thus making positive progress on an open question posed by Agarwal and Zhang (2022).
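To make the label-noise failure mode concrete, here is a minimal numerical sketch. The setup and notation are our own illustration, not the paper's construction: two groups share a single constant-predictor hypothesis class, one group has high label noise, and we compare the constant chosen by worst-group square loss against the one chosen by worst-group distance to the conditional mean.

```python
import numpy as np

# Two groups, hypothesis class = a single constant c (illustrative setup):
#   group A: E[y|x] = 0, label noise std = 2  (high irreducible noise)
#   group B: E[y|x] = 1, label noise std = 0  (noiseless)
# Expected square loss of the constant c on each group:
#   group A: c**2 + 4        (the 4 is group A's irreducible noise variance)
#   group B: (c - 1)**2
# Worst-case l2-distance to the conditional mean: max(|c - 0|, |c - 1|).

cs = np.linspace(-1, 2, 3001)
worst_loss = np.maximum(cs**2 + 4, (cs - 1) ** 2)       # minmax-loss criterion
worst_dist = np.maximum(np.abs(cs), np.abs(cs - 1))     # minmax-distance criterion

c_loss = cs[np.argmin(worst_loss)]  # ~0.0: group A's noise floor dominates,
                                    # dragging the solution to group A's mean
c_dist = cs[np.argmin(worst_dist)]  # ~0.5: equidistant from both
                                    # conditional means
print(c_loss, c_dist)
```

Because group A's irreducible noise variance (4) exceeds any achievable loss on group B, the worst-group loss is always group A's, so its minimizer ignores group B entirely; the worst-case distance criterion instead balances the two conditional means.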
- Minimax regret optimization for robust machine learning under distribution shift. In Conference on Learning Theory, pages 2704–2729. PMLR, 2022.
- Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
- Deep generalized method of moments for instrumental variable analysis. Advances in neural information processing systems, 32, 2019.
- Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency, pages 77–91. PMLR, 2018.
- How to use expert advice. J. ACM, 44(3):427–485, May 1997.
- Robust optimization for non-convex objectives. Advances in Neural Information Processing Systems, 30, 2017.
- Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations research, 58(3):595–612, 2010.
- Lexicographically fair learning: Algorithms and generalization. In Katrina Ligett and Swati Gupta, editors, 2nd Symposium on Foundations of Responsible Computing, FORC 2021, June 9-11, 2021, Virtual Conference, volume 192 of LIPIcs, pages 6:1–6:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. doi: 10.4230/LIPICS.FORC.2021.6. URL https://doi.org/10.4230/LIPIcs.FORC.2021.6.
- Minimax estimation of conditional moment models. Advances in Neural Information Processing Systems, 33:12248–12262, 2020.
- Learning models with uniform performance via distributionally robust optimization. The Annals of Statistics, 49(3):1378–1406, 2021.
- Ambiguous chance constrained problems and robust optimization. Mathematical Programming, 107:37–61, 2006.
- Orthogonal statistical learning. arXiv preprint arXiv:1901.09036, 2019.
- Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1-2):79–103, 1999.
- Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
- On-demand sampling: Learning optimally from multiple distributions. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 406–419. Curran Associates, Inc., 2022.
- Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica: Journal of the econometric society, pages 1029–1054, 1982.
- Logarithmic regret algorithms for online convex optimization. Machine Learning, 69:169–192, 2007.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning, pages 1939–1948. PMLR, 2018.
- Kullback-Leibler divergence constrained distributionally robust optimization. Available at Optimization Online, 1(2):9, 2013.
- An adversarial approach to structural estimation. Econometrica, 91(6):2041–2063, 2023.
- Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International conference on machine learning, pages 2564–2572. PMLR, 2018.
- Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 247–254, 2019.
- WILDS: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, pages 5637–5664. PMLR, 2021.
- Policy learning under biased sample selection. arXiv preprint arXiv:2304.11735, 2023.
- Adversarial generalized method of moments. arXiv preprint arXiv:1803.07164, 2018.
- Just train twice: Improving group robustness without training group information. In International Conference on Machine Learning, pages 6781–6792. PMLR, 2021.
- Large-scale CelebFaces Attributes (CelebA) dataset. Retrieved August 15, 2018.
- Focus on the common good: Group distributional robustness follows. In International Conference on Learning Representations, 2022.
- The risks of invariant risk minimization. arXiv preprint arXiv:2010.05761, 2020.
- Distributionally robust neural networks. In International Conference on Learning Representations, 2019.
- A generalized representer theorem. In International conference on computational learning theory, pages 416–426. Springer, 2001.
- Distributionally robust logistic regression. Advances in Neural Information Processing Systems, 28, 2015.
- Shai Shalev-Shwartz et al. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2):107–194, 2012.
- Maurice Sion. On general minimax theorems. Pacific J. Math., 8(4):171–176, 1958.
- Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.
- Nyström method vs random Fourier features: A theoretical and empirical comparison. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
- DORO: Distributional and outlier robust optimization. In International Conference on Machine Learning, pages 12345–12355. PMLR, 2021.