Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond (2302.09267v5)
Abstract: This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, which is then solved by stochastic mirror descent (SMD) with $m$ samples in each iteration, attaining a nearly optimal sample complexity. To reduce the number of samples required in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts SMD and the other executes an online algorithm for non-oblivious multi-armed bandits, maintaining the same sample complexity. Next, we extend GDRO to address scenarios involving imbalanced data and heterogeneous distributions. In the first scenario, we introduce a weighted variant of GDRO, enabling distribution-dependent convergence rates that depend on the number of samples from each distribution. We design two strategies to meet the sample budget: one integrates non-uniform sampling into SMD, and the other employs the stochastic mirror-prox algorithm with mini-batches; both deliver faster rates for distributions with more samples. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions. As in the case of vanilla GDRO, we develop two stochastic approaches: one uses $m$ samples per iteration via SMD, and the other consumes $k$ samples per iteration through an online algorithm for non-oblivious combinatorial semi-bandits.
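To make the first formulation concrete, the sketch below runs SMD on the saddle-point problem $\min_{w} \max_{q \in \Delta_m} \sum_{i=1}^m q_i R_i(w)$ with $m$ samples per iteration: a Euclidean gradient step for the model $w$ and an entropic (exponentiated-gradient) step for the distribution weights $q$. The toy data, noise levels, and step sizes are all illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): m Gaussian regression groups sharing one
# ground-truth model but with heterogeneous noise levels.
m, d, T = 3, 5, 2000
true_w = rng.normal(size=d)
group_noise = [0.1, 0.5, 1.0]

def sample(i):
    """Draw one (x, y) pair from distribution i."""
    x = rng.normal(size=d)
    y = x @ true_w + group_noise[i] * rng.normal()
    return x, y

# SMD on  min_w max_{q in simplex}  sum_i q_i * R_i(w):
# w takes a (Euclidean) descent step, q an entropic ascent step.
w = np.zeros(d)
q = np.full(m, 1.0 / m)
eta_w, eta_q = 0.05, 0.05

for t in range(T):
    grads = np.zeros((m, d))
    losses = np.zeros(m)
    for i in range(m):                  # m samples per iteration
        x, y = sample(i)
        err = x @ w - y
        losses[i] = 0.5 * err ** 2      # stochastic estimate of R_i(w)
        grads[i] = err * x
    w -= eta_w * (q @ grads)            # descent on the q-weighted loss
    q *= np.exp(eta_q * losses)         # exponentiated-gradient ascent
    q /= q.sum()                        # Bregman projection onto the simplex

print("final weights on groups:", q)
```

As the abstract's two-player view suggests, the $q$-player is running multiplicative weights over arms; here it concentrates on the hardest (noisiest) group, which is exactly the worst-case behavior GDRO targets.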