Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond (2302.09267v5)

Published 18 Feb 2023 in cs.LG

Abstract: This paper investigates group distributionally robust optimization (GDRO), with the goal of learning a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem and solve it by stochastic mirror descent (SMD) with $m$ samples in each iteration, attaining a nearly optimal sample complexity. To reduce the number of samples required in each round from $m$ to 1, we cast GDRO as a two-player game in which one player conducts SMD and the other executes an online algorithm for non-oblivious multi-armed bandits, maintaining the same sample complexity. Next, we extend GDRO to scenarios involving imbalanced data and heterogeneous distributions. In the first scenario, we introduce a weighted variant of GDRO, enabling distribution-dependent convergence rates that depend on the number of samples from each distribution. We design two strategies to meet the sample budget: one integrates non-uniform sampling into SMD, and the other employs the stochastic mirror-prox algorithm with mini-batches; both deliver faster rates for distributions with more samples. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions. As in the case of vanilla GDRO, we develop two stochastic approaches: one uses $m$ samples per iteration via SMD, and the other consumes $k$ samples per iteration through an online algorithm for non-oblivious combinatorial semi-bandits.
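To make the first algorithm concrete, below is a minimal sketch of SMD applied to the saddle-point formulation $\min_{w \in \mathcal{W}} \max_{q \in \Delta_m} \sum_{i=1}^m q_i R_i(w)$, where $R_i(w)$ denotes the risk on the $i$-th distribution. It assumes Euclidean geometry for the $w$-player and entropic (multiplicative-weights) updates for the $q$-player; the `loss`, `grad_w`, and `sample` oracles are hypothetical placeholders, and the step sizes and projection radius are illustrative, not the paper's exact constants.

```python
import numpy as np

# Minimal sketch of the basic SMD approach to GDRO, i.e. the saddle-point
# problem  min_w max_{q in simplex}  sum_i q_i * R_i(w).  The oracles
# `loss(w, z)`, `grad_w(w, z)`, and `sample(i)` are hypothetical stand-ins
# for the learner's loss, its gradient in w, and a draw from the i-th
# distribution; step sizes and the L2 projection radius are illustrative.

def smd_gdro(loss, grad_w, sample, m, dim, T, eta_w=0.1, eta_q=0.1, radius=1.0):
    w = np.zeros(dim)            # model iterate, Euclidean geometry for w
    q = np.full(m, 1.0 / m)      # uniform initial weights over distributions
    w_avg = np.zeros(dim)
    for _ in range(T):
        zs = [sample(i) for i in range(m)]   # m samples per iteration
        losses = np.array([loss(w, z) for z in zs])
        # w-player: descent step on the q-weighted stochastic gradient,
        # then projection back onto the L2 ball of the given radius.
        g = sum(qi * grad_w(w, z) for qi, z in zip(q, zs))
        w = w - eta_w * g
        nrm = np.linalg.norm(w)
        if nrm > radius:
            w *= radius / nrm
        # q-player: entropic mirror ascent (multiplicative weights) on the
        # observed per-distribution losses, renormalized to the simplex.
        q = q * np.exp(eta_q * losses)
        q /= q.sum()
        w_avg += w / T           # return the averaged iterate
    return w_avg
```

For the heterogeneous setting, the inner maximum can instead be taken over the capped simplex $\{q \in \Delta_m : q_i \le 1/k\}$, which recovers the average top-$k$ risk, i.e. the mean of the $k$ largest $R_i(w)$.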

