
Efficient Algorithms for Empirical Group Distributionally Robust Optimization and Beyond (2403.03562v2)

Published 6 Mar 2024 in cs.LG and stat.ML

Abstract: In this paper, we investigate the empirical counterpart of Group Distributionally Robust Optimization (GDRO), which aims to minimize the maximal empirical risk across $m$ distinct groups. We formulate empirical GDRO as a $\textit{two-level}$ finite-sum convex-concave minimax optimization problem and develop an algorithm called ALEG to benefit from its special structure. ALEG is a double-looped stochastic primal-dual algorithm that incorporates variance reduction techniques into a modified mirror prox routine. To exploit the two-level finite-sum structure, we propose a simple group sampling strategy to construct the stochastic gradient with a smaller Lipschitz constant and then perform variance reduction for all groups. Theoretical analysis shows that ALEG achieves $\varepsilon$-accuracy within a computational complexity of $\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$, where $\bar{n}$ is the average number of samples among the $m$ groups. Notably, our approach outperforms the state-of-the-art method by a factor of $\sqrt{m}$. Based on ALEG, we further develop a two-stage optimization algorithm called ALEM to deal with the empirical Minimax Excess Risk Optimization (MERO) problem. The computational complexity of ALEM nearly matches that of ALEG, surpassing the rates of existing methods.
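The abstract describes ALEG only at a high level. The sketch below is a minimal, hypothetical illustration of the empirical GDRO objective and the general flavor of a variance-reduced stochastic primal-dual method: an SVRG-style snapshot for the primal variable and exponentiated-gradient ascent for the group weights on the simplex. It is not the authors' ALEG (which is double-looped and built on a modified mirror prox routine with a specific group sampling scheme); the toy data, step sizes, and all names here are assumptions for illustration only.

```python
import numpy as np

# Empirical GDRO:  min_x  max_{q in simplex}  sum_i q_i * R_i(x),
# where R_i(x) = (1/n_i) * sum_j loss(x; z_{ij}) is group i's empirical risk.
# The loop below is a simplified stand-in for ALEG's double-looped,
# variance-reduced mirror-prox scheme; everything here is illustrative.

rng = np.random.default_rng(0)

# Toy data: m groups of linear-regression samples sharing one weight vector,
# with noise that grows across groups so the worst-group risk is nontrivial.
m, d = 5, 10
sizes = rng.integers(50, 150, size=m)
w_true = rng.normal(size=d)
groups = []
for i in range(m):
    A = rng.normal(size=(sizes[i], d))
    b = A @ w_true + 0.1 * (i + 1) * rng.normal(size=sizes[i])
    groups.append((A, b))

def group_risk(x, i):
    """Mean squared error of group i."""
    A, b = groups[i]
    return 0.5 * np.mean((A @ x - b) ** 2)

def group_grad(x, i):
    """Full gradient of group i's empirical risk."""
    A, b = groups[i]
    return A.T @ (A @ x - b) / len(b)

x = np.zeros(d)                 # primal variable (model weights)
q = np.full(m, 1.0 / m)         # dual variable: group weights on the simplex
eta_x, eta_q, batch = 0.05, 0.5, 8

for t in range(2000):
    # SVRG-style snapshot: periodically recompute full per-group gradients;
    # these anchor the variance-reduced estimator below.
    if t % 100 == 0:
        x_snap = x.copy()
        snap_grads = [group_grad(x_snap, j) for j in range(m)]

    # Group sampling: draw one group per step. (ALEG's sampling strategy is
    # designed to shrink the stochastic gradient's Lipschitz constant;
    # uniform sampling here is only a placeholder.)
    i = rng.integers(m)
    A, b = groups[i]
    idx = rng.integers(0, len(b), size=batch)
    Ab, bb = A[idx], b[idx]

    # Variance-reduced gradient: same mini-batch evaluated at x and at the
    # snapshot, plus the snapshot's full gradient. Scaling by m * q[i] keeps
    # the estimator unbiased for sum_i q_i * grad R_i(x) under uniform sampling.
    g_cur = Ab.T @ (Ab @ x - bb) / batch
    g_old = Ab.T @ (Ab @ x_snap - bb) / batch
    x -= eta_x * m * q[i] * (g_cur - g_old + snap_grads[i])

    # Dual ascent on the simplex via exponentiated gradient on group risks
    # (a fully stochastic method would estimate these from mini-batches).
    risks = np.array([group_risk(x, j) for j in range(m)])
    q *= np.exp(eta_q * risks)
    q /= q.sum()

print("per-group risks:", np.round([group_risk(x, j) for j in range(m)], 4))
print("worst-group risk:", round(max(group_risk(x, j) for j in range(m)), 4))
```

Under the paper's analysis, the actual ALEG routine attains $\varepsilon$-accuracy within $\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$ computation; the sketch above carries no such guarantee and is meant only to make the minimax structure and the role of variance reduction concrete.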
