Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization (2405.00914v3)

Published 1 May 2024 in math.OC, cs.LG, and stat.ML

Abstract: We present in this paper novel accelerated fully first-order methods for \emph{Bilevel Optimization} (BLO). First, for BLO whose lower-level functions satisfy the typical strong convexity assumption, we propose the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation} (\texttt{PRAF${}^2$BA}), an algorithm that leverages only \emph{fully} first-order oracles and finds approximate first-order and second-order stationary points with state-of-the-art oracle query complexities for complex optimization tasks. Second, applied to \emph{nonconvex-strongly-convex} (NCSC) minimax optimization, a special case of BLO, \texttt{PRAF${}^2$BA} recovers \emph{perturbed restarted accelerated gradient descent ascent} (\texttt{PRAGDA}), which achieves the state-of-the-art complexity for finding approximate second-order stationary points. Additionally, we investigate the challenge of finding stationary points of the hyper-objective function in BLO when the lower-level functions lack strong convexity: we identify several regularity conditions on the lower-level problems that ensure tractability, and present hardness results indicating that BLO with general convex lower-level functions is intractable. Under these regularity conditions we propose the \emph{Inexact Gradient-Free Method} (\texttt{IGFM}), which uses the \emph{Switching Gradient Method} (\texttt{SGM}) as an efficient subroutine to find an approximate stationary point of the hyper-objective in polynomial time. Empirical studies on real-world problems further validate the strong performance of our proposed algorithms.
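For orientation, BLO minimizes the hyper-objective $\varphi(x) = f(x, y^*(x))$ where $y^*(x) \in \arg\min_y g(x, y)$, and NCSC minimax optimization $\min_x \max_y f(x, y)$ is the special case $g = -f$. The sketch below illustrates the generic perturbed, restarted accelerated gradient descent ascent template to which \texttt{PRAGDA} belongs; it is not the paper's algorithm, and the gradient callbacks, step sizes, restart period, and perturbation rule are illustrative assumptions only.

```python
import numpy as np

def accelerated_gda(grad_x, grad_y, x0, y0, eta_x=1e-2, eta_y=1e-1,
                    momentum=0.9, restart_every=100, perturb_radius=1e-3,
                    num_iters=2000, rng=None):
    """Minimal sketch of perturbed, restarted accelerated gradient descent
    ascent for min_x max_y f(x, y), with f nonconvex in x and strongly
    concave in y. All hyperparameters are illustrative placeholders."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    vx = np.zeros_like(x)  # momentum buffer for the descent (upper-level) variable
    for t in range(1, num_iters + 1):
        # Ascent step so that y tracks the inner maximizer y*(x);
        # several inner steps per outer step are common in practice.
        y = y + eta_y * grad_y(x, y)
        # Heavy-ball style accelerated descent step on x.
        vx = momentum * vx - eta_x * grad_x(x, y)
        x = x + vx
        # Periodic momentum restart plus a small random perturbation,
        # intended to help escape saddle points of the primal function.
        if t % restart_every == 0:
            vx = np.zeros_like(x)
            x = x + perturb_radius * rng.standard_normal(x.shape)
    return x, y

# Toy usage on f(x, y) = x**2 / 2 + x * y - y**2 (strongly concave in y):
#   grad_x = lambda x, y: x + y;  grad_y = lambda x, y: x - 2 * y
#   x_opt, y_opt = accelerated_gda(grad_x, grad_y, x0=[1.0], y0=[1.0])
```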

