Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization (2405.00914v3)
Abstract: We present novel accelerated fully first-order methods for \emph{Bilevel Optimization} (BLO). First, for BLO under the standard assumption that the lower-level functions are strongly convex, we propose the \emph{(Perturbed) Restarted Accelerated Fully First-order method for Bilevel Approximation} (\texttt{PRAF${}^2$BA}), an algorithm leveraging \emph{fully} first-order oracles that finds approximate first-order and second-order stationary points with state-of-the-art oracle query complexities. Second, applied to \emph{nonconvex-strongly-concave} (NCSC) minimax optimization as a special case of BLO, \texttt{PRAF${}^2$BA} recovers \emph{perturbed restarted accelerated gradient descent ascent} (\texttt{PRAGDA}), which achieves the state-of-the-art complexity for finding approximate second-order stationary points. Additionally, we investigate finding stationary points of the hyper-objective function in BLO when the lower-level functions lack strong convexity: we identify several regularity conditions on the lower-level problems that ensure tractability, and we present hardness results showing that BLO with general convex lower-level functions is intractable. Under these regularity conditions we propose the \emph{Inexact Gradient-Free Method} (\texttt{IGFM}), which uses the \emph{Switching Gradient Method} (\texttt{SGM}) as an efficient subroutine to find an approximate stationary point of the hyper-objective in polynomial time. Empirical studies on real-world problems further validate the effectiveness of our proposed algorithms.
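For reference, the standard BLO formulation behind the abstract's terminology, written in the conventional notation (the symbols $f$, $g$, $y^*$, $\varphi$ below are the usual ones, not quoted from the paper), is
$$\min_{x \in \mathbb{R}^{d_x}} \ \varphi(x) := f\bigl(x, y^*(x)\bigr) \quad \text{s.t.} \quad y^*(x) \in \operatorname*{arg\,min}_{y \in \mathbb{R}^{d_y}} g(x, y).$$
When $g(x,\cdot)$ is strongly convex, $y^*(x)$ is unique and the hypergradient takes the classical implicit-function form
$$\nabla \varphi(x) = \nabla_x f\bigl(x, y^*(x)\bigr) - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)\,\bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1} \nabla_y f\bigl(x, y^*(x)\bigr),$$
which involves second-order derivatives of $g$; \emph{fully} first-order methods approximate this quantity using only gradient evaluations of $f$ and $g$. The NCSC minimax problem $\min_x \max_y f(x, y)$ is the special case with lower-level objective $g = -f$.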
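To make the minimax specialization concrete, below is a minimal Python sketch of a perturbed restarted accelerated gradient descent ascent loop: momentum on the descent variable is reset every \texttt{restart\_period} iterations together with a small random perturbation (the mechanism used to escape saddle points), while the strongly concave ascent variable is updated by plain gradient ascent. The step sizes, restart schedule, and momentum rule here are illustrative placeholders, not the theoretically tuned choices of \texttt{PRAGDA}.

```python
import numpy as np

def pragda_sketch(grad_x, grad_y, x0, y0, eta_x=0.01, eta_y=0.1,
                  restart_period=100, perturb_radius=1e-3,
                  n_iters=2000, seed=0):
    """Illustrative perturbed restarted accelerated GDA loop.

    grad_x(x, y), grad_y(x, y): partial gradients of f(x, y).
    The inner variable y (strongly concave side) takes plain ascent
    steps; the outer variable x takes Nesterov-style accelerated
    descent steps whose momentum is restarted periodically with a
    small random perturbation.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    y = np.asarray(y0, dtype=float).copy()
    for t in range(n_iters):
        k = t % restart_period
        if k == 0:
            # restart: kill momentum and inject a small perturbation
            x = x + perturb_radius * rng.standard_normal(x.shape)
            x_prev = x.copy()
        # Nesterov-style extrapolation on the minimization variable x
        beta = k / (k + 3.0)
        z = x + beta * (x - x_prev)
        x_prev = x
        x = z - eta_x * grad_x(z, y)
        # gradient ascent on the strongly concave variable y
        y = y + eta_y * grad_y(x, y)
    return x, y

# Toy instance: f(x, y) = 0.5*||x||^2 + x@y - 0.5*||y||^2,
# strongly concave in y; the stationary point is (0, 0).
gx = lambda x, y: x + y   # df/dx
gy = lambda x, y: x - y   # df/dy
x_out, y_out = pragda_sketch(gx, gy, np.ones(5), np.zeros(5))
```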
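Similarly, for the setting without lower-level strong convexity, the name \texttt{SGM} suggests the classical switching (sub)gradient scheme for functionally constrained convex problems; the sketch below shows that classical scheme with an illustrative fixed step size, not the paper's exact subroutine or parameterization.

```python
import numpy as np

def switching_subgradient(grad_f, grad_g, g, x0, eps=1e-2,
                          step=1e-2, n_iters=5000):
    """Classical switching subgradient scheme for
        min f(x)  s.t.  g(x) <= 0  (f, g convex).

    If the current iterate is nearly feasible (g(x) <= eps), take a
    subgradient step on the objective f; otherwise take a subgradient
    step on the constraint g to restore feasibility. Returns the
    average of the near-feasible iterates.
    """
    x = np.asarray(x0, dtype=float).copy()
    feasible_pts = []
    for _ in range(n_iters):
        if g(x) <= eps:
            feasible_pts.append(x.copy())
            d = grad_f(x)
        else:
            d = grad_g(x)
        x = x - step * d
    return np.mean(feasible_pts, axis=0) if feasible_pts else x
```

In \texttt{IGFM}-style analyses, such a routine is typically applied to the lower-level problem reformulated with a value-function constraint, e.g. minimizing $f(x, \cdot)$ subject to $g(x, y) \le g^*(x) + \delta$ for a small slack $\delta$, which is where the identified regularity conditions enter.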