On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis (2301.00712v5)
Abstract: Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where the lower-level functions lack strong convexity. In this work, we first provide hardness results showing that finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. We then study a class of tractable nonconvex-nonconvex bilevel problems in which the lower-level function satisfies the Polyak-Łojasiewicz (PL) condition. We show that a simple first-order algorithm can achieve improved complexity bounds of $\tilde{\mathcal{O}}(\epsilon^{-2})$, $\tilde{\mathcal{O}}(\epsilon^{-4})$, and $\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic settings, respectively.
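For concreteness, the hyper-objective the abstract refers to is commonly written as follows. This is the standard formulation with generic symbols $\varphi$, $f$, $g$; the notation is illustrative, not quoted from the paper:

```latex
% Upper level: minimize the hyper-objective \varphi(x), which is defined
% implicitly through the solution set Y^*(x) of the lower-level problem.
\min_{x \in \mathbb{R}^{d_x}} \ \varphi(x) \triangleq \min_{y \in Y^*(x)} f(x, y),
\quad \text{where } Y^*(x) \triangleq \mathop{\mathrm{arg\,min}}_{y \in \mathbb{R}^{d_y}} g(x, y).
```

Below is a minimal sketch of a penalty-style, fully first-order bilevel method in the spirit of the "simple first-order algorithm" mentioned in the abstract. The function names, step sizes, and loop structure here are illustrative assumptions, not the paper's exact procedure:

```python
def penalty_bilevel(fx, fy, gx, gy, x, y, z, sigma=10.0,
                    eta=0.05, inner_steps=50, outer_steps=500):
    """Sketch of a penalty-based, fully first-order bilevel method (illustrative).

    Approximately minimizes the penalty surrogate
        L_sigma(x) = min_y [f(x, y) + sigma * g(x, y)] - sigma * min_z g(x, z),
    whose gradient requires only first-order oracles of f and g (no Hessians).
    fx, fy (resp. gx, gy) return the partial gradients of f (resp. g).
    The inner loops assume g is well conditioned in y (e.g., satisfies the PL
    condition), so plain gradient descent converges quickly.
    """
    for _ in range(outer_steps):
        for _ in range(inner_steps):   # y ~ argmin_y f(x, y) + sigma * g(x, y)
            y = y - eta * (fy(x, y) + sigma * gy(x, y))
        for _ in range(inner_steps):   # z ~ argmin_z g(x, z)
            z = z - eta * gy(x, z)
        # Danskin-style gradient of the penalty surrogate with respect to x.
        x = x - eta * (fx(x, y) + sigma * (gx(x, y) - gx(x, z)))
    return x, y, z


# Toy instance: g(x, y) = 0.5 * (y - x)^2 satisfies the PL condition in y with
# minimizer y*(x) = x, so the hyper-objective is phi(x) = 2 * (x - 1)^2.
fx = lambda x, y: 2.0 * (x - 1.0)   # f(x, y) = (x - 1)^2 + (y - 1)^2
fy = lambda x, y: 2.0 * (y - 1.0)
gx = lambda x, y: -(y - x)
gy = lambda x, y: (y - x)
x, _, _ = penalty_bilevel(fx, fy, gx, gy, x=0.0, y=0.0, z=0.0)
print(round(x, 3))  # approaches the hyper-objective minimizer x = 1
```

As in penalty methods generally, the weight `sigma` trades off bias against conditioning: a larger `sigma` makes the surrogate track the true hyper-objective more closely but slows down the inner gradient-descent loops.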
Authors: Lesi Chen, Jing Xu, Jingzhao Zhang