First-order penalty methods for bilevel optimization (2301.01716v2)
Abstract: In this paper we study a class of unconstrained and constrained bilevel optimization problems in which the lower level is a possibly nonsmooth convex optimization problem, while the upper level is a possibly nonconvex optimization problem. We introduce a notion of $\varepsilon$-KKT solution for these problems and show that an $\varepsilon$-KKT solution leads to an $O(\sqrt{\varepsilon})$- or $O(\varepsilon)$-hypergradient-based stationary point under suitable assumptions. We also propose first-order penalty methods for finding an $\varepsilon$-KKT solution, whose subproblems turn out to be structured minimax problems that can be suitably solved by a first-order method recently developed by the authors. Under suitable assumptions, an operation complexity of $O(\varepsilon^{-4}\log\varepsilon^{-1})$ and $O(\varepsilon^{-7}\log\varepsilon^{-1})$, measured by their fundamental operations, is established for the proposed penalty methods for finding an $\varepsilon$-KKT solution of the unconstrained and constrained bilevel optimization problems, respectively. Preliminary numerical results are presented to illustrate the performance of our proposed methods. To the best of our knowledge, this paper is the first work to demonstrate that bilevel optimization can be approximately solved as minimax optimization, and moreover, it provides the first implementable method with complexity guarantees for such sophisticated bilevel optimization.
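To illustrate why a penalty subproblem of this kind is a minimax problem, here is a minimal sketch for the unconstrained case. The notation is assumed for illustration and is not defined in the abstract: $f$ denotes the upper-level objective, $\tilde f$ the lower-level objective (convex in its second argument), and $\rho > 0$ a penalty parameter. Write the bilevel problem as

$$\min_{x,y}\; f(x,y) \quad \text{s.t.} \quad y \in \arg\min_{z} \tilde f(x,z).$$

The gap $\tilde f(x,y) - \min_{z} \tilde f(x,z)$ is nonnegative and vanishes exactly when $y$ is lower-level optimal, so penalizing it gives

$$\min_{x,y}\; \Big\{ f(x,y) + \rho\big(\tilde f(x,y) - \min_{z}\tilde f(x,z)\big) \Big\} \;=\; \min_{x,y}\,\max_{z}\; \Big\{ f(x,y) + \rho\big(\tilde f(x,y) - \tilde f(x,z)\big) \Big\},$$

where the equality uses $-\min_{z} \tilde f(x,z) = \max_{z}\,\big(-\tilde f(x,z)\big)$. Under the convexity assumption on the lower level, the inner objective is concave in $z$, which is the kind of structure a first-order minimax method can exploit. The paper's exact formulation, and its treatment of the constrained case, may differ from this sketch.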