
On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis (2301.00712v5)

Published 2 Jan 2023 in math.OC, cs.AI, and cs.LG

Abstract: Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where the lower-level functions lack strong convexity. In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Łojasiewicz (PL) condition. We show a simple first-order algorithm can achieve better complexity bounds of $\tilde{\mathcal{O}}(\epsilon^{-2})$, $\tilde{\mathcal{O}}(\epsilon^{-4})$, and $\tilde{\mathcal{O}}(\epsilon^{-6})$ in the deterministic, partially stochastic, and fully stochastic settings, respectively.
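
For context, the hyper-objective formulation the abstract refers to is standard in this literature; a minimal statement, assuming upper-level objective $f$, lower-level objective $g$, and PL constant $\mu$ (the paper's exact notation and definitions may differ in detail), is

$$\min_{x} \ \varphi(x) := \min_{y \in Y^*(x)} f(x, y), \qquad \text{where} \qquad Y^*(x) := \operatorname*{arg\,min}_{y} g(x, y),$$

and the lower-level function $g(x, \cdot)$ satisfies the PL condition with constant $\mu > 0$ if $\frac{1}{2}\|\nabla_y g(x, y)\|^2 \ge \mu \big( g(x, y) - \min_{y'} g(x, y') \big)$ for all $y$. The PL condition is weaker than strong convexity: it allows the lower-level problem to have multiple minimizers while still guaranteeing linear convergence of gradient descent on $g(x, \cdot)$, which is what makes this nonconvex-nonconvex class tractable.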

Authors (3)
  1. Lesi Chen (11 papers)
  2. Jing Xu (244 papers)
  3. Jingzhao Zhang (54 papers)
Citations (7)
