Gradient-based Bi-level Optimization for Deep Learning: A Survey (2207.11719v4)

Published 24 Jul 2022 in cs.LG and math.OC

Abstract: Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community, including for hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another, and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as evolutionary algorithms. In this survey, we first give a formal definition of gradient-based bi-level optimization. Next, we delineate criteria to determine whether a research problem is suited to bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, which is particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation in place, we then discuss four bi-level optimization solvers for updating the outer variable: explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we wrap up the survey by highlighting two prospective future directions: (1) effective data optimization for science, examined through the lens of task formulation, and (2) accurate explicit proxy update, analyzed from an optimization standpoint.
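
Of the four solvers named in the abstract, the explicit gradient update is the most direct: the inner optimization is unrolled for a fixed number of steps, and the hypergradient is obtained by differentiating the outer (validation) loss through that unroll. The sketch below is a minimal illustration of this idea for a single regularization hyperparameter; the toy ridge-regression data, step counts, and learning rates are illustrative assumptions, not the survey's setup.

```python
import jax
import jax.numpy as jnp

# Toy data (assumed for illustration): ridge regression whose L2 strength
# is tuned on a held-out validation split via a hypergradient.
key = jax.random.PRNGKey(0)
X_tr = jax.random.normal(key, (32, 5))
y_tr = X_tr @ jnp.ones(5) + 0.1 * jax.random.normal(key, (32,))
X_val = jax.random.normal(jax.random.PRNGKey(1), (16, 5))
y_val = X_val @ jnp.ones(5)

def inner_loss(w, lam):
    # Lower-level objective: training loss plus L2 penalty weighted by exp(lam).
    return jnp.mean((X_tr @ w - y_tr) ** 2) + jnp.exp(lam) * jnp.sum(w ** 2)

grad_inner = jax.grad(inner_loss)  # gradient w.r.t. the inner variable w

def outer_loss(lam, inner_steps=20, inner_lr=0.05):
    # Unroll a few steps of inner gradient descent so the adapted weights
    # remain a differentiable function of the outer variable lam.
    w = jnp.zeros(5)
    for _ in range(inner_steps):
        w = w - inner_lr * grad_inner(w, lam)
    # Upper-level objective: validation loss of the adapted weights.
    return jnp.mean((X_val @ w - y_val) ** 2)

# Hypergradient: differentiate the validation loss through the unrolled inner loop.
hypergrad_fn = jax.grad(outer_loss)

lam = jnp.array(0.0)
for _ in range(50):
    lam = lam - 0.1 * hypergrad_fn(lam)  # outer gradient descent step
print("learned log-regularization:", float(lam))
```

Swapping the explicit unroll for a proxy, implicit-function, or closed-form update would change only how the hypergradient is computed; the outer update loop stays the same.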
