Gradient-based Bi-level Optimization for Deep Learning: A Survey (2207.11719v4)
Abstract: Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community for problems such as hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another, and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as evolutionary algorithms. In this survey, we first give a formal definition of gradient-based bi-level optimization. Next, we delineate criteria for determining whether a research problem is suited to bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, which is particularly helpful for those new to this domain. More specifically, there are two formulations: the single-task formulation, which optimizes hyperparameters such as regularization parameters and distilled data, and the multi-task formulation, which extracts meta-knowledge such as the model initialization. Given a bi-level formulation, we then discuss four solvers for updating the outer variable: explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we conclude the survey by highlighting two promising future directions: (1) Effective Data Optimization for Science, examined through the lens of task formulation, and (2) Accurate Explicit Proxy Update, analyzed from an optimization standpoint.
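To make the setup concrete, below is a minimal sketch of the standard bi-level formulation and its hypergradient. The notation (outer variable $\lambda$, inner variable $\theta$, outer and inner objectives $\mathcal{L}^{\mathrm{out}}$ and $\mathcal{L}^{\mathrm{in}}$) is chosen here for illustration and may differ from the notation used in the body of the survey:

\[
\lambda^{*} = \arg\min_{\lambda} \ \mathcal{L}^{\mathrm{out}}\bigl(\theta^{*}(\lambda), \lambda\bigr)
\quad \text{s.t.} \quad
\theta^{*}(\lambda) = \arg\min_{\theta} \ \mathcal{L}^{\mathrm{in}}(\theta, \lambda),
\qquad
\nabla_{\lambda} \mathcal{L}^{\mathrm{out}}
= \frac{\partial \mathcal{L}^{\mathrm{out}}}{\partial \lambda}
+ \left(\frac{\partial \theta^{*}(\lambda)}{\partial \lambda}\right)^{\!\top}
\frac{\partial \mathcal{L}^{\mathrm{out}}}{\partial \theta^{*}}.
\]

Roughly speaking, the four solver families differ in how they handle the response Jacobian $\partial \theta^{*}(\lambda) / \partial \lambda$: by differentiating through unrolled inner-loop gradient steps (explicit gradient update), by learning a parametric best-response proxy (proxy update), by invoking the implicit function theorem (implicit function update), or by exploiting an inner problem that admits a closed-form solution (closed-form update).

As an illustration of the first family, the following is a small, self-contained JAX sketch (not taken from the survey; all function and variable names, such as inner_loss and unrolled_outer, are illustrative assumptions) that computes a truncated hypergradient of a validation loss with respect to an L2 regularization coefficient by differentiating through a few unrolled inner gradient steps:

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, lam, x, y):
    # Inner objective: training squared error plus L2 regularization,
    # with the outer variable lam parameterizing the regularization strength.
    return jnp.mean((x @ theta - y) ** 2) + jnp.exp(lam) * jnp.sum(theta ** 2)

def outer_loss(theta, x_val, y_val):
    # Outer objective: validation squared error of the inner solution.
    return jnp.mean((x_val @ theta - y_val) ** 2)

def unrolled_outer(lam, theta0, x_tr, y_tr, x_val, y_val, steps=5, lr=0.1):
    # Approximate theta*(lam) by a few inner gradient steps, then evaluate
    # the outer loss; differentiating this whole function with respect to
    # lam yields the (truncated) hypergradient of the explicit gradient update.
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * jax.grad(inner_loss)(theta, lam, x_tr, y_tr)
    return outer_loss(theta, x_val, y_val)

hypergrad_fn = jax.grad(unrolled_outer)  # d(outer loss) / d(lam)

# Toy usage: synthetic linear-regression data.
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
x_tr = jax.random.normal(k1, (32, 4)); y_tr = jnp.sum(x_tr, axis=1)
x_val = jax.random.normal(k2, (16, 4)); y_val = jnp.sum(x_val, axis=1)
print(hypergrad_fn(jnp.array(0.0), jnp.zeros(4), x_tr, y_tr, x_val, y_val))
```

In practice, this unrolled scheme trades memory for accuracy as the number of inner steps grows, which is one motivation for the proxy, implicit function, and closed-form alternatives covered in the survey.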