Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates (2403.11687v3)
Abstract: We study the problem of efficiently computing the derivative of the fixed point of a parametric nondifferentiable contraction map. This problem has wide applications in machine learning, including hyperparameter optimization, meta-learning, and data poisoning attacks. We analyze two popular approaches: iterative differentiation (ITD) and approximate implicit differentiation (AID). A key challenge of the nonsmooth setting is that the chain rule no longer holds. We build upon the work of Bolte et al. (2022), who prove linear convergence of nonsmooth ITD under a piecewise Lipschitz smoothness assumption. In the deterministic case, we provide a linear rate for AID and an improved linear rate for ITD, both closely matching the rates known for the smooth setting. We further introduce NSID, a new stochastic method for computing the implicit derivative when the contraction map is the composition of an outer map and an inner map that is accessible only through a stochastic unbiased estimator. We establish convergence rates for NSID that encompass the best available rates in the smooth setting. We also present illustrative experiments confirming our analysis.
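To fix ideas, the following minimal numpy sketch contrasts the two deterministic estimators the abstract discusses on a smooth toy contraction; it is not the paper's code. The nonsmooth analysis in the paper covers maps where the chain rule can fail, but here we stay smooth so classical Jacobians exist. All names (phi, A, b, c, T, K) are illustrative assumptions.

```python
# Toy contraction Phi(w, lam) = c * tanh(A @ w) + lam * b with c * ||A||_2 < 1,
# whose fixed point w*(lam) solves w* = Phi(w*, lam).
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A /= np.linalg.norm(A, 2)            # spectral norm 1, so ||dPhi/dw|| <= c < 1
b = rng.standard_normal(n)
c, lam, T, K = 0.5, 0.3, 100, 100    # contraction factor, parameter, iteration counts

def phi(w, lam):
    return c * np.tanh(A @ w) + lam * b

def dphi_dw(w):                      # Jacobian of Phi with respect to w
    return c * np.diag(1.0 - np.tanh(A @ w) ** 2) @ A

# ITD (forward mode): differentiate the fixed-point iteration itself,
# propagating d_t ~ dw_t/dlam alongside w_{t+1} = Phi(w_t, lam).
w = np.zeros(n)
d_itd = np.zeros(n)
for _ in range(T):
    d_itd = dphi_dw(w) @ d_itd + b   # dPhi/dlam = b for this toy map
    w = phi(w, lam)

# AID: approximate the fixed point first, then solve the implicit linear
# system (I - dPhi/dw(w*)) d = dPhi/dlam by a second fixed-point iteration.
w_aid = np.zeros(n)
for _ in range(T):
    w_aid = phi(w_aid, lam)
J = dphi_dw(w_aid)
d_aid = np.zeros(n)
for _ in range(K):
    d_aid = J @ d_aid + b

# Reference: direct solve of the implicit-function-theorem system.
d_exact = np.linalg.solve(np.eye(n) - J, b)
print("ITD error:", np.linalg.norm(d_itd - d_exact))
print("AID error:", np.linalg.norm(d_aid - d_exact))
```

In the stochastic setting addressed by NSID, the inner part of the contraction would be available only through unbiased estimates rather than the exact map above; this sketch does not attempt that regime.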
- OptNet: Differentiable optimization as a layer in neural networks. In International Conference on Machine Learning, pp. 136–145. PMLR, 2017.
- Amortized implicit differentiation for stochastic bilevel optimization. In International Conference on Learning Representations, 2021.
- Deep equilibrium models. Advances in Neural Information Processing Systems, 32, 2019.
- Beer, G. Topologies on Closed and Closed Convex Sets, volume 268. Springer Science & Business Media, 1993.
- Meta-learning with differentiable closed-form solvers. In International Conference on Learning Representations (ICLR), 2019.
- Implicit differentiation of lasso-type models for hyperparameter optimization. In International Conference on Machine Learning, pp. 810–821. PMLR, 2020.
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning. The Journal of Machine Learning Research, 23(1):6680–6722, 2022.
- Efficient and modular implicit differentiation. Advances in Neural Information Processing Systems, 35:5230–5242, 2022.
- Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Mathematical Programming, 188:19–51, 2021.
- Automatic differentiation of nonsmooth iterative algorithms. Advances in Neural Information Processing Systems, 35:26404–26417, 2022.
- Differentiating nonsmooth solutions to parametric monotone inclusion problems. SIAM Journal on Optimization, 34:71–97, 2024.
- Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.
- Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. Advances in Neural Information Processing Systems, 34:25294–25307, 2021.
- Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005.
- Forward and reverse gradient-based hyperparameter optimization. In International Conference on Machine Learning, pp. 1165–1173. PMLR, 2017.
- Bilevel programming for hyperparameter optimization and meta-learning. In International Conference on Machine Learning, pp. 1568–1577. PMLR, 2018.
- Bilevel learning of the group lasso structure. Advances in Neural Information Processing Systems, 31, 2018.
- Approximation methods for bilevel programming. arXiv preprint arXiv:1802.02246, 2018.
- On the iteration complexity of hypergradient computation. In International Conference on Machine Learning, pp. 3748–3758. PMLR, 2020.
- Convergence properties of stochastic hypergradients. In International Conference on Artificial Intelligence and Statistics, pp. 3826–3834. PMLR, 2021.
- Bilevel optimization with a lower-level contraction: Optimal sample complexity without warm-start. Journal of Machine Learning Research, 24(167):1–37, 2023.
- Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, volume 105. SIAM, 2008.
- Bilevel optimization: Convergence analysis and enhanced design. In International Conference on Machine Learning, pp. 4882–4892. PMLR, 2021.
- Linearly constrained bilevel optimization: A smoothed implicit gradient approach. In International Conference on Machine Learning, pp. 16291–16325. PMLR, 2023.
- Meta-learning with differentiable convex optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10657–10665, 2019.
- BOML: A modularized bilevel optimization library in Python for meta learning. In 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–2. IEEE, 2021.
- Optimizing millions of hyperparameters by implicit differentiation. In International Conference on Artificial Intelligence and Statistics, pp. 1540–1552. PMLR, 2020.
- Gradient-based hyperparameter optimization through reversible learning. In International Conference on Machine Learning, pp. 2113–2122. PMLR, 2015.
- Towards poisoning of deep learning algorithms with back-gradient optimization. In ACM Workshop on Artificial Intelligence and Security, pp. 27–38, 2017.
- Bilevel optimization with nonsmooth lower level problems. In Scale Space and Variational Methods in Computer Vision: 5th International Conference, pp. 654–665. Springer, 2015.
- Pedregosa, F. Hyperparameter optimization with approximate gradient. In International Conference on Machine Learning, pp. 737–746. PMLR, 2016.
- Meta-learning with implicit gradients. Advances in Neural Information Processing Systems, 32, 2019.
- A stochastic forward-backward splitting method for solving monotone inclusions in Hilbert spaces. arXiv preprint arXiv:1403.7999, 2014.
- Convergence of stochastic proximal gradient algorithm. Applied Mathematics & Optimization, 82:891–917, 2020.
- Scholtes, S. Introduction to Piecewise Differentiable Equations. Springer Science & Business Media, 2012.
- Is feature selection secure against training data poisoning? In International Conference on Machine Learning, pp. 1689–1698. PMLR, 2015.
- Alternating projected SGD for equality-constrained bilevel optimization. In International Conference on Artificial Intelligence and Statistics, pp. 987–1023. PMLR, 2023.