Decentralized Implicit Differentiation (2403.01260v1)
Abstract: The ability to differentiate through optimization problems has unlocked numerous applications, from optimization-based layers in machine learning models to complex design problems formulated as bilevel programs. It has been shown that exploiting problem structure can yield significant computational gains for optimization and, in some cases, enable distributed computation. One should expect that this structure can be similarly exploited for gradient computation. In this work, we discuss a decentralized framework for computing gradients of constraint-coupled optimization problems. First, we show that this framework results in significant computational gains, especially for large systems, and provide sufficient conditions for its validity. Second, we leverage the exponential decay of sensitivities in graph-structured problems to build a fully distributed algorithm with convergence guarantees. Finally, we use the methodology to rigorously estimate marginal emissions rates in power systems models. Specifically, we demonstrate how the distributed scheme allows for accurate and efficient estimation of these important emissions metrics on large dynamic power system models.
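The building block the abstract refers to, differentiating through an optimization problem, can be illustrated with a small centralized example. The sketch below is an assumption-laden illustration, not the paper's decentralized scheme: it assumes an equality-constrained quadratic program parameterized by theta, a quadratic downstream loss, and NumPy, none of which come from the paper. It applies the implicit function theorem to the KKT conditions to obtain the solution sensitivity and the gradient of the loss with respect to theta.

```python
# Minimal sketch (hypothetical example, not the paper's method): implicit
# differentiation through the parameterized equality-constrained QP
#     minimize_x  (1/2) x^T Q x + theta^T x   subject to  A x = b,
# via the implicit function theorem applied to its KKT conditions.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
Q = rng.standard_normal((n, n))
Q = Q @ Q.T + n * np.eye(n)          # positive definite objective matrix
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
theta = rng.standard_normal(n)

# KKT system: [Q A^T; A 0] [x; nu] = [-theta; b].
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
z = np.linalg.solve(K, np.concatenate([-theta, b]))
x_star = z[:n]

# Implicit function theorem: with F(x, nu, theta) = (Qx + theta + A^T nu, Ax - b),
# d(x, nu)/dtheta = -K^{-1} dF/dtheta, where dF/dtheta = [I; 0].
dF_dtheta = np.vstack([np.eye(n), np.zeros((m, n))])
dz_dtheta = -np.linalg.solve(K, dF_dtheta)
dx_dtheta = dz_dtheta[:n, :]         # n x n sensitivity of x*(theta)

# Gradient of a downstream loss L(x*) = 0.5 ||x*||^2 with respect to theta.
grad_x = x_star
grad_theta = dx_dtheta.T @ grad_x

# Finite-difference check of the first gradient entry (K does not depend on theta).
eps = 1e-6
theta_p = theta.copy()
theta_p[0] += eps
x_p = np.linalg.solve(K, np.concatenate([-theta_p, b]))[:n]
fd = (0.5 * x_p @ x_p - 0.5 * x_star @ x_star) / eps
print(grad_theta[0], fd)             # should agree to roughly 1e-5
```

The finite-difference check is included because implicit-differentiation code is easy to get subtly wrong (for instance, sign conventions in the KKT residual). The paper's contribution, per the abstract, is to carry out this kind of gradient computation in a decentralized fashion for constraint-coupled problems, exploiting the exponential decay of sensitivities in graph-structured problems.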