You Shall Pass: Dealing with the Zero-Gradient Problem in Predict and Optimize for Convex Optimization (2307.16304v2)
Abstract: Predict and optimize is an increasingly popular decision-making paradigm that employs machine learning to predict the unknown parameters of optimization problems. Instead of minimizing the prediction error of the parameters, it trains predictive models using task performance as the loss function. The key challenge in training such models is computing the Jacobian of the solution of the optimization problem with respect to its parameters. For linear problems, this Jacobian is known to be zero or undefined; hence, approximations are usually employed. For non-linear convex problems, however, it is common to use the exact Jacobian. This paper demonstrates that the zero-gradient problem appears in the non-linear case as well: the Jacobian can have a sizeable null space, causing the training process to get stuck at suboptimal points. Through formal proofs, this paper shows that smoothing the feasible set resolves this problem. Combining this insight with known techniques from the literature, such as quadratic programming approximation and projection distance regularization, a novel method for approximating the Jacobian is derived. In simulation experiments, the proposed method improves performance in the non-linear case and at least matches existing state-of-the-art methods for linear problems.
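To make the setting concrete, the sketch below (not the paper's own method) shows how a predict-and-optimize pipeline backpropagates a task loss through a differentiable convex optimization layer, and how a small quadratic term, in the spirit of the quadratic programming approximation mentioned above, keeps the solution map differentiable with informative gradients. It assumes cvxpylayers and PyTorch are available; the toy problem, dimensions, and the smoothing weight `eps` are illustrative assumptions.

```python
# Minimal sketch of decision-focused training through a convex optimization layer.
# Assumes: pip install cvxpy cvxpylayers torch. All problem data are synthetic.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 5        # number of decision variables (illustrative)
eps = 0.1    # quadratic smoothing weight; eps = 0 recovers a pure LP whose
             # solution map has a zero or undefined Jacobian almost everywhere

c_hat = cp.Parameter(n)                      # predicted cost vector
x = cp.Variable(n)
objective = cp.Minimize(-c_hat @ x + eps * cp.sum_squares(x))   # QP-smoothed LP
constraints = [x >= 0, cp.sum(x) <= 1]
layer = CvxpyLayer(cp.Problem(objective, constraints),
                   parameters=[c_hat], variables=[x])

# Toy predictive model mapping features to cost vectors.
model = torch.nn.Linear(3, n)
features = torch.randn(8, 3)
true_cost = torch.rand(8, n)                 # synthetic ground-truth costs

pred_cost = model(features)                  # batched predictions, shape (8, n)
x_star, = layer(pred_cost)                   # solve the smoothed problem per sample
task_loss = -(true_cost * x_star).sum(dim=1).mean()   # task-performance loss
task_loss.backward()                         # gradients flow back to the predictor
```

With `eps = 0` the layer reduces to a linear program whose optimal solution is piecewise constant in `c_hat`, so the backward pass returns degenerate (near-zero) gradients; adding the quadratic term is one simple way to avoid this, mirroring the smoothing argument made in the abstract.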