From Inverse Optimization to Feasibility to ERM (2402.17890v2)
Abstract: Inverse optimization infers the unknown parameters of an optimization problem from observed solutions and is widely used in fields such as transportation, power systems, and healthcare. We study the contextual inverse optimization setting, which uses additional contextual information to better predict the unknown problem parameters. We focus on contextual inverse linear programming (CILP), addressing the challenges posed by the non-differentiable nature of LPs. For a linear prediction model, we reduce CILP to a convex feasibility problem, allowing the use of standard algorithms such as alternating projections. The resulting algorithm for CILP comes with theoretical convergence guarantees without additional assumptions such as degeneracy or interpolation. Next, we reduce CILP to empirical risk minimization (ERM) on a smooth, convex loss that satisfies the Polyak-Łojasiewicz condition. This reduction enables the use of scalable first-order optimization methods to solve large non-convex problems while retaining theoretical guarantees in the convex setting. We then use the ERM reduction to quantify the generalization performance of the proposed algorithm on previously unseen instances. Finally, we validate our approach experimentally on synthetic and real-world problems, demonstrating improved performance over existing methods.
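As a minimal sketch of the workhorse behind the feasibility reduction: alternating projections repeatedly projects the current iterate onto each convex set in turn and, for intersecting closed convex sets, converges to a point in the intersection. The two sets used here (a halfspace and a box) are illustrative stand-ins chosen for simplicity, not the feasibility sets constructed in the paper.

```python
import numpy as np

def project_halfspace(x, a, b):
    """Project x onto the halfspace {z : a @ z <= b}."""
    viol = a @ x - b
    if viol <= 0:
        return x  # already feasible
    return x - viol * a / (a @ a)

def project_box(x, lo, hi):
    """Project x onto the box [lo, hi]^n (componentwise clipping)."""
    return np.clip(x, lo, hi)

def alternating_projections(x0, a, b, lo, hi, iters=200):
    """Alternate projections onto the halfspace and the box."""
    x = x0
    for _ in range(iters):
        x = project_box(project_halfspace(x, a, b), lo, hi)
    return x

a = np.array([1.0, 1.0])
x = alternating_projections(np.array([5.0, 5.0]), a, b=1.0, lo=0.0, hi=1.0)
# x lies in the intersection: here it converges to [0.5, 0.5].
```

Each projection here has a closed form, which is what makes the method cheap per iteration; the paper's convergence guarantees for such schemes rest on classical results for projections onto convex sets.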