Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty (2405.14973v1)
Abstract: Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains. Some SDMU applications are naturally modeled as Multistage Stochastic Optimization Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint. Under assumptions of convexity and stage-wise independence of the uncertainty, the resulting optimization can be solved efficiently using Stochastic Dual Dynamic Programming (SDDP). Two-Stage Linear Decision Rules (TS-LDRs) have been proposed to solve MSPs without the stage-wise independence assumption. TS-LDRs are computationally tractable, but a policy that is a linear function of past observations is typically ill-suited to the non-convex environments arising, for example, in energy systems. This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), which generalizes the policy space beyond linear functions, making it suitable for non-convex environments. TS-GDR is a self-supervised learning algorithm that trains the nonlinear decision rules using stochastic gradient descent (SGD); its forward passes solve the policy-implementation optimization problems, and its backward passes leverage duality theory to obtain closed-form gradients. The effectiveness of TS-GDR is demonstrated through an instantiation based on deep recurrent neural networks, named Two-Stage Deep Decision Rules (TS-DDR). The method brings the flexibility and computational performance of deep learning to SDMU problems that are generally tackled through large-scale optimization techniques. Applied to the Long-Term Hydrothermal Dispatch (LTHD) problem on actual power system data from Bolivia, TS-DDR not only enhances solution quality but also reduces computation times by several orders of magnitude.
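To make the forward/backward structure described in the abstract concrete, the following is a minimal sketch of the dual-based gradient idea, not the paper's implementation: a parameterized policy maps observed features to a first-stage decision, the forward pass solves the recourse problem that implements that decision, and the backward pass recovers a closed-form gradient of the recourse cost from the duals of the linking constraints. The linear recourse data (c, W, T, h), the toy linear policy, and the use of cvxpy are illustrative assumptions; the paper instantiates the decision rules with deep recurrent networks.

```python
# Illustrative sketch of a two-stage decision-rule training loop with
# duality-based gradients. All problem data and the policy are toy placeholders.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

n_feat, n_x, n_y, n_con = 4, 3, 5, 3
c = rng.uniform(1.0, 2.0, n_y)               # recourse costs
W = rng.uniform(0.5, 1.5, (n_con, n_y))      # recourse matrix
T = rng.uniform(0.5, 1.5, (n_con, n_x))      # linking (technology) matrix
Theta = rng.normal(0.0, 0.1, (n_x, n_feat))  # toy linear decision-rule parameters


def recourse_value_and_grad(x, h):
    """Forward pass: solve min c'y s.t. W y >= h - T x, y >= 0.
    Backward pass: the duals of the linking constraints give
    d(value)/d(rhs), hence d(value)/dx = -T' * lambda (envelope theorem)."""
    y = cp.Variable(n_y, nonneg=True)
    link = W @ y >= h - T @ x
    prob = cp.Problem(cp.Minimize(c @ y), [link])
    prob.solve()
    lam = link.dual_value
    grad_x = -T.T @ lam
    return prob.value, grad_x


lr = 1e-2
for it in range(200):
    feat = rng.normal(size=n_feat)       # sampled scenario features
    h = rng.uniform(1.0, 2.0, n_con)     # sampled right-hand side
    x = Theta @ feat                     # first-stage decision from the policy
    val, grad_x = recourse_value_and_grad(x, h)
    grad_Theta = np.outer(grad_x, feat)  # chain rule through the linear policy
    Theta -= lr * grad_Theta             # SGD step
```

In the paper's TS-DDR instantiation, the toy linear map above would be replaced by a deep recurrent network, with the same duality-based gradient of the recourse cost backpropagated through the network parameters by SGD.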