- The paper demonstrates theoretically that an Economic NMPC scheme can achieve optimal control policies even with an incorrect system model by suitably adjusting stage and terminal costs.
- It proposes parametrizing ENMPC within a Reinforcement Learning framework, tuning the parameters to optimize the control policy directly rather than to fit the process model, enabling improved data-driven performance.
- The research provides algorithms for deploying this combined approach using temporal-difference learning and policy gradient methods, addressing practical stability concerns and ensuring robust control strategies.
Overview of Data-driven Economic NMPC Using Reinforcement Learning
This paper presents an innovative exploration at the intersection of Nonlinear Model Predictive Control (NMPC), Economic NMPC (ENMPC), and Reinforcement Learning (RL). The main contention is that NMPC schemes, including ENMPC, can achieve the optimal control policy even when the system model employed is incorrect, by suitably adjusting the stage cost, terminal cost, and constraints. This insight opens up the potential for NMPC frameworks to serve as sophisticated function approximators within the RL paradigm, overcoming some traditional limitations of RL concerning hard guarantees on system behavior.
Main Contributions
- Optimal Control Policy with Inexact Models: The authors provide theoretical guarantees demonstrating that an NMPC scheme can deliver the optimal policy even with an inexact model. They show that suitably adjusted stage and terminal costs can recover the optimal policy under specific conditions, relying on theoretical constructs such as dissipativity.
- ENMPC Parametrization for RL: By exploring the integration of NMPC within RL systems, the paper proposes a sophisticated parametrization for ENMPC schemes. It emphasizes tuning the parameters not to fit the process model exactly but to optimize the control policy directly, facilitating improved data-driven control performance.
- Stability and Dissipativity: The work connects NMPC tuning within RL to core NMPC stability principles. It considers how stability, particularly through strict dissipativity, can be ensured even for generic stage costs via suitable cost modifications.
- Algorithmic Implementation: The research details practical algorithms for leveraging RL alongside NMPC formulations, utilizing approaches such as Temporal-Difference learning and deterministic policy gradient methods. The authors suggest practical ways to compute gradients and sensitivities within the MPC context, enabling efficient deployment within RL.
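The Temporal-Difference update at the core of these algorithms can be sketched as follows. This is a minimal toy, not the paper's implementation: a simple quadratic function stands in for the MPC-based action-value function Q_theta (which in the paper is the optimal cost of a parametrized NMPC problem), and the dynamics, costs, step sizes, and function names are all illustrative assumptions.

```python
import numpy as np

# Toy surrogate: in the paper, Q_theta(s, a) is the optimal cost of a
# parametrized NMPC problem; here a quadratic stands in so that the
# TD update itself is easy to follow. All names are illustrative.
def q_theta(theta, s, a):
    return s**2 + a**2 + theta * s   # theta shifts the stage cost

def grad_q_theta(theta, s, a):
    return s                          # d q_theta / d theta

def policy(theta, s):
    # Greedy action for this surrogate (minimizer over a is 0);
    # a real implementation would solve the NMPC problem instead.
    return 0.0

def v_theta(theta, s):
    return q_theta(theta, s, policy(theta, s))

rng = np.random.default_rng(0)
theta, alpha, gamma = 0.0, 0.05, 0.9
s = 1.0
for _ in range(500):
    a = policy(theta, s) + 0.1 * rng.standard_normal()  # exploration
    r = s**2 + a**2           # stage cost observed from the real system
    s_next = 0.8 * s + a      # true dynamics, unknown to the controller
    # TD(0) error and Q-learning-style update of the MPC parameters
    delta = r + gamma * v_theta(theta, s_next) - q_theta(theta, s, a)
    theta += alpha * delta * grad_q_theta(theta, s, a)
    s = s_next
```

The key point mirrored here is that the learning signal adjusts the cost parametrization theta from observed transitions alone, without ever correcting the (wrong) prediction model.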
Key Findings
- Theoretical Connection: A central result is that a well-tuned ENMPC scheme can capture the optimal policy of a stochastic system by using a modified stage cost.
- Practical Algorithm Development: The proposed methodologies offer systematic principles for aligning NMPC and RL, supporting robust, adaptive control strategies without model perfection.
- Stability Concerns: The approach provides pathways to ensure stability when using ENMPC in RL, bringing NMPC's mature stability theory to bear on modern RL settings.
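The deterministic policy gradient route mentioned above can likewise be sketched on a scalar linear-quadratic toy problem. Here a linear feedback law pi_theta stands in for the NMPC policy, and the action-value sensitivity is estimated by finite differences on deterministic rollouts; all dynamics, horizons, and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Deterministic policy gradient sketch: theta parametrizes the policy,
# and we descend J(theta) using grad_theta pi * grad_a Q.
A, B, GAMMA, H = 0.9, 1.0, 0.95, 60   # illustrative system and horizon

def stage_cost(s, a):
    return s**2 + 0.1 * a**2

def pi(theta, s):
    return theta * s  # linear feedback standing in for the NMPC policy

def q_rollout(theta, s, a):
    # Estimate Q^pi(s, a): take action a, then follow pi for H steps.
    # The system is deterministic, so one rollout suffices.
    cost, discount = stage_cost(s, a), GAMMA
    s = A * s + B * a
    for _ in range(H):
        a = pi(theta, s)
        cost += discount * stage_cost(s, a)
        s = A * s + B * a
        discount *= GAMMA
    return cost

theta, alpha, eps = 0.0, 0.05, 1e-4
rng = np.random.default_rng(1)
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)        # sample a state
    a = pi(theta, s)
    # finite-difference estimate of dQ/da at a = pi_theta(s)
    dq_da = (q_rollout(theta, s, a + eps)
             - q_rollout(theta, s, a - eps)) / (2 * eps)
    grad = s * dq_da                  # chain rule: dpi/dtheta = s
    theta -= alpha * grad             # descend the closed-loop cost
```

Starting from the open-loop-stable gain theta = 0, the update drives theta toward the negative feedback gain that minimizes the discounted closed-loop cost, illustrating how policy-gradient tuning can improve a parametrized controller directly from rollouts.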
Implications and Future Directions
The combined use of ENMPC and RL transcends some traditional challenges, suggesting pathways for integrating robust control and adaptive learning to enhance practical applications in various domains, including the process industry and robotics. As RL methods, particularly data-driven ones, become essential in dynamic environments, NMPC frameworks offer formal structure and guarantees that add significant value.
Future work would ideally examine refined RL methodologies tailored specifically to NMPC-based function approximators, further enhancing computational and optimization strategies. Additionally, exploring richer NMPC parameter spaces and cost structures could yield even stronger results, aligning the theory more deeply with practical implementations across broader industrial applications.
In conclusion, this paper marks an important step in connecting advanced model predictive control with reinforcement learning, promising various enhancements in applied control paradigms.