- The paper introduces a hybrid RL-MPC framework that combines reinforcement learning with model predictive control to guarantee safety and stability.
- The paper develops a sample-based Safe Design Constraint (SDC) from historical data, using set membership methods to build a robust outer approximation of the state-transition uncertainty.
- The paper demonstrates its approach with numerical studies, showing reduced conservatism and improved efficiency in both linear and nonlinear system applications.
Safe Reinforcement Learning Using Robust MPC
The paper "Safe Reinforcement Learning Using Robust MPC" by Mario Zanon and Sebastien Gros proposes a novel hybrid approach that integrates robust Model Predictive Control (MPC) with Reinforcement Learning (RL) to ensure both optimal and safe control policies. This work addresses the critical challenges of safety and stability in RL applications, an area of growing interest and importance as RL continues to produce impressive results in various fields. Although RL excels in tasks like playing complex games or controlling robotic systems, ensuring safety, especially in critical applications, remains a significant challenge.
Key Contributions and Methodology
- Integration of RL and MPC: The authors propose a framework that combines the strengths of RL and robust MPC. RL handles complex decision-making tasks by learning from interaction, but typically offers no guarantees of stability or safety. MPC, by contrast, addresses safety and stability directly by optimizing control actions over a predictive model, even under system uncertainty. By combining the two methodologies, the paper derives controllers that are optimal with respect to a desired cost and safe with respect to operational constraints.
- Sample-based Safe Design Constraint (SDC): A significant part of the research develops a sample-based formulation of the Safe Design Constraint (SDC) from historical data. The SDC underpins the robust identification of dispersion sets for control purposes: the authors employ set membership methods to construct an outer approximation of the state-transition uncertainty, which is crucial for maintaining safety during the learning process (a minimal numerical sketch of this step is given after this list).
- Robust MPC Formulation: The robust MPC formulation is tube-based: uncertainty and external disturbances are accounted for by appropriately tightening the constraints along the prediction horizon. This yields control actions with safety guarantees under the assumption that the uncertainty lies in a bounded set, which is updated as new data arrives (see the second sketch after this list).
- Efficient Data Management: To handle the potentially vast amount of data generated during RL, the paper introduces techniques for efficient data management that leverage the model's predictions. Because the set membership estimate depends only on the extreme observations, the dataset can be compressed to the vertices of the convex hull of the observed disturbances, discarding interior points without changing the identified set (see the third sketch after this list).
- Algorithmic Implementation: The authors propose an RL-MPC algorithmic structure in which reinforcement learning updates are constrained by the robust MPC: parameters are updated iteratively subject to maintaining feasibility and satisfying the SDC, so that safety is not compromised during either exploration or exploitation (see the final sketch after this list).
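To make the set membership step concrete, the following minimal Python sketch estimates an outer box approximation of the disturbance set from logged transitions of a linear system x+ = A x + B u + w. The system matrices, the box shape of the approximation, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Assumed nominal model for illustration (not the paper's example system).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

def disturbance_samples(X, U, X_next):
    """Residuals w_k = x_{k+1} - A x_k - B u_k for each logged transition."""
    return X_next - X @ A.T - U @ B.T

def outer_box(W):
    """Axis-aligned outer approximation [w_min, w_max] of the samples."""
    return W.min(axis=0), W.max(axis=0)

# Usage with synthetic logged data standing in for historical trajectories.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 2))
U = rng.uniform(-1.0, 1.0, (200, 1))
W_true = rng.uniform(-0.05, 0.05, (200, 2))   # unknown in practice
X_next = X @ A.T + U @ B.T + W_true
w_min, w_max = outer_box(disturbance_samples(X, U, X_next))
print("disturbance box:", w_min, w_max)
```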
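The tube-based tightening can then consume that disturbance bound. The sketch below shrinks a box state constraint |x| <= x_max along the horizon by a componentwise bound on the error tube; the feedback gain K and the bounds are assumptions for illustration, and actual tube MPC designs would use tighter invariant sets than this simple recursion.

```python
import numpy as np

# Minimal tube-style tightening sketch; K and w_max are illustrative
# (w_max would come from the set membership step above).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
K = np.array([[-5.0, -3.0]])           # assumed stabilizing tube gain
w_max = np.array([0.05, 0.05])

def tightened_bounds(x_max, N):
    """Shrink the box |x| <= x_max along the horizon by a componentwise
    bound on the error tube e_{k+1} = (A + B K) e_k + w_k, e_0 = 0."""
    A_K = A + B @ K
    margins = []
    M = np.zeros(2)                    # running bound on |e_k|
    P = np.eye(2)                      # A_K^k
    for _ in range(N):
        margins.append(x_max - M)      # tightened bound at stage k
        M = M + np.abs(P) @ w_max      # add the |A_K^k| w_max term
        P = A_K @ P
    return np.array(margins)

print(tightened_bounds(np.array([1.0, 1.0]), N=5))
```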
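Data compression exploits the fact that a set membership estimate is determined entirely by the extreme samples. A small sketch, assuming 2-D disturbances for illustration and using scipy's ConvexHull:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Assumed 2-D disturbance samples standing in for the logged residuals.
rng = np.random.default_rng(1)
W = rng.normal(0.0, 0.02, (1000, 2))

# Only the extreme points matter for set membership: keeping the vertices
# of the convex hull preserves the identified set exactly.
hull = ConvexHull(W)
W_compressed = W[hull.vertices]

print(f"kept {len(W_compressed)} of {len(W)} samples")
```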
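Finally, the interplay between the RL update and the safety constraint can be schematized as a backtracking step: a candidate parameter update is accepted only if the robust MPC remains feasible. Both robust_mpc_feasible and rl_gradient below are hypothetical placeholders standing in for the paper's SDC check and its learning update; the backtracking mechanism is one simple way to realize the constrained update, not necessarily the paper's.

```python
import numpy as np

def robust_mpc_feasible(theta):
    """Hypothetical placeholder for the SDC/feasibility check: returns
    True iff the robust MPC parameterized by theta remains feasible."""
    return bool(np.all(np.abs(theta) <= 1.0))   # toy stand-in

def rl_gradient(theta):
    """Hypothetical placeholder for the RL update direction
    (e.g. a Q-learning or policy-gradient step on the MPC parameters)."""
    return -theta                                # toy gradient

def safe_update(theta, alpha=0.1, shrink=0.5, max_tries=10):
    """Backtracking update: shrink the RL step until the safety
    constraint holds again; keep the old parameters if it never does."""
    step = alpha * rl_gradient(theta)
    for _ in range(max_tries):
        candidate = theta + step
        if robust_mpc_feasible(candidate):
            return candidate
        step = shrink * step                     # reject, try a smaller step
    return theta

theta = np.array([0.8, -0.3])
theta = safe_update(theta)
print("updated parameters:", theta)
```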
Numerical Results and Implications
The paper presents numerical results illustrating the effectiveness of the proposed approach on two case studies: a simple linear system and a nonlinear evaporation process from the chemical industry. The results show how the RL-MPC framework adapts the control policy to reduce conservatism in constraint satisfaction, improving operational efficiency while maintaining robustness.
Implications for Future Research
This work opens several avenues for future research. While robust linear MPC has been successfully integrated here, extending such frameworks to more complex nonlinear dynamics presents opportunities for further innovation. Moreover, adopting or developing novel learning algorithms that could account for constraints natively might significantly enhance the application scope of safe RL systems.
Additionally, while the paper demonstrates the practical feasibility of such hybrid RL-MPC systems, theoretical guarantees for convergence and safety in broader settings remain to be explored. Such developments could involve more sophisticated uncertainty models and adaptive parameter tuning strategies.
In conclusion, merging RL with robust MPC presents a promising pathway towards developing control systems that are both intelligent and inherently safe, a decisive step for deploying RL in real-world safety-critical systems. This paper contributes significantly by offering a structured approach to integrating these advanced methodologies within a cohesive framework.