- The paper introduces TRC, which combines a trust region approach with CVaR to develop safe reinforcement learning policies.
- It formulates a differentiable subproblem with a Gaussian cost approximation, enabling efficient and constrained policy updates.
- Experimental results demonstrate that TRC improves performance by up to 1.93X while reliably meeting safety constraints in simulations and real-robot tasks.
Introduction to Safe Reinforcement Learning
Reinforcement learning (RL) is a powerful tool in robotics, enabling robots to learn complex tasks through trial and error. However, as robots increasingly operate alongside humans or in safety-critical settings, guaranteeing safety becomes crucial. Safe RL focuses on learning policies – the robot's strategy for choosing actions – such that specified safety constraints are always satisfied. A common formulation is the constrained Markov decision process (CMDP), which augments the learning objective with constraints, typically expressed as bounds on expected safety-related costs.
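As a rough illustration of this CMDP setup (the rollout function, cost signal, and threshold below are hypothetical stand-ins, not from the paper), here is a minimal Python sketch of estimating expected return and expected cost and checking the constraint:

```python
import numpy as np

def evaluate_policy(rollout_fn, n_episodes=100):
    """Monte Carlo estimates of expected return and expected safety cost.

    `rollout_fn` is a hypothetical callable that runs one episode under the
    current policy and returns (episode_return, episode_cost).
    """
    returns, costs = zip(*(rollout_fn() for _ in range(n_episodes)))
    return float(np.mean(returns)), float(np.mean(costs))

# CMDP: maximize expected return subject to expected cost <= d.
COST_LIMIT_D = 25.0  # hypothetical safety threshold d

avg_return, avg_cost = evaluate_policy(
    lambda: (np.random.rand(), 50.0 * np.random.rand())  # stand-in rollout
)
print("constraint satisfied:", avg_cost <= COST_LIMIT_D)
```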
CVaR in Safe RL
Conditional Value at Risk (CVaR), a risk measure originally developed in finance, is increasingly used in safe RL. CVaR focuses on the tail of the cost distribution – the rare but potentially catastrophic outcomes – by taking the expected cost conditioned on exceeding a given quantile (the value at risk). This is particularly useful for distinguishing between policies that have the same mean cost but carry different levels of risk, so CVaR can steer learning toward policies that are less likely to produce unsafe outcomes.
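As a small numerical illustration (the risk level and the two sample distributions below are made up for demonstration, not taken from the paper), an empirical CVaR can be computed from sampled episode costs; the two hypothetical policies share the same mean cost but differ sharply in tail risk:

```python
import numpy as np

def empirical_cvar(costs, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of sampled episode costs."""
    costs = np.sort(np.asarray(costs))
    var = np.quantile(costs, alpha)   # value at risk: the alpha-quantile
    return costs[costs >= var].mean() # expectation over the tail beyond VaR

rng = np.random.default_rng(0)
low_risk = rng.normal(1.0, 0.1, size=10_000)   # same mean cost, narrow tail
high_risk = rng.normal(1.0, 1.0, size=10_000)  # same mean cost, heavy tail
print(empirical_cvar(low_risk), empirical_cvar(high_risk))
```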
Trust Region-Based Method for CVaR Constraints
The paper introduces a new method built on the trust region approach. At each iteration, it forms a subproblem that improves the policy within a trust region – a neighborhood of the current policy in which the local approximations of the objective and constraint remain reliable, so updates stay conservative. The cost distribution is approximated as Gaussian, and an upper bound on its CVaR is derived, which turns the risk constraint into a differentiable expression. The policy is then updated by solving this differentiable subproblem with constrained optimization.
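The paper's exact bound is not reproduced here; as a sketch of why a Gaussian approximation helps, the CVaR of a Gaussian cost has a standard closed form that is smooth in the mean and standard deviation, and a constraint built from such an expression can be differentiated inside a trust region subproblem (the risk level below is hypothetical):

```python
from scipy.stats import norm

def gaussian_cvar(mean, std, alpha=0.95):
    """Closed-form CVaR of a Gaussian cost: mean + std * pdf(z_alpha) / (1 - alpha)."""
    z_alpha = norm.ppf(alpha)
    return mean + std * norm.pdf(z_alpha) / (1.0 - alpha)

# Smooth in (mean, std), so a constraint of the form
#   gaussian_cvar(mean, std, alpha) <= limit
# is differentiable and can be handled by a constrained optimizer.
print(gaussian_cvar(mean=1.0, std=0.5, alpha=0.95))
```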
Validation and Results
Experiments were conducted to validate the proposed method, denoted TRC, on simulated tasks with several robotic platforms. Results show that TRC not only improved performance substantially – up to 1.93 times better than other safe RL methods – but also consistently satisfied the safety constraints. In addition, a policy trained in simulation was transferred to a real-robot navigation task, where it maintained both performance and constraint satisfaction, indicating practical applicability.
In summary, the paper presents TRC, a safe RL method which navigates the challenges of ensuring robotic safety while improving performance. Through the effective application of CVaR in a trust region framework and iterative policy updates, TRC stands out as a promising approach for engineers and researchers aiming to deploy robots in environments where safety cannot be compromised.