Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
184 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Distributionally Robust Reinforcement Learning (1902.08708v2)

Published 23 Feb 2019 in stat.ML and cs.LG

Abstract: Real-world applications require RL algorithms to act safely. During learning process, it is likely that the agent executes sub-optimal actions that may lead to unsafe/poor states of the system. Exploration is particularly brittle in high-dimensional state/action space due to increased number of low-performing actions. In this work, we consider risk-averse exploration in approximate RL setting. To ensure safety during learning, we propose the distributionally robust policy iteration scheme that provides lower bound guarantee on state-values. Our approach induces a dynamic level of risk to prevent poor decisions and yet preserves the convergence to the optimal policy. Our formulation results in a efficient algorithm that accounts for a simple re-weighting of policy actions in the standard policy iteration scheme. We extend our approach to continuous state/action space and present a practical algorithm, distributionally robust soft actor-critic, that implements a different exploration strategy: it acts conservatively at short-term and it explores optimistically in a long-run. We provide promising experimental results on continuous control tasks.

Citations (53)

Summary

We haven't generated a summary for this paper yet.