
OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning (2407.14653v1)

Published 19 Jul 2024 in cs.LG

Abstract: Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using a pre-collected dataset. Most current methods struggle with the mismatch between imperfect demonstrations and the desired safe and rewarding performance. In this paper, we introduce OASIS (cOnditionAl diStributIon Shaping), a new paradigm in offline safe RL designed to overcome these critical limitations. OASIS utilizes a conditional diffusion model to synthesize offline datasets, thus shaping the data distribution toward a beneficial target domain. Our approach achieves compliance with safety constraints through effective data utilization and regularization techniques that benefit offline safe RL training. Comprehensive evaluations on public benchmarks and varying datasets show that OASIS enables offline safe RL agents to achieve high-reward behavior while satisfying safety constraints, outperforming established baselines. Furthermore, OASIS exhibits high data efficiency and robustness, making it suitable for real-world applications, particularly in tasks where safety is imperative and high-quality demonstrations are scarce.
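The core idea of distribution shaping, i.e. biasing the training data toward safe, high-reward behavior, can be illustrated with a toy sketch. OASIS itself trains a conditional diffusion model to *synthesize* new transitions; the reward-weighted resampling below is only a hypothetical stand-in for that generative step, and all names (`shape_dataset`, the `cost_limit` and `temperature` parameters) are illustrative, not from the paper.

```python
import math
import random

def shape_dataset(trajectories, cost_limit, temperature=1.0, n_samples=100, seed=0):
    """Toy illustration of conditional distribution shaping: resample an
    offline dataset so that trajectories satisfying the cost limit and
    earning high reward dominate the training distribution."""
    rng = random.Random(seed)
    # Keep only trajectories that satisfy the safety constraint.
    feasible = [t for t in trajectories if t["cost"] <= cost_limit]
    if not feasible:
        return []
    # Softmax weights over reward favor high-reward feasible trajectories.
    max_r = max(t["reward"] for t in feasible)
    weights = [math.exp((t["reward"] - max_r) / temperature) for t in feasible]
    return rng.choices(feasible, weights=weights, k=n_samples)

# Mixed-quality offline dataset: (reward, cost) per trajectory.
data = [{"reward": r, "cost": c} for r, c in
        [(1.0, 0.0), (5.0, 0.0), (9.0, 8.0), (3.0, 1.0)]]
shaped = shape_dataset(data, cost_limit=2.0)
print(max(t["cost"] for t in shaped))  # always <= cost_limit
```

The unsafe high-reward trajectory (reward 9, cost 8) is filtered out entirely, so an agent trained on the shaped data never imitates constraint-violating behavior, which is the distribution-shaping effect the abstract describes.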

Authors (8)
  1. Yihang Yao (14 papers)
  2. Zhepeng Cen (17 papers)
  3. Wenhao Ding (43 papers)
  4. Haohong Lin (14 papers)
  5. Shiqi Liu (31 papers)
  6. Tingnan Zhang (53 papers)
  7. Wenhao Yu (139 papers)
  8. Ding Zhao (172 papers)
