FOSP: Fine-tuning Offline Safe Policy through World Models (2407.04942v1)

Published 6 Jul 2024 in cs.RO and cs.LG

Abstract: Model-based Reinforcement Learning (RL) has demonstrated high training efficiency and the capability to handle high-dimensional tasks. Regarding safety, safe model-based RL can achieve nearly zero-cost performance and effectively manage the trade-off between performance and safety. Nevertheless, prior works still face safety challenges due to online exploration during real-world deployment. To address this, offline RL methods have emerged as solutions that learn safely from a static dataset by avoiding interaction with the environment. In this paper, we aim to further enhance safety during the deployment stage for vision-based robotic tasks by fine-tuning an offline-trained policy. We incorporate in-sample optimization, model-based policy expansion, and reachability guidance to construct a safe offline-to-online framework. Moreover, our method is shown to improve the generalization of the offline policy in unseen safety-constrained scenarios. Finally, the efficiency of our method is validated on simulation benchmarks with five vision-only tasks and on a real robot, solving deployment problems with limited data.
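The abstract describes an offline-to-online pipeline: a policy is first trained from a static dataset without environment interaction, then fine-tuned during deployment under safety constraints. As a rough illustration only, the sketch below shows that generic pattern, not the paper's actual FOSP algorithm; the toy environment, the behavior-cloning-style offline fit, and the per-episode cost-budget gate are all assumptions introduced for illustration.

```python
# Hypothetical sketch of an offline-to-online safe fine-tuning loop.
# All names and the toy dynamics are illustrative and do not correspond
# to the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: offline pretraining from a static dataset (no environment interaction) ---
offline_obs = rng.normal(size=(1000, 4))   # logged observations
offline_act = rng.normal(size=(1000, 2))   # logged actions
# Behavior-cloning-style linear fit of actions from observations.
policy_w = np.linalg.lstsq(offline_obs, offline_act, rcond=None)[0]

def rollout(policy_w, steps=50):
    """Toy environment: reward favors small actions, cost counts unsafe states."""
    obs = rng.normal(size=4)
    ret, cost, grads = 0.0, 0.0, np.zeros_like(policy_w)
    for _ in range(steps):
        act = obs @ policy_w
        ret += -float(np.sum(act ** 2))
        cost += float(np.sum(np.abs(obs)) > 3.0)        # illustrative safety violation
        grads += np.outer(obs, -2.0 * act)              # gradient of step reward w.r.t. weights
        obs = 0.9 * obs + 0.1 * rng.normal(size=4)
    return ret, cost, grads

# --- Stage 2: online fine-tuning gated by a per-episode cost budget ---
cost_budget = 5.0   # illustrative safety constraint
lr = 1e-2
for episode in range(20):
    ret, cost, grads = rollout(policy_w)
    if cost <= cost_budget:                             # update only when the rollout stays within budget
        policy_w += lr * grads
    print(f"episode {episode:2d}  return {ret:8.2f}  cost {cost:4.1f}")
```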

Authors (7)
  1. Chenyang Cao (5 papers)
  2. Yucheng Xin (2 papers)
  3. Silang Wu (2 papers)
  4. Longxiang He (3 papers)
  5. Zichen Yan (3 papers)
  6. Junbo Tan (10 papers)
  7. Xueqian Wang (99 papers)
