
FOSP: Fine-tuning Offline Safe Policy through World Models (2407.04942v1)

Published 6 Jul 2024 in cs.RO and cs.LG

Abstract: Model-based Reinforcement Learning (RL) has shown high training efficiency and the capability to handle high-dimensional tasks. Regarding safety, safe model-based RL can achieve nearly zero-cost performance and effectively manage the trade-off between performance and safety. Nevertheless, prior works still face safety challenges due to online exploration in real-world deployment. To address this, some offline RL methods have emerged as solutions, which learn safely from a static dataset by avoiding interactions with the environment. In this paper, we aim to further enhance safety during the deployment stage for vision-based robotic tasks by fine-tuning an offline-trained policy. We incorporate in-sample optimization, model-based policy expansion, and reachability guidance to construct a safe offline-to-online framework. Moreover, our method improves the generalization of the offline policy in unseen safety-constrained scenarios. Finally, the efficiency of our method is validated on simulation benchmarks with five vision-only tasks and on a real robot, where it solves deployment problems using limited data.
