Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model (2401.10700v1)

Published 19 Jan 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Safe offline RL is a promising way to bypass risky online interactions towards safe policy learning. Most existing methods only enforce soft constraints, i.e., constraining safety violations in expectation below predetermined thresholds. This can lead to potentially unsafe outcomes, which is unacceptable in safety-critical scenarios. An alternative is to enforce the hard constraint of zero violation. However, this is challenging in the offline setting, as it requires striking the right balance among three highly intricate and correlated aspects: safety constraint satisfaction, reward maximization, and behavior regularization imposed by offline datasets. Interestingly, we discover that, via reachability analysis from safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset. This seamlessly converts the original three-part problem into a feasibility-dependent objective: maximizing reward value within the feasible region while minimizing safety risks in the infeasible region. Building on these insights, we propose FISOR (FeasIbility-guided Safe Offline RL), in which safety constraint adherence, reward maximization, and offline policy learning are realized via three decoupled processes, while offering strong safety performance and stability. In FISOR, the optimal policy for the translated optimization problem can be derived in a special form of weighted behavior cloning. We therefore propose a novel energy-guided diffusion model that does not require training a complicated time-dependent classifier to extract the policy, greatly simplifying the training. We compare FISOR against baselines on the DSRL benchmark for safe offline RL. Evaluation results show that FISOR is the only method that can guarantee safety satisfaction in all tasks, while achieving top returns in most tasks.
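
To make the feasibility-dependent objective more concrete, the sketch below illustrates the weighted behavior-cloning idea described in the abstract: dataset actions are reweighted by a reward advantage inside the feasible region and by safety risk outside it. This is a minimal, hypothetical illustration, not the authors' implementation (FISOR extracts the policy with an energy-guided diffusion model rather than an explicit likelihood); the names `v_h`, `adv_r`, `adv_h`, `alpha`, and `fisor_style_weights` are assumptions introduced here for clarity.

```python
# Minimal illustrative sketch (not the authors' code): feasibility-weighted
# behavior cloning. Assumes a learned feasibility value v_h(s) from a
# reachability-style analysis, a reward advantage adv_r(s, a), and a safety
# advantage adv_h(s, a). All names are hypothetical.
import torch

def fisor_style_weights(v_h, adv_r, adv_h, alpha=3.0):
    """Weight each dataset action for behavior cloning.

    v_h   : feasibility value of the state (<= 0 taken to mean feasible here)
    adv_r : reward advantage of the (state, action) pair
    adv_h : safety advantage (lower = safer) of the (state, action) pair
    """
    feasible = (v_h <= 0).float()
    # Inside the feasible region: prefer high-reward actions.
    w_feasible = torch.exp(alpha * adv_r)
    # Outside the feasible region: prefer actions that reduce safety risk.
    w_infeasible = torch.exp(-alpha * adv_h)
    return feasible * w_feasible + (1.0 - feasible) * w_infeasible

def weighted_bc_loss(policy_log_prob, weights):
    # Weighted behavior cloning: maximize the log-likelihood of dataset
    # actions, scaled by feasibility-dependent weights (clipped for stability).
    return -(weights.clamp(max=100.0) * policy_log_prob).mean()
```

The point of the sketch is the decoupling highlighted in the abstract: the feasibility estimate, the reward signal, and the behavior-cloning step can each be learned separately and only combine through the weights.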

Authors (7)
  1. Yinan Zheng
  2. Jianxiong Li
  3. Dongjie Yu
  4. Yujie Yang
  5. Shengbo Eben Li
  6. Xianyuan Zhan
  7. Jingjing Liu
Citations (16)
