Datasets and Benchmarks for Offline Safe Reinforcement Learning (2306.09303v2)

Published 15 Jun 2023 in cs.LG, cs.AI, and cs.RO

Abstract: This paper presents a comprehensive benchmarking suite tailored to offline safe reinforcement learning (RL) challenges, aiming to foster progress in the development and evaluation of safe learning algorithms in both the training and deployment phases. Our benchmark suite contains three packages: 1) expertly crafted safe policies, 2) D4RL-styled datasets along with environment wrappers, and 3) high-quality offline safe RL baseline implementations. We feature a methodical data collection pipeline powered by advanced safe RL algorithms, which facilitates the generation of diverse datasets across 38 popular safe RL tasks, from robot control to autonomous driving. We further introduce an array of data post-processing filters, capable of modifying each dataset's diversity, thereby simulating various data collection conditions. Additionally, we provide elegant and extensible implementations of prevalent offline safe RL algorithms to accelerate research in this area. Through extensive experiments with over 50000 CPU and 800 GPU hours of computations, we evaluate and compare the performance of these baseline algorithms on the collected datasets, offering insights into their strengths, limitations, and potential areas of improvement. Our benchmarking framework serves as a valuable resource for researchers and practitioners, facilitating the development of more robust and reliable offline safe RL solutions in safety-critical applications. The benchmark website is available at www.offline-saferl.org.
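Since the abstract describes D4RL-styled datasets with environment wrappers, a minimal sketch of how such a dataset might be loaded is shown below. It assumes the suite follows the D4RL convention of exposing a dataset through env.get_dataset(); the package name dsrl, the environment ID "OfflineCarCircle-v0", and the "costs" key are illustrative assumptions, not details confirmed on this page.

```python
# Minimal sketch: loading a D4RL-styled offline safe RL dataset.
# Assumptions (not confirmed by this page): the package is importable as
# `dsrl`, it registers Gymnasium environments, and datasets follow the
# D4RL dict-of-arrays convention with an extra per-step cost signal.
import gymnasium as gym
import dsrl  # hypothetical import that registers the offline safe RL tasks

env = gym.make("OfflineCarCircle-v0")   # illustrative task name
dataset = env.get_dataset()             # dict of flat NumPy arrays, D4RL style

observations = dataset["observations"]
actions = dataset["actions"]
rewards = dataset["rewards"]
costs = dataset["costs"]                # constraint-violation signal (assumed key)
terminals = dataset["terminals"]

print(observations.shape, actions.shape, rewards.shape, costs.shape)
```

The per-step cost array is what distinguishes an offline *safe* RL dataset from a standard D4RL one: baseline algorithms are evaluated on both cumulative reward and whether cumulative cost stays under a threshold.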

Authors (11)
  1. Zuxin Liu (43 papers)
  2. Zijian Guo (50 papers)
  3. Haohong Lin (14 papers)
  4. Yihang Yao (14 papers)
  5. Jiacheng Zhu (54 papers)
  6. Zhepeng Cen (17 papers)
  7. Hanjiang Hu (23 papers)
  8. Wenhao Yu (139 papers)
  9. Tingnan Zhang (53 papers)
  10. Jie Tan (85 papers)
  11. Ding Zhao (172 papers)
Citations (25)
