Feasibility Consistent Representation Learning for Safe Reinforcement Learning (2405.11718v2)

Published 20 May 2024 in cs.LG

Abstract: In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle is the estimation of safety constraints, which is typically more difficult than estimating a reward metric due to the sparse nature of the constraint signals. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. Leveraging self-supervised learning techniques and a more learnable safety metric, our approach enhances policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method learns a better safety-aware embedding and achieves better performance than previous representation learning baselines.
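
The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of how a feasibility-consistent representation objective could be set up: an encoder is trained with a self-predictive (BYOL/SPR-style) latent consistency loss plus an auxiliary head that regresses a feasibility target from the embedding, so that safety-related information is pushed into the representation. All module names (`FCSRLSketch`, `feasibility_head`), the loss weighting, and the choice of feasibility target are assumptions, not the authors' implementation.

```python
# Illustrative sketch only; NOT the authors' reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Maps raw (vector) states to a latent embedding."""
    def __init__(self, state_dim: int, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


class FCSRLSketch(nn.Module):
    """Self-predictive embedding with an auxiliary feasibility head."""
    def __init__(self, state_dim: int, action_dim: int, latent_dim: int = 64):
        super().__init__()
        self.encoder = Encoder(state_dim, latent_dim)
        # Latent transition model: predicts the next embedding from (z, a).
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Feasibility head: predicts a scalar safety score from the embedding
        # (e.g. a discounted cost or reachability-style value; assumption).
        self.feasibility_head = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def loss(self, s, a, s_next, feas_target, w_feas: float = 1.0):
        z = self.encoder(s)
        # Self-supervised consistency: the predicted next latent should match
        # the (stop-gradient) encoding of the observed next state.
        z_next_pred = self.transition(torch.cat([z, a], dim=-1))
        with torch.no_grad():
            z_next_tgt = self.encoder(s_next)
        consistency = F.mse_loss(
            F.normalize(z_next_pred, dim=-1), F.normalize(z_next_tgt, dim=-1)
        )
        # Feasibility consistency: the embedding must also predict the
        # feasibility target, injecting safety-related signal.
        feas_pred = self.feasibility_head(z).squeeze(-1)
        feasibility = F.mse_loss(feas_pred, feas_target)
        return consistency + w_feas * feasibility


# Toy usage on random data.
model = FCSRLSketch(state_dim=8, action_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
s, a, s_next = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 8)
feas_target = torch.rand(32)  # placeholder feasibility labels
loss = model.loss(s, a, s_next, feas_target)
opt.zero_grad(); loss.backward(); opt.step()
```

In practice the feasibility target would come from a constraint or safety-value estimator learned alongside the policy rather than the random placeholder labels used in this toy example.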
