Feasibility Consistent Representation Learning for Safe Reinforcement Learning (2405.11718v2)
Abstract: In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle is the estimation of safety constraints, which is typically harder than estimating a reward metric because the constraint signals are sparse. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. Leveraging self-supervised learning techniques and a more learnable safety metric, our approach enhances both policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method learns a better safety-aware embedding and outperforms previous representation learning baselines.
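The abstract's two key ingredients can be illustrated with a small sketch. The snippet below is an assumption-laden toy, not the paper's implementation: the linear "encoder" and "dynamics" stand in for neural networks, and the feasibility target uses a reachability-style backup (worst-case discounted future cost), which is one natural choice of a "more learnable safety metric" that turns sparse binary cost signals into a dense training target; the self-predictive loss follows the standard cosine-similarity form from self-supervised RL representation learning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the actual architecture in the paper may differ.
STATE_DIM, LATENT_DIM = 8, 4

# Toy linear "encoder" and latent dynamics model (stand-ins for neural nets).
W_enc = rng.normal(size=(LATENT_DIM, STATE_DIM))
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM))

def encode(s):
    return W_enc @ s

def predict_next_latent(z):
    return W_dyn @ z

def feasibility_target(costs, gamma=0.99):
    """Reachability-style feasibility value computed by a backward sweep:
    V_t = max(c_t, gamma * V_{t+1}).
    A sparse cost spike propagates backward as a dense, decaying signal,
    which is easier to regress than the raw sparse costs."""
    v = 0.0
    targets = np.zeros_like(costs)
    for t in reversed(range(len(costs))):
        v = max(costs[t], gamma * v)
        targets[t] = v
    return targets

def self_predictive_loss(s, s_next):
    """Cosine distance between the predicted and the actual next latent,
    as in self-predictive representation learning (in practice the target
    branch would use a momentum/EMA encoder with stopped gradients)."""
    z_pred = predict_next_latent(encode(s))
    z_next = encode(s_next)
    cos = z_pred @ z_next / (
        np.linalg.norm(z_pred) * np.linalg.norm(z_next) + 1e-8
    )
    return 1.0 - cos

# Example: a single constraint violation at t=3 yields a dense target
# [gamma^3, gamma^2, gamma, 1, 0] instead of the sparse [0, 0, 0, 1, 0].
costs = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
print(feasibility_target(costs))
```

In a full method, the feasibility value would be predicted from the learned embedding alongside the self-predictive loss, so that the representation is shaped by both dynamics consistency and safety information.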