Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms

Published 5 Oct 2023 in cs.LG, cs.AI, and cs.RO (arXiv:2310.03225v1)

Abstract: Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios. In this paper, we present the generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems. We then propose a solution to the GSE problem in the form of a meta-algorithm for safe exploration, MASE, which combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode while penalizing unsafe explorations before an actual safety violation occurs, so as to discourage them in future episodes. The advantage of MASE is that we can optimize a policy while guaranteeing, with high probability, that no safety constraint will be violated under proper assumptions. Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process to ensure safety with a deep RL algorithm to maximize the reward. Finally, we demonstrate that our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
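The abstract's description of MASE can be sketched as a single environment step: an uncertainty quantifier supplies a high-probability upper bound on the safety cost of the agent's proposed action; if that bound crosses a threshold, the agent receives a penalty for the unsafe proposal and a conservative fallback action is taken instead. This is a minimal illustrative sketch, not the paper's implementation: the class and method names (`upper_bound`, `act`, `observe`, `safe_fallback`) and the stub quantifier are all hypothetical.

```python
import random


class GaussianUncertaintyQuantifier:
    """Hypothetical stand-in for the paper's uncertainty quantifier.

    upper_bound should return a high-probability upper bound on the
    safety cost of taking `action` in `state` (e.g. mean + beta * std
    from a fitted Gaussian process or generalized linear model).
    """

    def upper_bound(self, state, action):
        # Placeholder: replace with a model fitted to observed safety costs.
        return random.random()


def mase_step(agent, quantifier, env, state, threshold=1.0, penalty=-10.0):
    """One environment step of a MASE-style meta-algorithm (sketch).

    `agent` is any unconstrained RL agent; the interface assumed here
    (act / observe / safe_fallback) is invented for illustration.
    """
    action = agent.act(state)
    if quantifier.upper_bound(state, action) >= threshold:
        # Penalize the unsafe proposal before any actual violation,
        # so the agent learns to avoid it in future episodes...
        agent.observe(state, action, penalty, state, done=True)
        # ...then fall back to a conservative action that keeps the
        # current episode safe.
        action = agent.safe_fallback(state)
    next_state, reward, done, info = env.step(action)
    agent.observe(state, action, reward, next_state, done)
    return next_state, done
```

The key design point the abstract highlights is visible in the branch: safety in the current episode comes from the fallback, while the pre-emptive penalty shapes future behavior without waiting for a real constraint violation.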
