Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming (2310.02459v1)

Published 3 Oct 2023 in cs.LG, cs.RO, cs.SY, and eess.SY

Abstract: Safety assurance is non-negotiable in safety-critical environments with drastic model uncertainties (e.g., distributional shift), especially with humans in the loop. However, incorporating uncertainty into safe learning naturally leads to a bi-level problem, where at the lower level the (worst-case) safety constraint is evaluated over an ambiguity set of distributions. In this paper, we present a tractable distributionally safe reinforcement learning framework that enforces safety under distributional shift measured by a Wasserstein metric. To improve tractability, we first use duality theory to transform the lower-level optimization from the infinite-dimensional probability space, in which distributional shift is measured, to a finite-dimensional parametric space. Moreover, by differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential, computationally efficient modules: a convex quadratic program that guarantees safety, followed by projected gradient ascent that simultaneously finds the worst-case uncertainty. To the best of our knowledge, this end-to-end differentiable framework with safety constraints is the first tractable single-level solution to distributional safety. We test our approach on first- and second-order systems of varying complexity and compare against uncertainty-agnostic policies, where our approach demonstrates a significant improvement in safety guarantees.
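
Two steps in the abstract benefit from unpacking. First, the lower-level worst-case evaluation over a Wasserstein ball admits the standard strong-duality reformulation of Wasserstein distributionally robust optimization (Blanchet and Murthy, 2019), which replaces the infinite-dimensional search over distributions with a finite-dimensional one. With nominal distribution $\hat{P}$, transport cost $c$, and radius $\varepsilon$,

$$
\sup_{Q:\, W_c(Q,\hat{P}) \le \varepsilon} \mathbb{E}_{Q}\big[f(\xi)\big]
\;=\;
\inf_{\lambda \ge 0} \Big\{ \lambda \varepsilon + \mathbb{E}_{\hat{P}} \Big[ \sup_{\zeta} \big( f(\zeta) - \lambda\, c(\zeta,\xi) \big) \Big] \Big\}.
$$

Second, the resulting two-module pipeline (a convex safety QP followed by projected gradient ascent over the dualized uncertainty) can be prototyped with off-the-shelf differentiable convex optimization layers. The sketch below is illustrative, not the authors' implementation: it assumes a linearized control-barrier-function-style constraint $(a+\xi)^\top u + b \ge 0$, uses cvxpylayers for the differentiable QP, and stands in a Euclidean-ball projection for the finite-dimensional uncertainty set; all names (u_nom, worst_case_uncertainty, eps) are hypothetical.

```python
# Minimal sketch (not the paper's code): a differentiable safety QP followed by
# projected gradient ascent over a finite-dimensional uncertainty parameter xi.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n_u = 2  # control dimension (hypothetical)

# Module 1: safety filter as a differentiable convex QP,
#   minimize ||u - u_nom||^2  subject to  a^T u + b >= 0.
u = cp.Variable(n_u)
u_nom = cp.Parameter(n_u)
a = cp.Parameter(n_u)
b = cp.Parameter()
qp = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), [a @ u + b >= 0])
safety_layer = CvxpyLayer(qp, parameters=[u_nom, a, b], variables=[u])

def project_ball(xi: torch.Tensor, eps: float) -> torch.Tensor:
    # Projection onto the Euclidean ball of radius eps: a stand-in for the
    # finite-dimensional set obtained after dualizing the Wasserstein ball.
    n = xi.norm()
    return xi if n <= eps else xi * (eps / n)

def worst_case_uncertainty(u_nom_t, a_t, b_t, eps=0.5, steps=25, lr=0.1):
    # Module 2: projected gradient ascent; gradients flow through the QP layer.
    xi = torch.zeros(n_u, requires_grad=True)
    for _ in range(steps):
        (u_safe,) = safety_layer(u_nom_t, a_t + xi, b_t)
        # Ascend a proxy for constraint stress: how far the safety filter
        # must deviate from the nominal action under perturbation xi.
        objective = ((u_safe - u_nom_t) ** 2).sum()
        (g,) = torch.autograd.grad(objective, xi)
        with torch.no_grad():
            xi = project_ball(xi + lr * g, eps)
        xi.requires_grad_(True)
    return xi.detach()

# Usage with hypothetical numbers; the constraint is active at u_nom so the
# ascent direction is nonzero.
u0 = torch.tensor([-1.0, 0.5])
a0 = torch.tensor([1.0, 0.0])
b0 = torch.tensor(0.2)
xi_star = worst_case_uncertainty(u0, a0, b0)
(u_safe,) = safety_layer(u0, a0 + xi_star, b0)
```

Because both modules are differentiable, gradients can be propagated through the safe action end to end, which is what collapses the bi-level safe learning problem into a single-level one.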
