Constrained Reinforcement Learning with Smoothed Log Barrier Function (2403.14508v1)

Published 21 Mar 2024 in cs.LG, cs.AI, cs.SY, and eess.SY

Abstract: Reinforcement Learning (RL) has been widely applied to control tasks and has substantially improved performance over conventional control methods in many domains where the reward function is well defined. However, for many real-world problems it is often more convenient to formulate the optimization problem in terms of rewards and constraints simultaneously. Optimizing such constrained problems via reward shaping can be difficult, as it requires tedious manual tuning of reward functions with several interacting terms. Recent formulations that include constraints mostly require a pre-training phase, which often needs human expertise to collect data or assumes that a sub-optimal policy is readily available. We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function), which achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic. It implements an adaptive penalty for policy learning and alleviates the numerical issues that are known to complicate the application of the log barrier function method. We show that CSAC-LB achieves state-of-the-art performance on several constrained control tasks with different levels of difficulty, and we evaluate the method in a locomotion task on a real quadruped robot platform.
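
To make the barrier mechanism concrete, below is a minimal PyTorch-style sketch of a linearly smoothed log barrier penalty of the kind the abstract describes: for safely satisfied constraints it behaves like the classic log barrier, and beyond a threshold it continues along its tangent line, so the penalty stays finite and differentiable even when the constraint is violated. The threshold -1/t², the sharpness parameter t, and the function name are illustrative assumptions, not the authors' reference implementation, which additionally couples this penalty to a learned safety critic inside SAC.

```python
import math
import torch

def smoothed_log_barrier(z: torch.Tensor, t: float = 10.0) -> torch.Tensor:
    """Linearly smoothed log barrier for a constraint of the form z <= 0.

    For z <= -1/t**2 this is the ordinary log barrier -(1/t) * log(-z).
    Above that threshold the barrier is extended by its tangent line, so
    value and gradient stay continuous and finite even when the constraint
    is violated (z > 0). t controls the sharpness of the barrier.
    """
    threshold = -1.0 / (t ** 2)
    # Clamp keeps the log branch defined even where torch.where does not select it.
    log_branch = -(1.0 / t) * torch.log(torch.clamp(-z, min=1e-8))
    linear_branch = t * z - (1.0 / t) * math.log(1.0 / (t ** 2)) + 1.0 / t
    return torch.where(z <= threshold, log_branch, linear_branch)
```

A constrained actor update could then penalize the policy with smoothed_log_barrier(q_safety(s, a) - d), where q_safety is the safety critic and d the cost budget; this usage pattern is likewise an assumption for illustration rather than the paper's exact objective.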

Authors (5)
  1. Baohe Zhang (6 papers)
  2. Yuan Zhang (331 papers)
  3. Lilli Frison (3 papers)
  4. Thomas Brox (134 papers)
  5. Joschka Bödecker (3 papers)
Citations (6)