Constrained Reinforcement Learning with Smoothed Log Barrier Function (2403.14508v1)
Abstract: Reinforcement Learning (RL) has been widely applied to many control tasks and has substantially improved performance over conventional control methods in domains where the reward function is well defined. For many real-world problems, however, it is often more convenient to formulate the optimization problem in terms of both rewards and constraints. Optimizing such constrained problems via reward shaping can be difficult, as it requires tedious manual tuning of a reward function with several interacting terms. Recent formulations that include constraints mostly require a pre-training phase, which often needs human expertise to collect data or assumes that a sub-optimal policy is readily available. We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function), which achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic. This yields an adaptive penalty for policy learning and alleviates the numerical issues known to complicate the application of the log barrier function method. We show that CSAC-LB achieves state-of-the-art performance on several constrained control tasks of varying difficulty, and we evaluate our method in a locomotion task on a real quadruped robot platform.
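The core ingredient named in the abstract is the linear smoothed log barrier function. Below is a minimal Python sketch of that function, following the log-barrier extension of Kervadec et al. (arXiv:1904.04205): standard log barrier inside the feasible region, continued by a straight line of slope t beyond a threshold so the penalty remains finite and differentiable when the constraint is violated. The function name, the parameter choices, and the toy safety-critic usage are illustrative assumptions, not the authors' implementation.

```python
import math

def smoothed_log_barrier(z: float, t: float = 1.0) -> float:
    """Linear smoothed log barrier psi_t(z) for a constraint of the form z <= 0.

    For z <= -1/t**2 this is the standard log barrier -(1/t) * log(-z); beyond
    that point it continues linearly with slope t, matching the value and slope
    of the log branch at the threshold, so the penalty stays finite everywhere.
    """
    threshold = -1.0 / t ** 2
    if z <= threshold:
        return -math.log(-z) / t
    # Linear extension for (near-)violated constraints.
    return t * z - math.log(1.0 / t ** 2) / t + 1.0 / t

# Toy usage (made-up numbers): penalize the margin between a safety-critic
# estimate q_cost and a cost budget d, i.e. the constraint q_cost - d <= 0.
q_cost, d = 0.8, 1.0
penalty = smoothed_log_barrier(q_cost - d, t=5.0)
print(f"barrier penalty: {penalty:.4f}")
```

Larger t makes the barrier tighter around the constraint boundary while keeping gradients bounded, which is what allows the penalty to act adaptively during policy learning instead of diverging as a pure log barrier would.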
- Baohe Zhang
- Yuan Zhang
- Lilli Frison
- Thomas Brox
- Joschka Bödecker