
Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving (2403.18209v2)

Published 27 Mar 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Reinforcement learning (RL) has been widely used in decision-making and control tasks, but the risk to the agent during training is high because the agent must interact with the environment, which severely limits industrial applications such as autonomous driving. Safe RL methods address this issue by constraining the expected safety-violation cost as a training objective, yet the probability of entering an unsafe state remains high, which is unacceptable in autonomous driving tasks. Moreover, these methods struggle to balance the cost and return expectations, which degrades their learning performance. In this paper, we propose a novel safe RL algorithm based on long and short-term constraints (LSTC). The short-term constraint improves the safety of the states the vehicle explores in the near term, while the long-term constraint improves the overall safety of the vehicle throughout the decision-making process; together they enhance vehicle safety during training. In addition, we develop a safe RL method with dual-constraint optimization based on the Lagrange multiplier to optimize the training process for end-to-end autonomous driving. Comprehensive experiments were conducted on the MetaDrive simulator. The results demonstrate that the proposed method achieves higher safety in continuous state and action tasks and better exploration performance in long-distance decision-making tasks than state-of-the-art methods.
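
The dual-constraint optimization described in the abstract can be read as a Lagrangian relaxation with one multiplier per constraint: the policy is updated on the return objective penalized by both cost terms, while the multipliers are adjusted by dual ascent. The Python sketch below illustrates that idea only; the helper names (lagrangian, update_multipliers) and the thresholds d_long and d_short are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of dual-constraint Lagrangian optimization for safe RL.
# Names and thresholds here are hypothetical, not taken from the paper.

def lagrangian(avg_return, long_cost, short_cost,
               lam_long, lam_short, d_long, d_short):
    """Penalized objective: J_R - lam_l*(J_Cl - d_l) - lam_s*(J_Cs - d_s).

    avg_return : estimated expected return of the current policy
    long_cost  : estimated long-term (episode-level) expected cost
    short_cost : estimated short-term (state-wise) expected cost
    """
    return (avg_return
            - lam_long * (long_cost - d_long)
            - lam_short * (short_cost - d_short))


def update_multipliers(lam_long, lam_short, long_cost, short_cost,
                       d_long, d_short, lr=1e-2):
    """Dual ascent on the two multipliers, projected onto lambda >= 0."""
    lam_long = max(0.0, lam_long + lr * (long_cost - d_long))
    lam_short = max(0.0, lam_short + lr * (short_cost - d_short))
    return lam_long, lam_short


if __name__ == "__main__":
    # Alternate a policy-gradient step on lagrangian(...) with a multiplier
    # update; each multiplier grows while its constraint is violated and
    # decays toward zero once the constraint is satisfied.
    lam_l, lam_s = 0.0, 0.0
    lam_l, lam_s = update_multipliers(lam_l, lam_s,
                                      long_cost=12.0, short_cost=0.3,
                                      d_long=10.0, d_short=0.1)
    print(lam_l, lam_s)
```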

Authors (5)
  1. Xuemin Hu
  2. Pan Chen
  3. Yijun Wen
  4. Bo Tang
  5. Long Chen
Citations (1)