How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation (2311.14457v2)
Abstract: Deep reinforcement learning has gradually shown its latent decision-making ability in urban rail transit autonomous operation. However, since reinforcement learning can guarantee safety neither during learning nor during execution, this remains one of the major obstacles to its practical application. Given this drawback, applying reinforcement learning in the safety-critical autonomous operation domain is still challenging, as it requires generating a safe control command sequence that avoids overspeed operation. Therefore, this paper proposes an SSA-DRL framework for the safe intelligent control of autonomously operated urban rail transit trains. The proposed framework combines linear temporal logic, reinforcement learning and Monte Carlo tree search, and consists of four main modules: a post-posed shield, a searching tree module, a DRL framework and an additional actor. The output of the framework satisfies the speed and schedule constraints while optimizing the operation process. Finally, the proposed SSA-DRL framework for decision-making in urban rail transit autonomous operation is evaluated on sixteen different sections, and its effectiveness is demonstrated through an ablation experiment and a comparison with the scheduled operation plan.
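To make the post-posed shield idea from the abstract concrete, the sketch below shows how such a safety layer can sit between a DRL agent and the train: the agent proposes a control command, the shield predicts the next speed, and an unsafe command is replaced before execution. This is a minimal illustration, not the authors' implementation; the point-mass dynamics, the parameter names and the fallback "backup_actor" (standing in for the paper's additional actor) are assumptions introduced here.

```python
# Minimal sketch of a post-posed shield for overspeed protection.
# Dynamics, names and fallback policy are illustrative assumptions,
# not the SSA-DRL implementation described in the paper.

def predict_next_speed(v, accel_cmd, dt=1.0):
    """Toy point-mass dynamics: speed after applying accel_cmd [m/s^2] for dt seconds."""
    return max(0.0, v + accel_cmd * dt)

def post_posed_shield(v, accel_cmd, speed_limit, backup_actor, dt=1.0):
    """Keep the agent's command if the predicted speed stays within the limit,
    otherwise fall back to a backup command, and finally to service braking."""
    if predict_next_speed(v, accel_cmd, dt) <= speed_limit:
        return accel_cmd                      # agent's command is already safe
    safe_cmd = backup_actor(v, speed_limit)   # hypothetical "additional actor"
    if predict_next_speed(v, safe_cmd, dt) <= speed_limit:
        return safe_cmd
    return -1.0                               # last resort: brake

if __name__ == "__main__":
    backup = lambda v, limit: 0.0             # hypothetical backup policy: coast
    cmd = post_posed_shield(v=21.5, accel_cmd=0.8,
                            speed_limit=22.2, backup_actor=backup)
    print(cmd)  # 0.0 -> the unsafe acceleration was overridden
```

In this toy example the agent's acceleration of 0.8 m/s^2 would push the speed to 22.3 m/s, above the 22.2 m/s limit, so the shield substitutes the backup command; the paper's framework additionally accounts for schedule constraints and uses tree search rather than a fixed fallback.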