HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving (2401.03160v5)

Published 6 Jan 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both AV safety and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoons. Drawing inspiration from the human learning process, we first introduce a learning paradigm that injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent: while the agent is allowed to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. The agent is also guided to minimize traffic flow disturbance, thereby improving traffic flow efficiency. Specifically, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Notably, we circumvent the intricate process of manually designing reward functions; instead, we derive proxy state-action values directly from partial human demonstrations to guide the agent's policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios. The code and demo videos for this paper can be accessed at: https://zilin-huang.github.io/HAIM-DRL-website/
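The training scheme the abstract describes lends itself to a short sketch. Below is a minimal, hypothetical Python rendering of the data-collection loop (not the authors' implementation): the agent explores freely, the mentor takes over only in dangerous states, and demonstrated actions receive high proxy state-action values while overridden agent actions receive low ones, in place of a hand-designed reward. The Gymnasium-style environment interface, the `is_dangerous` and `mentor_action` stand-ins, and the ±1 proxy labels are all assumptions for illustration.

```python
import random

def is_dangerous(obs, action) -> bool:
    """Hypothetical trigger; in the paper a human mentor decides when to take over."""
    return random.random() < 0.05  # placeholder: flag ~5% of steps as unsafe

def mentor_action(obs):
    """Hypothetical corrective demonstration standing in for the human's action."""
    return 0  # placeholder action

def collect_rollout(env, agent_policy, buffer, max_steps=1000):
    """One data-collection episode mixing free exploration with partial takeovers."""
    obs, _ = env.reset()
    for _ in range(max_steps):
        proposed = agent_policy(obs)
        if is_dangerous(obs, proposed):             # mentor steps in only when needed,
            action = mentor_action(obs)             # keeping cognitive load minimal
            buffer.append((obs, action, +1.0))      # demonstrated action: high proxy value
            buffer.append((obs, proposed, -1.0))    # overridden agent action: low proxy value
        else:
            action = proposed
            buffer.append((obs, action, None))      # free exploration, no proxy label
        obs, _, terminated, truncated, _ = env.step(action)  # Gymnasium-style step
        if terminated or truncated:
            obs, _ = env.reset()
    return buffer
```

A downstream learner would then fit the policy on both data sources, using the labeled pairs as proxy values rather than environment rewards; how those values propagate to unlabeled exploration steps is the paper's contribution and is not reproduced here.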

Authors (4)
  1. Zilin Huang (19 papers)
  2. Zihao Sheng (15 papers)
  3. Chengyuan Ma (20 papers)
  4. Sikai Chen (31 papers)
Citations (17)
