
DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (2401.03641v1)

Published 8 Jan 2024 in cs.RO and cs.CV

Abstract: In autonomous driving, two important properties of a driving system are the explainability of its decision logic and the accuracy of its environmental perception. This paper introduces DME-Driver, a new autonomous driving system that enhances the performance and reliability of autonomous driving systems. DME-Driver uses a powerful vision LLM as the decision-maker and a planning-oriented perception model as the control signal generator. To ensure explainable and reliable driving decisions, the logical decision-maker is built on a large vision LLM. This model follows the logic employed by experienced human drivers and makes decisions in a similar manner. Generating accurate control signals, on the other hand, relies on precise and detailed environmental perception, which is where 3D scene perception models excel. A planning-oriented perception model is therefore employed as the signal generator. It translates the logical decisions made by the decision-maker into accurate control signals for the self-driving car. To train the proposed model effectively, a new dataset for autonomous driving was created. This dataset covers a diverse range of human driver behaviors and their underlying motivations. By leveraging this dataset, our model achieves high-precision planning through a logical thinking process.
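
The two-stage structure described in the abstract (a decision-maker whose output is turned into control signals by a separate generator) can be illustrated with a toy sketch. This is not the paper's implementation: the names `Decision`, `ControlSignal`, `decide`, `generate_control`, and `drive_step`, the scene fields `pedestrian_ahead` and `lane_offset_m`, and all numeric values are hypothetical, and simple hand-written rules stand in for both the vision LLM and the perception model.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical output of the decision-maker: an action plus its reason."""
    action: str
    rationale: str


@dataclass
class ControlSignal:
    """Hypothetical low-level command produced by the signal generator."""
    steering: float
    throttle: float
    brake: float


def decide(scene: dict) -> Decision:
    # Stand-in for the vision LLM decision-maker (assumption: a hand-written rule).
    if scene.get("pedestrian_ahead", False):
        return Decision("stop", "A pedestrian is crossing ahead.")
    return Decision("keep_lane", "The road ahead is clear.")


def generate_control(decision: Decision, scene: dict) -> ControlSignal:
    # Stand-in for the planning-oriented perception model (assumption: toy rules).
    if decision.action == "stop":
        return ControlSignal(steering=0.0, throttle=0.0, brake=1.0)
    # Steer back toward the lane center in proportion to the lateral offset.
    offset = scene.get("lane_offset_m", 0.0)
    return ControlSignal(steering=-0.1 * offset, throttle=0.3, brake=0.0)


def drive_step(scene: dict) -> tuple:
    # Stage 1 yields an explainable decision; stage 2 turns it into a command.
    decision = decide(scene)
    return decision, generate_control(decision, scene)


if __name__ == "__main__":
    decision, control = drive_step({"pedestrian_ahead": True})
    print(decision.action, "|", decision.rationale, "|", control)
```

Keeping the decision and its rationale as a separate, inspectable object is what makes the intermediate step explainable in this sketch; the control generator only consumes it.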
