DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (2401.03641v1)
Abstract: In the field of autonomous driving, two important features of autonomous driving car systems are the explainability of decision logic and the accuracy of environmental perception. This paper introduces DME-Driver, a new autonomous driving system that enhances the performance and reliability of autonomous driving system. DME-Driver utilizes a powerful vision LLM as the decision-maker and a planning-oriented perception model as the control signal generator. To ensure explainable and reliable driving decisions, the logical decision-maker is constructed based on a large vision LLM. This model follows the logic employed by experienced human drivers and makes decisions in a similar manner. On the other hand, the generation of accurate control signals relies on precise and detailed environmental perception, which is where 3D scene perception models excel. Therefore, a planning oriented perception model is employed as the signal generator. It translates the logical decisions made by the decision-maker into accurate control signals for the self-driving cars. To effectively train the proposed model, a new dataset for autonomous driving was created. This dataset encompasses a diverse range of human driver behaviors and their underlying motivations. By leveraging this dataset, our model achieves high-precision planning accuracy through a logical thinking process.
- Self-driving cars: A survey. Expert Systems with Applications, 165:113816, 2021.
- Frozen in time: A joint video and image encoder for end-to-end retrieval. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1728–1738, 2021.
- Find your own way: Weakly-supervised segmentation of path proposals for urban autonomy. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 203–210. IEEE, 2017.
- On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021.
- End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
- Rt-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818, 2023.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
- nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020.
- Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3558–3568, 2021.
- Deepdriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE international conference on computer vision, pages 2722–2730, 2015.
- End-to-end autonomous driving: Challenges and frontiers. arXiv preprint arXiv:2306.16927, 2023.
- Language-guided 3d object detection in point cloud for autonomous driving. arXiv preprint arXiv:2305.15765, 2023.
- Lu Chi and Yadong Mu. Deep steering: Learning end-to-end driving model from spatial and temporal visual cues. arXiv preprint arXiv:1708.03798, 2017.
- Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16. PMLR, 2017.
- Semantic anomaly detection with large language models. Autonomous Robots, pages 1–21, 2023.
- Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection. In 2018 21st international conference on intelligent transportation systems (ITSC), pages 3266–3273. IEEE, 2018.
- Trajectory planning for automated parking using multi-resolution state roadmap considering non-holonomic constraints. In 2014 IEEE Intelligent Vehicles Symposium Proceedings, pages 407–413. IEEE, 2014.
- A survey of deep learning techniques for autonomous driving. Journal of Field Robotics, 37(3):362–386, 2020.
- Vip3d: End-to-end visual trajectory prediction via 3d agent queries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5496–5506, 2023.
- Human-like decision making for autonomous driving: A noncooperative game theoretic approach. IEEE Transactions on Intelligent Transportation Systems, 22(4):2076–2087, 2020.
- Safe local motion planning with self-supervised freespace forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12732–12741, 2021.
- St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In European Conference on Computer Vision, pages 533–549. Springer, 2022a.
- Goal-oriented autonomous driving. arXiv preprint arXiv:2212.10156, 2022b.
- Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17853–17862, 2023.
- Differentiable integrated motion prediction and planning with learnable cost function for autonomous driving. IEEE transactions on neural networks and learning systems, 2023.
- Adding navigation to the equation: Turning decisions for end-to-end vehicle control. In 2017 IEEE 20th international conference on intelligent transportation systems (ITSC), pages 1–8. IEEE, 2017.
- An empirical evaluation of deep learning on highway driving. arXiv preprint arXiv:1504.01716, 2015.
- Navigating occluded intersections with autonomous vehicles using deep reinforcement learning. In 2018 IEEE international conference on robotics and automation (ICRA), pages 2034–2039. IEEE, 2018.
- A new path: Scaling vision-and-language navigation with synthetic instructions and imitation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10813–10823, 2023.
- Look both ways: Self-supervising driver gaze estimation and road scene saliency. In European Conference on Computer Vision, pages 126–142. Springer, 2022.
- Differentiable raycasting for self-supervised occupancy forecasting. In European Conference on Computer Vision, pages 353–369. Springer, 2022.
- Advisable learning for self-driving vehicles by internalizing observation-to-action rules. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9661–9670, 2020.
- Visualization of driving behavior based on hidden feature extraction by using deep learning. IEEE Transactions on Intelligent Transportation Systems, 18(9):2477–2489, 2017.
- Improved baselines with visual instruction tuning, 2023a.
- Vlpd: Context-aware pedestrian detection via vision-language semantic self-supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6662–6671, 2023b.
- Localization and navigation in autonomous driving: Threats and countermeasures. IEEE Wireless Communications, 26(4):38–45, 2019.
- Gpt-driver: Learning to drive with gpt. arXiv preprint arXiv:2310.01415, 2023.
- Event-based vision meets deep learning on steering prediction for self-driving cars. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5419–5427, 2018.
- The stanford entry in the urban challenge. Journal of Field Robotics, 7(9):468–492, 2008.
- Stereo-camera-based urban environment perception using occupancy grid and object tracking. IEEE Transactions on Intelligent Transportation Systems, 13(1):154–165, 2011.
- Openscene: 3d scene understanding with open vocabularies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 815–824, 2023.
- All weather perception: Joint data association, tracking, and classification for autonomous ground vehicles. arXiv preprint arXiv:1605.02196, 2016.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
- Planning and decision-making for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems, 1:187–210, 2018.
- Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Conference on Robot Learning, pages 726–737. PMLR, 2023.
- Talk to the vehicle: Language conditioned autonomous navigation of self driving cars. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5284–5290. IEEE, 2019.
- Energy and policy considerations for deep learning in nlp. arXiv preprint arXiv:1906.02243, 2019.
- Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2446–2454, 2020.
- Stanley: The robot that won the darpa grand challenge. Journal of field Robotics, 23(9):661–692, 2006.
- Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. Advances in Neural Information Processing Systems, 35:6119–6132, 2022.
- Drivegpt4: Interpretable end-to-end autonomous driving via large language model. arXiv preprint arXiv:2310.01412, 2023.
- SS Yakovlev and Arkady N Borisov. A synergy of the rosenblatt perceptron and the jordan recurrence principle. Automatic Control and Computer Sciences, 43:31–39, 2009.
- The dawn of lmms: Preliminary explorations with gpt-4v (ision). arXiv preprint arXiv:2309.17421, 9(1), 2023.
- Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- End-to-end interpretable neural motion planner. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8660–8669, 2019.