
BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning (2310.10357v1)

Published 16 Oct 2023 in cs.RO

Abstract: Prediction, decision-making, and motion planning are essential for autonomous driving. In most contemporary works, they are treated as individual modules or combined in a multi-task learning paradigm with a shared backbone but separate task heads. However, we argue that they should be integrated into a comprehensive framework. Although several recent approaches follow this scheme, they suffer from complicated input representations and redundant framework designs. More importantly, they cannot make long-term predictions about future driving scenarios. To address these issues, we rethink the necessity of each module in an autonomous driving task and incorporate only the required modules into a minimalist autonomous driving framework. We propose BEVGPT, a generative pre-trained large model that integrates driving scenario prediction, decision-making, and motion planning. The model takes bird's-eye-view (BEV) images as its only input source and makes driving decisions based on the surrounding traffic scenario. To ensure driving trajectory feasibility and smoothness, we develop an optimization-based motion planning method. We instantiate BEVGPT on the Lyft Level 5 Dataset and use the Woven Planet L5Kit for realistic driving simulation. The effectiveness and robustness of the proposed framework are verified by the fact that it outperforms previous methods on 100% of the decision-making metrics and 66% of the motion planning metrics. Furthermore, the ability of our framework to accurately generate BEV images over the long term is demonstrated through the task of driving scenario prediction. To the best of our knowledge, this is the first generative pre-trained large model for autonomous driving prediction, decision-making, and motion planning with only BEV images as input.
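
The abstract describes the framework only at a high level. The sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired up in PyTorch: past BEV frames are flattened into tokens, a causally masked Transformer autoregressively predicts future BEV tokens and a set of planned waypoints, and a simple quadratic-smoothing step stands in for the paper's optimization-based motion planner. All class names, layer sizes, and the smoothing objective are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): GPT-style decoder over BEV frames
# plus a quadratic waypoint-smoothing step. Sizes and names are illustrative.
import torch
import torch.nn as nn


class BEVGPTSketch(nn.Module):
    """Causally masked Transformer over tokenized BEV frames (hypothetical)."""

    def __init__(self, d_model=256, n_heads=8, n_layers=6, bev_dim=64 * 64, horizon=12):
        super().__init__()
        self.bev_embed = nn.Linear(bev_dim, d_model)            # one token per flattened BEV frame
        self.pos_embed = nn.Parameter(torch.zeros(1, 256, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # causal mask applied in forward()
        self.bev_head = nn.Linear(d_model, bev_dim)             # next-frame BEV prediction
        self.plan_head = nn.Linear(d_model, horizon * 2)        # (x, y) waypoints for planning

    def forward(self, bev_frames):                              # bev_frames: (B, T, bev_dim)
        T = bev_frames.shape[1]
        x = self.bev_embed(bev_frames) + self.pos_embed[:, :T]
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(bev_frames.device)
        h = self.backbone(x, mask=causal)                       # autoregressive self-attention
        return self.bev_head(h), self.plan_head(h[:, -1])       # future BEVs, planned waypoints


def smooth_waypoints(wp, lam=1.0):
    """Quadratic smoothing of (T, 2) waypoints: argmin_p ||p - wp||^2 + lam * ||D2 p||^2.
    A stand-in for the paper's optimization-based motion planner, not its actual formulation."""
    T = wp.shape[0]
    D2 = torch.zeros(T - 2, T)
    for i in range(T - 2):
        D2[i, i:i + 3] = torch.tensor([1.0, -2.0, 1.0])         # discrete second difference
    A = torch.eye(T) + lam * (D2.T @ D2)
    return torch.linalg.solve(A, wp)                            # closed-form smoothed trajectory


# Usage (shapes only): 8 past BEV frames in, future BEVs and a smoothed 12-step plan out.
model = BEVGPTSketch()
future_bev, waypoints = model(torch.randn(2, 8, 64 * 64))
trajectory = smooth_waypoints(waypoints[0].view(-1, 2).detach())
```

The closed-form solve reflects the fact that a sum of quadratic fitting and smoothness terms has a linear optimality condition; the paper's actual planner additionally enforces trajectory feasibility, which this sketch omits.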

Authors (8)
  1. Pengqin Wang (5 papers)
  2. Meixin Zhu (39 papers)
  3. Hongliang Lu (72 papers)
  4. Hui Zhong (21 papers)
  5. Xianda Chen (14 papers)
  6. Shaojie Shen (121 papers)
  7. Xuesong Wang (44 papers)
  8. Yinhai Wang (45 papers)
Citations (11)
