VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning (2402.13243v1)

Published 20 Feb 2024 in cs.CV and cs.RO

Abstract: Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging. In this work, to cope with the uncertainty problem, we propose VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes multi-view image sequences as input in a streaming manner, transforms sensor data into environmental token embeddings, outputs the probabilistic distribution of action, and samples one action to control the vehicle. Only with camera sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark, significantly outperforming all existing methods. It runs stably in a fully end-to-end manner, even without the rule-based wrapper. Closed-loop demos are presented at https://hgao-cv.github.io/VADv2.
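The abstract's planning loop (encode sensor data into tokens, predict a probability distribution over a discrete action vocabulary, then sample one action to execute) can be sketched as follows. This is a minimal illustration, not the actual VADv2 network: the logits and the four-way action vocabulary are hypothetical placeholders standing in for the model's real output over its trajectory vocabulary.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax: convert raw action scores
    # into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, rng):
    """Sample one action index from the predicted distribution.

    Mirrors the high-level probabilistic-planning step described in
    the abstract; the network that would produce `logits` is omitted.
    """
    probs = softmax(logits)
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i, probs
    return len(probs) - 1, probs  # guard against float rounding

# Hypothetical logits over a tiny 4-action vocabulary.
logits = [2.0, 0.5, -1.0, 0.1]
idx, probs = sample_action(logits, random.Random(0))
```

Sampling from the distribution (rather than always taking the argmax) is what lets a probabilistic planner represent multiple plausible maneuvers in ambiguous scenes; at deployment one could instead pick the highest-probability action for determinism.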


