
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? (2312.03031v2)

Published 5 Dec 2023 in cs.CV

Abstract: End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity. These models tend to rely predominantly on the ego vehicle's status for future path planning. Beyond the limitations of the dataset, we also note that current metrics do not comprehensively assess the planning quality, leading to potentially biased conclusions drawn from existing benchmarks. To address this issue, we introduce a new metric to evaluate whether the predicted trajectories adhere to the road. We further propose a simple baseline able to achieve competitive results without relying on perception annotations. Given the current limitations on the benchmark and metrics, we suggest the community reassess relevant prevailing research and be cautious whether the continued pursuit of state-of-the-art would yield convincing and universal conclusions. Code and models are available at https://github.com/NVlabs/BEV-Planner

Authors (7)
  1. Zhiqi Li (42 papers)
  2. Zhiding Yu (94 papers)
  3. Shiyi Lan (38 papers)
  4. Jiahan Li (25 papers)
  5. Jan Kautz (215 papers)
  6. Tong Lu (85 papers)
  7. Jose M. Alvarez (90 papers)
Citations (36)

Summary

Summary of "Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?"

The paper examines end-to-end autonomous driving under open-loop evaluation. Specifically, it investigates the extent to which ego status (the ego vehicle's velocity, acceleration, and yaw angle) suffices for planning decisions when systems are assessed with the existing benchmark datasets and metrics.

The research highlights several key observations:

  1. Dataset Limitations: The nuScenes dataset, the de facto benchmark in this line of work, consists predominantly of simple scenarios in which the ego vehicle keeps driving straight (approximately 73.9%). This skewed distribution lets models that ingest ego status lean on it for decision-making, because in such simple scenarios the current state alone largely determines the future path.
  2. Ego Status Dominance: Ego status, which encodes the vehicle's current state, dominates open-loop planning evaluations once it is made available to the model. A simple MLP fed only with ego status can match more complex methods that consume comprehensive sensory input (a minimal sketch of such a planner follows this list). This raises critical questions about the validity and depth of current planning benchmarks.
  3. Metric Shortcomings: Existing metrics such as L2 displacement error and collision rate do not thoroughly assess planning quality. The authors introduce the rate at which predicted trajectories intersect road boundaries as a new metric (see the metric sketch after this list), which shifts the evaluation landscape, surfaces differences prior metrics miss, and exposes cases where methods that rely heavily on ego status fall short.
  4. Limits of Open-Loop Evaluation: These observations call into question sole reliance on open-loop evaluation, which offers no feedback from a dynamic environment that reacts to the ego vehicle's actions.
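
To make the second point concrete, the following is a minimal sketch of an ego-status-only planner: a small MLP that maps the ego vehicle's current state directly to future waypoints, with no perception input at all. The feature set, layer sizes, and six-waypoint horizon are illustrative assumptions, not the paper's exact baseline configuration.

```python
# Minimal sketch of an ego-status-only planner (illustrative, not the paper's exact baseline).
# It maps the ego vehicle's current state straight to future waypoints, using no perception.
import torch
import torch.nn as nn

class EgoStatusMLPPlanner(nn.Module):
    def __init__(self, ego_dim: int = 9, horizon: int = 6, hidden: int = 512):
        # ego_dim: e.g. velocity, acceleration, yaw rate, plus a high-level driving command
        # horizon: number of future (x, y) waypoints to predict
        super().__init__()
        self.horizon = horizon
        self.mlp = nn.Sequential(
            nn.Linear(ego_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, horizon * 2),
        )

    def forward(self, ego_status: torch.Tensor) -> torch.Tensor:
        # ego_status: (batch, ego_dim) -> waypoints: (batch, horizon, 2)
        return self.mlp(ego_status).view(-1, self.horizon, 2)

# Training would simply regress predicted waypoints against logged future ego poses:
planner = EgoStatusMLPPlanner()
ego = torch.randn(4, 9)        # dummy batch of ego states
target = torch.randn(4, 6, 2)  # dummy ground-truth waypoints
loss = nn.functional.l1_loss(planner(ego), target)
```

The point is not that such a planner is good, but that on nuScenes open-loop metrics it is hard to beat, which says more about the benchmark than about the model.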

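To ground the metric discussion, here is a hedged sketch of the two kinds of evaluation mentioned above: the standard L2 displacement error between predicted and logged trajectories, and a simple check for whether a trajectory leaves a rasterized drivable-area mask, in the spirit of the paper's road-boundary intersection rate. The BEV raster, its resolution, and the coordinate convention are assumptions made for illustration.

```python
# Hedged sketch of open-loop planning metrics: L2 displacement error and a
# drivable-area violation check (an approximation of a road-boundary intersection test).
import numpy as np

def l2_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean L2 distance over waypoints; pred and gt are (T, 2) arrays in meters."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def leaves_drivable_area(pred: np.ndarray,
                         drivable_mask: np.ndarray,
                         resolution: float = 0.5,
                         origin: tuple = (0.0, 0.0)) -> bool:
    """True if any waypoint falls outside the drivable-area raster.

    drivable_mask: (H, W) boolean BEV grid, True where driving is allowed.
    resolution:    meters per pixel; origin: world coordinates of pixel (0, 0).
    """
    cols = ((pred[:, 0] - origin[0]) / resolution).astype(int)
    rows = ((pred[:, 1] - origin[1]) / resolution).astype(int)
    h, w = drivable_mask.shape
    inside = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    # Waypoints outside the raster, or on non-drivable cells, count as violations.
    return bool((~inside).any() or (~drivable_mask[rows[inside], cols[inside]]).any())
```

Averaging the boolean violation over a dataset gives an intersection rate which, as the summary notes, reorders methods that look comparable under L2 error alone.
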
These findings counsel significant caution when interpreting state-of-the-art results in open-loop end-to-end autonomous driving research. Researchers should reconsider reliance on constrained datasets and metrics and recognize the biases they impose. Practically, systems evaluated solely on such benchmarks may appear stronger than they are in terms of real-world applicability and safety.

Further, the paper prompts the research community to scrutinize whether striving for superior open-loop performance without addressing these limitations could lead to misplaced confidence in the predictive power of these systems. It argues for a broader, more comprehensive suite of benchmarks, possibly integrating closed-loop settings, to better capture the dynamic and multifaceted nature of autonomous driving tasks.

Overall, the paper challenges the community to build more balanced datasets and evaluation criteria, with more holistic and representative coverage of the varied challenges of real-world driving, and in doing so to rebuild the foundation on which future advances are measured.
