
Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving (2210.06758v2)

Published 13 Oct 2022 in cs.RO and cs.LG

Abstract: Learning contextual and spatial environmental representations enhances an autonomous vehicle's hazard anticipation and decision-making in complex scenarios. Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context. Humans, when driving, naturally employ neural maps that integrate factors such as historical data, situational subtleties, and behavioral predictions of other road users to form a rich contextual understanding of their surroundings. This neural-map-based comprehension is integral to making informed decisions on the road. In contrast, even with their significant advancements, autonomous systems have yet to fully harness this depth of human-like contextual understanding. Motivated by this, our work draws inspiration from human driving patterns and seeks to formalize the sensor fusion approach within an end-to-end autonomous driving framework. We introduce a framework that integrates three cameras (left, right, and center) to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation. The sensor data is fused and encoded using a self-attention mechanism, feeding an auto-regressive waypoint prediction module. We treat feature representation as a sequential problem, employing a vision transformer to distill the contextual interplay between sensor modalities. The efficacy of the proposed method is evaluated in both open-loop and closed-loop settings. Our method achieves a displacement error of 0.67 m in the open-loop setting, surpassing current methods by 6.9% on the nuScenes dataset. In closed-loop evaluations on CARLA's Town05 Long and Longest6 benchmarks, the proposed method improves driving performance and route completion and reduces infractions.
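The pipeline the abstract describes (three camera views plus a BEV semantic map tokenized, fused with transformer self-attention, then decoded into waypoints auto-regressively) can be made concrete in a short sketch. The code below is an illustrative reconstruction, not the authors' implementation: the conv patchify stems, the GRU-based waypoint head, and every size (d_model, n_heads, the 8x8 stride, a single-channel BEV map, 4 waypoints) are assumptions chosen only to show the shape of the architecture.

    import torch
    import torch.nn as nn

    class FusionWaypointSketch(nn.Module):
        """Illustrative sketch: tokenize three camera views and a BEV
        semantic map, fuse the tokens with transformer self-attention,
        and roll out waypoints auto-regressively with a GRU cell.
        All layer sizes are guesses, not the paper's values."""

        def __init__(self, d_model=256, n_heads=8, n_layers=4, n_waypoints=4):
            super().__init__()
            # Shared patchify stem for the left/center/right RGB cameras;
            # a separate stem for the (assumed single-channel) BEV map.
            self.img_stem = nn.Conv2d(3, d_model, kernel_size=8, stride=8)
            self.bev_stem = nn.Conv2d(1, d_model, kernel_size=8, stride=8)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
            # Auto-regressive head: each step feeds the previous waypoint back in.
            self.gru = nn.GRUCell(input_size=2, hidden_size=d_model)
            self.delta = nn.Linear(d_model, 2)
            self.n_waypoints = n_waypoints

        @staticmethod
        def tokens(feat):
            # (B, C, H, W) -> (B, H*W, C): one token per spatial patch.
            return feat.flatten(2).transpose(1, 2)

        def forward(self, left, center, right, bev):
            toks = torch.cat(
                [self.tokens(self.img_stem(v)) for v in (left, center, right)]
                + [self.tokens(self.bev_stem(bev))], dim=1)
            ctx = self.fusion(toks).mean(dim=1)   # pooled multi-modal context
            wp = ctx.new_zeros(ctx.size(0), 2)    # start at the ego position
            h, out = ctx, []
            for _ in range(self.n_waypoints):     # auto-regressive rollout
                h = self.gru(wp, h)
                wp = wp + self.delta(h)           # cumulative 2-D displacement
                out.append(wp)
            return torch.stack(out, dim=1)        # (B, n_waypoints, 2)

    if __name__ == "__main__":
        model = FusionWaypointSketch()
        cams = [torch.randn(2, 3, 128, 128) for _ in range(3)]
        bev = torch.randn(2, 1, 128, 128)
        print(model(*cams, bev).shape)  # torch.Size([2, 4, 2])

The conv stems stand in for whatever backbone the paper actually uses (the abstract names a vision transformer); the two structural points the sketch preserves are that all modalities meet as one token sequence inside the self-attention encoder, and that each predicted waypoint is fed back into the next decoding step.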

Authors (5)
  1. Shoaib Azam (14 papers)
  2. Farzeen Munir (16 papers)
  3. Ville Kyrki (102 papers)
  4. Moongu Jeon (43 papers)
  5. Witold Pedrycz (67 papers)
Citations (1)