
Guiding Attention in End-to-End Driving Models (2405.00242v1)

Published 30 Apr 2024 in cs.CV and cs.AI

Abstract: Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving. However, training well-performing models of this kind usually requires a huge amount of data, and the resulting models still lack explicit and intuitive activation maps that reveal their inner workings while driving. In this paper, we study how to guide the attention of these models, improving their driving quality and yielding more intuitive activation maps, by adding a loss term during training that uses salient semantic maps. In contrast to previous work, our method neither requires these salient semantic maps to be available at testing time nor modifies the architecture of the model it is applied to. We perform tests using both perfect and noisy salient semantic maps, the latter inspired by errors likely to be encountered with real data, with encouraging results in both cases. Using CIL++ as a representative state-of-the-art model and the CARLA simulator with its standard benchmarks, we conduct experiments showing the effectiveness of our method in training better autonomous driving models, especially when data and computational resources are scarce.
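To make the core idea concrete, below is a minimal sketch of attention guidance via an auxiliary training loss, in the spirit the abstract describes: the model's spatial attention is pulled toward a (possibly noisy) salient semantic map, and the guidance term is simply added to the imitation loss, so the salient maps are needed only at training time. The function names, the KL-divergence choice, the L1 imitation loss, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of attention guidance via an auxiliary loss term.
# Shapes, losses, and the model interface are assumptions for illustration.
import torch
import torch.nn.functional as F

def attention_guidance_loss(attn_logits: torch.Tensor,
                            salient_map: torch.Tensor) -> torch.Tensor:
    """KL divergence between the model's spatial attention distribution
    and a (possibly noisy) salient semantic map, both over H*W cells."""
    b = attn_logits.shape[0]
    # Normalize both maps into probability distributions over locations.
    attn_log_p = F.log_softmax(attn_logits.view(b, -1), dim=-1)
    target_p = salient_map.view(b, -1)
    target_p = target_p / target_p.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return F.kl_div(attn_log_p, target_p, reduction="batchmean")

def training_step(model, images, expert_actions, salient_maps, lam=0.1):
    # Assumes the model exposes its attention logits alongside its actions;
    # no architectural change beyond reading out an existing attention map.
    actions, attn_logits = model(images)
    imitation = F.l1_loss(actions, expert_actions)      # standard behavior cloning term
    guidance = attention_guidance_loss(attn_logits, salient_maps)
    return imitation + lam * guidance                   # guidance used only in training
```

At test time only `model(images)` is run, so no salient semantic maps are required, matching the deployment setting the abstract emphasizes.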

