Overview of Human Visual Attention Prediction to Enhance Autonomous Driving Agent Learning
This paper explores the intersection of human visual attention and machine learning to improve the training of autonomous driving systems. It is motivated by the observation that human drivers rely on a highly refined visual system to selectively attend to task-relevant regions of a scene while disregarding irrelevant detail. The work leverages this selective attention by incorporating an artificial attention mechanism, trained to mimic drivers' gaze patterns, into end-to-end autonomous driving models.
Approach and Methodology
Central to this research is a simulation environment built on the CARLA driving simulator and augmented with virtual reality (VR) to provide a realistic driving experience. Eye-movement data from human drivers navigating this environment are recorded and used to train a deep neural network that predicts human gaze fixations. The model, named Intention-Branched DR(eye)VE, extends existing saliency prediction architectures with a component that accounts for high-level driving intentions, thereby aligning attention predictions more closely with driver intent.
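To make the intention-branching idea concrete, the following is a minimal sketch (not the authors' code) of how a high-level driving-intention signal could be fused into a frame-based saliency predictor. The layer sizes, the one-hot intention encoding, and the class name IntentionBranchedSaliency are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: fusing a driving-intention vector into a saliency predictor.
# Architecture details are assumptions, loosely in the spirit of Intention-Branched DR(eye)VE.
import torch
import torch.nn as nn

class IntentionBranchedSaliency(nn.Module):
    def __init__(self, num_intentions: int = 4):
        super().__init__()
        # Visual branch: downsample the frame into a coarse feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
        )
        # Intention branch: embed a one-hot manoeuvre (e.g. straight/left/right/stop).
        self.intent = nn.Sequential(nn.Linear(num_intentions, 64), nn.ReLU())
        # Decoder: predict a single-channel saliency map at the coarse resolution.
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, frame: torch.Tensor, intention: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(frame)                        # (B, 64, H/4, W/4)
        intent = self.intent(intention)                    # (B, 64)
        # Broadcast the intention embedding over the spatial grid and concatenate.
        intent_map = intent[:, :, None, None].expand(-1, -1, *feats.shape[2:])
        fused = torch.cat([feats, intent_map], dim=1)      # (B, 128, H/4, W/4)
        logits = self.decoder(fused)
        # Normalise to a probability distribution over pixels (a gaze density map).
        b = logits.shape[0]
        return torch.softmax(logits.view(b, -1), dim=1).view_as(logits)

# Example: one 3x128x256 frame paired with a "turn left" intention.
model = IntentionBranchedSaliency()
frame = torch.randn(1, 3, 128, 256)
intention = torch.tensor([[0.0, 1.0, 0.0, 0.0]])
saliency = model(frame, intention)                         # (1, 1, 32, 64), sums to 1
```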
The research sets up a comparative framework, evaluating multiple saliency models, including DeepGaze II, MLNet, RMDN, and DR(eye)VE, on the task of human gaze prediction in driving scenarios. Quantitative measures such as Kullback-Leibler Divergence and Correlation Coefficient are employed to assess performance, with the Intention-Branched DR(eye)VE model exhibiting superior predictive capabilities.
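The two reported metrics have standard definitions in the saliency literature; the sketch below uses those common forms, though the paper's exact normalisation and smoothing constants may differ. Lower KL divergence and higher correlation coefficient indicate a better match to human gaze.

```python
# Standard saliency-evaluation metrics (common definitions; epsilon choices are assumptions).
import numpy as np

def kl_divergence(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """KL(gt || pred) between two fixation density maps; lower is better."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float(np.sum(g * np.log(eps + g / (p + eps))))

def correlation_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pearson correlation between the two maps; higher is better (max 1)."""
    p = (pred - pred.mean()) / (pred.std() + 1e-7)
    g = (gt - gt.mean()) / (gt.std() + 1e-7)
    return float((p * g).mean())

# Example with two random maps of the same spatial size.
pred_map = np.random.rand(32, 64)
gt_map = np.random.rand(32, 64)
print(kl_divergence(pred_map, gt_map), correlation_coefficient(pred_map, gt_map))
```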
Autonomous Agent Training
The gaze predictions are then integrated into the training of autonomous driving agents through several attention-masking techniques, which modify the input images to emphasize the regions highlighted by the gaze prediction model. Several agent architectures are trained: a standard end-to-end imitation learning agent, variants receiving masked inputs (hard, soft, and baseline masks), and a dual-branch model receiving both raw and attention-enhanced inputs.
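The sketch below illustrates what hard and soft masking of an input frame by a predicted gaze map might look like; the threshold and blending floor are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of attention masking with a predicted gaze map (parameters are assumptions).
import numpy as np

def hard_mask(image: np.ndarray, saliency: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Keep only pixels whose normalised saliency exceeds a threshold; zero the rest."""
    s = saliency / (saliency.max() + 1e-7)
    return image * (s[..., None] > thresh)

def soft_mask(image: np.ndarray, saliency: np.ndarray, floor: float = 0.3) -> np.ndarray:
    """Reweight pixels continuously by saliency, keeping a minimum brightness floor."""
    s = saliency / (saliency.max() + 1e-7)
    return image * (floor + (1.0 - floor) * s[..., None])

# Example: an HxWx3 frame and an HxW gaze prediction.
frame = np.random.rand(128, 256, 3)
gaze = np.random.rand(128, 256)
hard_input, soft_input = hard_mask(frame, gaze), soft_mask(frame, gaze)
```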
The dual-branch architecture emerges as notably effective, yielding a 25.5% reduction in mean absolute error over the model trained on raw images alone. These findings underscore the utility of incorporating artificial, human-like attention into agent training, particularly when the architecture is designed to exploit the attention-weighted information alongside the raw input.
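A minimal sketch of such a dual-branch imitation learner is given below: one branch encodes the raw frame, the other the attention-masked frame, and their features are concatenated before regressing the control signal. The layer shapes, the single steering-style output, and the L1 objective are assumptions for illustration.

```python
# Minimal sketch of a dual-branch imitation-learning agent (shapes and output are assumptions).
import torch
import torch.nn as nn

def conv_branch() -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(3, 32, 5, stride=4, padding=2), nn.ReLU(),
        nn.Conv2d(32, 64, 5, stride=4, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (B, 64)
    )

class DualBranchAgent(nn.Module):
    def __init__(self):
        super().__init__()
        self.raw_branch = conv_branch()
        self.attn_branch = conv_branch()
        # Regress a single control value (e.g. steering) from the fused features.
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, raw: torch.Tensor, masked: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.raw_branch(raw), self.attn_branch(masked)], dim=1)
        return self.head(fused)

# Training against the human control signal with a mean-absolute-error (L1) objective.
agent = DualBranchAgent()
raw = torch.randn(8, 3, 128, 256)
masked = torch.randn(8, 3, 128, 256)   # e.g. soft-masked frames from the step above
loss = nn.L1Loss()(agent(raw, masked), torch.zeros(8, 1))
```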
Implications and Future Directions
This work highlights the potential of integrating human cognitive models into machine learning frameworks to improve task-specific performance in complex environments like autonomous driving. The results suggest that attention-driven data augmentation can make learning more efficient by helping models focus computational resources on the relevant sub-regions of the input. In practice, such enhancements could translate into autonomous driving systems that are more reliable and safer, better equipped to interpret and react to dynamic driving conditions.
The success of this approach opens the door to further exploration of attention models in other machine learning contexts. Extending these methodologies to reinforcement learning is a particularly intriguing frontier, as attention mechanisms could mitigate sample inefficiency by directing the learning signal toward the critical, event-triggering features of the environment.
In conclusion, this paper provides a methodologically sound and experimentally validated case for leveraging human visual attention dynamics to augment the training of autonomous agents. As machine learning technologies continue to evolve, the incorporation of human-like perception and decision-making processes remains a promising avenue for enhancing the intelligence and safety of autonomous systems.