- The paper introduces the MPIIGaze dataset, a large-scale collection of eye images captured in everyday laptop use for robust real-world gaze estimation.
- The proposed GazeNet deep CNN model outperforms state-of-the-art methods, reducing the mean error from 13.9 to 10.8 degrees in the most challenging cross-dataset evaluation.
- Comprehensive analysis reveals that variations in illumination, gaze range, and individual appearance significantly impact performance, guiding future method improvements.
Overview of MPIIGaze: A Real-World Dataset for Deep Appearance-Based Gaze Estimation
The paper "MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation" provides significant advancements in the field of gaze estimation within computer vision. The authors present a novel approach to unconstrained gaze estimation, highlighting its fundamental importance and challenges in realistic settings, unlike laboratory-based conditions typically explored in previous studies.
The contributions of this research are multifaceted. First, the introduction of the MPIIGaze dataset marks a substantial step forward. The dataset comprises 213,659 images collected from 15 participants over several months of everyday laptop use, capturing realistic variation in eye appearance and illumination. This breadth enables robust cross-dataset evaluations across different real-world environments, under which typical gaze estimation methods perform inadequately because of their limited range of training conditions.
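To make the cross-dataset protocol concrete, the following minimal Python sketch trains on one dataset and tests on every other without any subject-specific adaptation. The `load_dataset`, `train_model`, and `evaluate` callables are hypothetical placeholders for illustration, not functions shipped with MPIIGaze.

```python
# A minimal sketch of a cross-dataset evaluation loop: train on one dataset,
# test on each of the others. All callables here are hypothetical placeholders.
def cross_dataset_evaluation(dataset_names, load_dataset, train_model, evaluate):
    results = {}
    for train_name in dataset_names:
        model = train_model(load_dataset(train_name))
        for test_name in dataset_names:
            if test_name == train_name:
                continue  # within-dataset results are reported separately
            results[(train_name, test_name)] = evaluate(model, load_dataset(test_name))
    return results

# e.g. cross_dataset_evaluation(["UT Multiview", "EYEDIAP", "MPIIGaze"], ...)
```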
Second, the authors conduct comprehensive evaluations of the proposed GazeNet model alongside established state-of-the-art methods across three datasets, including MPIIGaze itself. The evaluation targets core challenges such as variable gaze ranges, illumination conditions, and differences in individual facial appearance, all of which are essential for effective gaze estimation in unconstrained settings. GazeNet, a deep convolutional neural network, outperforms prior methods by 22% in the most challenging cross-dataset setting, reducing the mean error from 13.9 to 10.8 degrees.
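To give a concrete sense of what an appearance-based CNN gaze estimator looks like, the PyTorch sketch below maps a normalized grayscale eye patch plus 2D head pose angles to 2D gaze angles. The layer sizes and the 36x60 input resolution are illustrative assumptions; this is a simplified stand-in, not the authors' exact GazeNet architecture.

```python
# Simplified appearance-based gaze CNN: convolutional features over a grayscale
# eye patch, head pose angles concatenated before regressing (pitch, yaw).
import torch
import torch.nn as nn

class EyeGazeCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Assumes 36x60 normalized eye patches -> 9x15 feature maps after pooling.
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 9 * 15, 128), nn.ReLU())
        # Head pose (pitch, yaw) is appended to the image features.
        self.regressor = nn.Linear(128 + 2, 2)

    def forward(self, eye_image, head_pose):
        x = self.fc(self.features(eye_image))
        x = torch.cat([x, head_pose], dim=1)
        return self.regressor(x)  # predicted (pitch, yaw) in radians

model = EyeGazeCNN()
pred = model(torch.randn(8, 1, 36, 60), torch.randn(8, 2))  # batch of 8 -> (8, 2)
```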
In-depth analyses identify key hurdles in gaze estimation, emphasizing the critical influence of mismatches between training and testing conditions. Differences in gaze ranges across datasets contribute to a 25% performance gap, varying illumination accounts for a 35% gap, and differences in personal appearance result in a 40% gap. These findings underscore the need to account for such variability, whether through synthetic training data or new modeling strategies.
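The errors quoted above are mean angular errors between predicted and ground-truth gaze directions. The sketch below shows one common way to compute this metric by converting (pitch, yaw) angles into 3D unit vectors; the specific angle-to-vector convention is an assumption and may differ from the paper's evaluation code.

```python
# Mean angular error in degrees between predicted and ground-truth gaze angles.
import numpy as np

def angles_to_vector(pitch_yaw):
    pitch, yaw = pitch_yaw[..., 0], pitch_yaw[..., 1]
    # One common pitch/yaw-to-vector convention (camera looking along -z).
    return np.stack([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)], axis=-1)

def mean_angular_error(pred_angles, true_angles):
    p = angles_to_vector(np.asarray(pred_angles))
    t = angles_to_vector(np.asarray(true_angles))
    cos_sim = np.sum(p * t, axis=-1) / (
        np.linalg.norm(p, axis=-1) * np.linalg.norm(t, axis=-1))
    return np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))).mean()

# Example: a constant 5-degree yaw offset yields ~5 degrees of mean error.
true = np.zeros((100, 2))
pred = true + np.array([0.0, np.radians(5.0)])
print(mean_angular_error(pred, true))  # ~5.0
```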
The research additionally explores several factors affecting gaze estimation. The resolution of the input images influences accuracy, with lower resolutions degrading performance. Using information from both eyes, rather than a single eye, improves accuracy, supporting the value of binocular cues (see the sketch after this paragraph). Head pose information contributes only marginally compared with eye appearance, and adding the pupil center as an explicit input yields limited gains, pointing to possible directions for further method improvements.
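As a sketch of the binocular idea, the hypothetical model below encodes both eye patches with a shared convolutional branch and regresses gaze from the concatenated features. It illustrates the design choice discussed above rather than reproducing the paper's model; input size and layer widths are assumptions.

```python
# Two-eye fusion sketch: a shared encoder per eye, concatenated features,
# then a single linear head regressing (pitch, yaw).
import torch
import torch.nn as nn

class TwoEyeGazeNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(                 # shared between both eyes
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 9 * 15, feat_dim), nn.ReLU(),  # assumes 36x60 patches
        )
        self.head = nn.Linear(2 * feat_dim, 2)        # fused features -> (pitch, yaw)

    def forward(self, left_eye, right_eye):
        feats = torch.cat([self.encoder(left_eye), self.encoder(right_eye)], dim=1)
        return self.head(feats)

net = TwoEyeGazeNet()
out = net(torch.randn(4, 1, 36, 60), torch.randn(4, 1, 36, 60))  # -> (4, 2)
```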
The implications of this research are far-reaching in both theoretical and practical terms. Unconstrained gaze estimation has many applications, from eye-tracking for human-computer interaction to analyzing user intent and visual attention in everyday environments. The MPIIGaze dataset can serve as a benchmark for future research, potentially leading to methods robust enough for practical deployment on consumer devices equipped with simple monocular RGB cameras.
Looking forward, the grand challenge remains to develop methods that maintain accuracy across diverse environments and individuals without extensive domain-specific re-training. Future research may explore synthetic data augmentation, multimodal sensor integration, or advanced transfer learning strategies to create versatile, deployable gaze estimation models capable of handling the complexity of real-world settings.