
ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation (2007.15837v1)

Published 31 Jul 2020 in cs.CV

Abstract: Gaze estimation is a fundamental task in many applications of computer vision, human computer interaction and robotics. Many state-of-the-art methods are trained and tested on custom datasets, making comparison across methods challenging. Furthermore, existing gaze estimation datasets have limited head pose and gaze variations, and the evaluations are conducted using different protocols and metrics. In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses. We collect this dataset from 110 participants with a custom hardware setup including 18 digital SLR cameras and adjustable illumination conditions, and a calibrated system to record ground truth gaze targets. We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles. Additionally, we define a standardized experimental protocol and evaluation metric on ETH-XGaze, to better unify gaze estimation research going forward. The dataset and benchmark website are available at https://ait.ethz.ch/projects/2020/ETH-XGaze

Citations (213)

Summary

  • The paper introduces ETH-XGaze by offering over one million high-resolution images that capture extreme head pose and gaze variations to enhance model robustness.
  • The paper employs a controlled, multi-camera setup with standardized lighting and evaluation protocols to ensure precise and diverse gaze labeling.
  • The paper demonstrates that training models on this comprehensive dataset significantly improves generalization across varied real-world conditions.

Analysis of "ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation"

The paper presents ETH-XGaze, a comprehensive dataset designed to address the challenges in gaze estimation research, particularly under extreme head pose and gaze variation. The authors emphasize the limitations of existing datasets, which typically cover only a narrow range of head poses and gaze angles. ETH-XGaze stands out by offering more than one million high-resolution images, collected systematically to span a wide range of head poses, gaze directions, and illumination conditions.

Dataset Overview

ETH-XGaze is constructed from images of 110 participants, captured with a rig of 18 DSLR cameras and a large screen displaying calibrated gaze targets, which yields accurate ground-truth gaze labels. The dataset excels by providing:

  • Extensive variation: Head poses range up to ±70° and gaze directions up to ±50°, surpassing the coverage of prior datasets.
  • High resolution: Images at 6000×4000 pixels enhance the dataset's suitability for high-quality gaze estimation and adjacent fields like photorealistic eye modeling.
  • Controlled conditions: Standardized lighting and viewpoint variations offer a robust platform for developing resilient gaze estimation models.
  • Diverse participant representation: Ensures a broad spectrum of ethnicities and personal characteristics, enhancing the dataset's applicability to real-world scenarios.
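The head-pose and gaze ranges above are expressed as pitch/yaw angles; most gaze-estimation pipelines convert these to 3D unit direction vectors before training or evaluation. A minimal sketch of that conversion (the axis and sign conventions here are a common community choice and an assumption on my part, not taken from the ETH-XGaze code):

```python
import numpy as np

def pitchyaw_to_vector(pitch, yaw):
    """Convert pitch/yaw gaze angles (radians) to a 3D unit vector.

    Convention assumed here (common in gaze estimation, not verified
    against the paper's code): x points right, y points down, and the
    camera looks along +z, so a gaze of (0, 0) points toward -z.
    """
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z])
```

Under this convention, `pitchyaw_to_vector(0.0, 0.0)` returns a unit vector along -z (straight ahead), and the dataset's ±70° head-pose range corresponds to yaw/pitch values of roughly ±1.22 radians.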

Contributions to Gaze Estimation

  1. Robustness Enhancement: The dataset's extensive coverage of head poses and gaze angles supports training models that perform robustly across conditions. The paper's benchmark evaluations show that models trained on ETH-XGaze are more robust than those trained on other gaze estimation datasets.
  2. Evaluation Protocols: The authors propose standardized protocols for evaluation, including cross-dataset, within-dataset, and person-specific evaluations, as well as a novel robustness evaluation focused on head and gaze variations. These protocols aim to unify gaze estimation research and provide a baseline for future comparisons.
  3. Baseline Gaze Estimation: The paper describes a baseline method built on ResNet-50, illustrating the utility of ETH-XGaze in both within-dataset and cross-dataset settings. Through detailed testing, the authors demonstrate that training on a comprehensive dataset improves model accuracy and generalization.
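The metric underlying these evaluation protocols is the mean angular error between predicted and ground-truth 3D gaze vectors, reported in degrees. A short sketch of how it is typically computed (function name and array layout are illustrative; the official ETH-XGaze benchmark code may differ in details):

```python
import numpy as np

def angular_error_deg(g_pred, g_true):
    """Mean angular error in degrees between predicted and ground-truth
    3D gaze vectors, each of shape (N, 3).

    Both sets of vectors are normalized, then the angle between each
    pair is recovered from their dot product.
    """
    g_pred = g_pred / np.linalg.norm(g_pred, axis=1, keepdims=True)
    g_true = g_true / np.linalg.norm(g_true, axis=1, keepdims=True)
    # Clip guards against arccos domain errors from floating-point noise.
    cos_sim = np.clip(np.sum(g_pred * g_true, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim)).mean()
```

A perfect prediction gives 0°, and orthogonal gaze vectors give 90°; within-dataset baselines on modern benchmarks are usually reported in the low single digits of degrees.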

Implications and Future Directions

The implications of ETH-XGaze are substantial for both theoretical and practical advancements in gaze estimation. The dataset provides a basis for the development of models that need to handle extreme variations in human gaze and head poses, crucial for applications in autonomous vehicles, human-computer interaction, and smart environments. The high-resolution data facilitates explorations into generative modeling of eye regions and synthetic data generation, potentially impacting fields like computer graphics and virtual reality.

In future research, ETH-XGaze could serve as a cornerstone for developing gaze estimation techniques that are less reliant on frontal-view data, thus broadening the applicability of these models. Furthermore, leveraging its high-resolution imagery opens avenues for integrating gaze estimation into complex systems that require nuanced understanding of human attention dynamics.

In conclusion, ETH-XGaze marks a significant stride in gaze estimation research by addressing existing limitations through comprehensive data coverage and a standardized evaluation framework. Its contributions promise to inspire subsequent advances, making it an indispensable resource for the gaze estimation community.