Gaze360: Physically Unconstrained Gaze Estimation in the Wild (1910.10088v1)

Published 22 Oct 2019 in cs.CV

Abstract: Understanding where people are looking is an informative social cue. In this work, we present Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. Our dataset consists of 238 subjects in indoor and outdoor environments with labelled 3D gaze across a wide range of head poses and distances. It is the largest publicly available dataset of its kind by both subject and variety, made possible by a simple and efficient collection method. Our proposed 3D gaze model extends existing models to include temporal information and to directly output an estimate of gaze uncertainty. We demonstrate the benefits of our model via an ablation study, and show its generalization performance via a cross-dataset evaluation against other recent gaze benchmark datasets. We furthermore propose a simple self-supervised approach to improve cross-dataset domain adaptation. Finally, we demonstrate an application of our model for estimating customer attention in a supermarket setting. Our dataset and models are available at http://gaze360.csail.mit.edu .

Citations (298)

Summary

  • The paper introduces the largest publicly available 3D gaze dataset, enabling robust gaze estimation in diverse, unconstrained settings.
  • It employs a bidirectional LSTM with a novel regression method to integrate temporal data and directly quantify prediction uncertainty.
  • Extensive ablation and cross-dataset evaluations confirm the model’s superior generalizability and practical utility in applications like retail analytics.

Insights from the Gaze360 Project: Advancements in 3D Gaze Estimation

The paper "Gaze360: Physically Unconstrained Gaze Estimation in the Wild" presents a detailed account of a novel dataset and approach for 3D gaze estimation using unconstrained images. This research aims to address the gap in gaze estimation performance by providing a significant contribution in terms of both dataset creation and model enhancement.

Overview of Contributions

Key contributions of the Gaze360 project include the creation of the largest publicly available dataset for 3D gaze estimation to date. This dataset comprises video frames of 238 subjects recorded in both indoor and outdoor environments, capturing a wide range of head poses, distances, and gaze directions. The collection method leverages a 360-degree camera to capture data efficiently, allowing for comprehensive gaze mapping without restrictive laboratory settings.

The research proposes a 3D gaze model that builds on existing techniques by incorporating temporal information and direct uncertainty estimates into gaze prediction. Through a multi-frame input approach, the model mitigates single-frame ambiguities, enabling robust performance across diverse scenes. The paper further includes an ablation analysis quantifying each component's benefit, cross-dataset evaluations demonstrating the model's generalization, and a simple self-supervised domain adaptation technique that improves performance in new contexts; a sketch of one such consistency objective follows.
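This summary does not spell out the adaptation objective. One natural self-supervised signal for gaze is flip consistency: horizontally mirroring an image should negate the horizontal component of the predicted gaze while leaving the vertical component unchanged. The sketch below is purely illustrative; the model interface, the (yaw, pitch) parameterization, and the L1 penalty are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def flip_consistency_loss(model, frames):
    """Illustrative self-supervised objective: a horizontally flipped
    input should yield a mirrored gaze. Assumes (hypothetically) that
    `model` maps a batch of frame stacks (B, T, 3, H, W) to (yaw, pitch)
    gaze angles of shape (B, 2), with yaw measured left-to-right.
    Because no gaze labels are needed, this loss can be minimized on
    unlabeled imagery from a new target domain."""
    gaze = model(frames)                                 # (B, 2)
    gaze_flipped = model(torch.flip(frames, dims=[-1]))  # mirror width axis
    # Mirroring negates yaw and preserves pitch.
    mirrored = torch.stack([-gaze_flipped[:, 0], gaze_flipped[:, 1]], dim=1)
    return F.l1_loss(gaze, mirrored)
```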

Methodological Advancements

The dataset acquisition method is notably efficient, utilizing a Ladybug panoramic camera and a simple moving target for gaze fixation. The design not only gathers substantial data in various natural lighting conditions but also spans a broad demographic range. The approach effectively captures the full spectrum of head and eyeball orientations, providing training data representative of real-world scenarios.

The proposed model integrates a bidirectional LSTM that processes a short window of video frames, exploiting temporal dependencies to improve prediction accuracy. This is complemented by a novel application of quantile regression with the pinball loss, so that each gaze prediction is accompanied by an estimate of its own uncertainty.
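Since the implementation itself is not reproduced in this summary, the following PyTorch sketch illustrates the two ideas just described: a shared per-frame CNN backbone feeding a bidirectional LSTM, and a pinball (quantile) loss that trains the network to emit an error bound alongside the gaze angles. The ResNet-18 backbone, hidden size, and tau = 0.1 are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GazeLSTM(nn.Module):
    """Sketch of a temporal gaze model: a shared CNN encodes each frame,
    a bidirectional LSTM aggregates the window, and a linear head emits
    (yaw, pitch) plus a scalar uncertainty for the central frame."""
    def __init__(self, hidden=256):
        super().__init__()
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()               # 512-d feature per frame
        self.backbone = cnn
        self.lstm = nn.LSTM(512, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 3)  # yaw, pitch, sigma

    def forward(self, frames):               # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.lstm(feats)
        out = self.head(seq[:, t // 2])       # prediction at central frame
        return out[:, :2], out[:, 2]          # gaze angles, uncertainty

def pinball_loss(gaze, sigma, target, tau=0.1):
    """Quantile (pinball) loss: gaze - sigma and gaze + sigma are trained
    to be the tau and (1 - tau) quantiles of the target distribution, so
    sigma becomes a per-sample error bound."""
    e_lo = target - (gaze - sigma.unsqueeze(1))
    e_hi = target - (gaze + sigma.unsqueeze(1))
    loss_lo = torch.maximum(tau * e_lo, (tau - 1) * e_lo)
    loss_hi = torch.maximum((1 - tau) * e_hi, -tau * e_hi)
    return (loss_lo + loss_hi).mean()
```

A training step would combine this supervised loss on labeled clips with, during adaptation, the flip-consistency term sketched above (applied to the angle outputs only).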

Experimental Results

Quantitative evaluation shows that the multi-frame models consistently outperform single-frame predictors on the Gaze360 dataset. In cross-dataset evaluations on benchmarks such as Columbia, MPIIFaceGaze, and RT-GENE, the Gaze360 model also generalizes better than competing approaches, evidence that both the dataset's diversity and the model's design help it accommodate unseen domain shifts.

Practical Implications and Future Directions

The practical applications of the research are exemplified by its deployment for customer attention tracking in retail environments. This demonstrates the model's potential utility in commercial settings where understanding consumer behavior is of interest. As a future direction, the integration of the Gaze360 model into interactive systems such as robotics or surveillance could provide valuable insights into human attention and interaction patterns, broadening the applicability of gaze estimation technologies.

Additionally, by releasing both the dataset and the models, the project facilitates further research and improvement in gaze estimation. The openness of these resources is likely to spur new developments, as researchers can build upon and adapt them for varied applications and unforeseen use cases.

In conclusion, the Gaze360 paper delivers a substantial advancement in the field of gaze estimation through its comprehensive dataset, sophisticated model, and insightful evaluations, paving the way for enhanced human-computer interaction systems capable of nuanced understanding through gaze analysis.