- The paper introduces GazeCapture, a large-scale dataset, and iTracker, a CNN, for real-time, accurate eye tracking on common mobile devices.
- The methodology leverages crowdsourcing to collect 2.5 million frames from 1,474 diverse participants, ensuring robustness and generalizability.
- The paper achieves gaze prediction errors as low as 1.34 cm with calibration, demonstrating practical impact in enhancing human-computer interaction.
Eye Tracking for Everyone: An Insightful Overview
The paper “Eye Tracking for Everyone” by Krafka et al. addresses an essential but under-explored aspect of eye tracking: its accessibility and usability on commodity hardware like smartphones and tablets. The work makes significant contributions to computer vision and human-computer interaction (HCI) by proposing GazeCapture, the first large-scale eye tracking dataset for mobile devices, and iTracker, a convolutional neural network (CNN) specifically designed for gaze prediction.
GazeCapture Dataset
GazeCapture stands out in several key aspects:
- Scalability and Diversity: Utilizing crowdsourcing, the dataset includes data from 1,474 individuals, resulting in 2.5 million frames. This crowdsourced approach captures a far broader variety of users than earlier lab-collected gaze datasets, enhancing the dataset's robustness and generalizability.
- Quality and Reliability: To guarantee high-quality data, the authors implemented several mechanisms within their iOS application. These mechanisms include ensuring participants fixate on target dots and using real-time face detection to confirm visibility of the face throughout recording.
- Rich Variability: The dataset encompasses different head poses, diverse illumination conditions, and variable backgrounds. This diversity is crucial for training models that need to perform accurately in real-world, dynamic environments.
iTracker: CNN for Gaze Prediction
Using the GazeCapture dataset, the authors developed iTracker, an end-to-end CNN that predicts the 2D gaze location on the device screen:
- Model Architecture: iTracker takes as input crops of both eyes and the full face, along with a binary face grid that encodes the position and size of the face within the camera frame. This design lets the network infer head pose alongside eye appearance to predict gaze accurately.
- Training and Performance: The model achieves prediction errors of 1.71 cm on mobile phones and 2.53 cm on tablets without calibration. With calibration, these errors drop to 1.34 cm and 2.12 cm, respectively. Importantly, iTracker runs in real time (10-15 fps) on modern mobile devices, making it highly practical for everyday applications.
- Robustness and Generalization: iTracker's performance remains robust across different users, demonstrating its capacity to generalize well beyond the training data.
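To make the face-grid input described above concrete, here is a minimal sketch of how such a grid could be constructed from a face bounding box. The 25x25 grid resolution matches the paper, but the function name, argument names, and the example coordinates are illustrative assumptions, not the authors' code.

```python
import numpy as np

def make_face_grid(frame_w, frame_h, face_x, face_y, face_w, face_h, grid_size=25):
    """Build a binary face grid: a coarse grid over the camera frame in which
    cells covered by the face bounding box are set to 1. (Illustrative sketch;
    names and layout are assumptions, not the paper's implementation.)"""
    grid = np.zeros((grid_size, grid_size), dtype=np.float32)
    # Map the face box from pixel coordinates to grid-cell coordinates.
    x0 = int(face_x / frame_w * grid_size)
    y0 = int(face_y / frame_h * grid_size)
    x1 = int(np.ceil((face_x + face_w) / frame_w * grid_size))
    y1 = int(np.ceil((face_y + face_h) / frame_h * grid_size))
    grid[y0:y1, x0:x1] = 1.0
    return grid

# Example: a 160x160-pixel face box near the centre of a 640x480 frame.
g = make_face_grid(640, 480, 240, 160, 160, 160)
```

The grid gives the network an explicit, low-dimensional signal about where the head sits relative to the camera, which the eye and face crops alone do not convey.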
Implications and Future Directions
Practically, this work paves the way for the widespread adoption of eye tracking in consumer-grade devices. Potential applications span numerous domains including accessibility technologies, enhanced human-computer interaction, and even evolving areas like augmented reality and virtual reality interfaces. Theoretical implications suggest that the adoption of large-scale, diverse datasets significantly enhances the performance and generalizability of deep learning models in gaze estimation. This could influence the methodological approach in related fields, promoting more extensive use of crowdsourcing for data collection.
Future Developments
This research also opens avenues for several future studies:
- Enhanced Calibration Techniques: Exploring additional methods to reduce calibration requirements without sacrificing accuracy could advance user experience.
- Cross-Device Generalization: Investigating how well models trained on one class of device transfer to other platforms could broaden applicability.
- Integration with Other Modalities: Combining gaze data with other sensor data (e.g., motion sensors) could yield richer contextual understanding and higher prediction accuracy.
In conclusion, the paper by Krafka et al. represents a significant advance in democratizing eye tracking technology. By effectively leveraging deep learning and large-scale data, it demonstrates the feasibility of real-time, accurate eye tracking on widely available devices, setting a new benchmark for future research and applications in this area.