GazeTrack Dataset
- GazeTrack is a collection of public eye-tracking datasets featuring high-precision iris localization, pupil segmentation, and gaze vector regression benchmarks.
- It includes two distinct paradigms: a lightweight 416×416 iris dataset for compact CNN training and a lab benchmark with sub-degree gaze registration using advanced calibration protocols.
- The datasets support real-time applications, VR/AR, and cross-domain gaze estimation research through rigorous annotation, normalization protocols, and comprehensive evaluation metrics.
GazeTrack refers to a set of public datasets and benchmarks for eye and gaze-tracking, primarily aimed at enabling high-precision pupil localization, iris boundary estimation, and gaze vector regression in both constrained and unconstrained imaging scenarios. Two prominent releases under the GazeTrack name illustrate distinct paradigms: a lightweight, annotation-focused iris dataset suitable for training compact convolutional models (Ildar, 2021), and a high-resolution, multi-modal laboratory benchmark with precise sub-degree gaze registration and advanced spatial normalization (Yang, 27 Nov 2025). Both seek to address the lack of specialized, high-quality data for gaze estimation, particularly for training and evaluating lightweight CNNs or spatial computing gaze interfaces.
1. Dataset Structure and Data Acquisition
1.1. GazeTrack (416×416, Iris-Centric, 2021)
This variant consists of 10,000 annotated eye-region images, each 416×416 pixels, derived from the "4quant/eye-gaze" face-image corpus (Ildar, 2021). Landmark-based preprocessing using dlib’s 68-point detector isolates the peri-ocular region (landmarks 36–41), followed by isotropic rescaling to the fixed field of view. Iris boundaries are segmented by a thresholding routine, targeting an iris mask occupancy of ~14% of the crop, and fit with a single circle per image:
- For each image: a JPEG crop (416×416×3), annotated as (image_filename, $x_c$, $y_c$, $r$) with coordinates and radius in pixels.
- Annotation file (CSV) listing iris center coordinates and radius (an overlay-visualization sketch follows this list).
- Subjects are imaged in indoor scenes with only ambient daylight or office lighting, using consumer-grade webcams—no IR light sources or restraint hardware.
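The circle annotations can be sanity-checked by overlaying them on the crops, as referenced above. A minimal sketch, assuming a hypothetical `annotations.csv` with columns `filename,x_c,y_c,r` (the column names in the actual release may differ):

```python
import csv
import cv2  # OpenCV for image I/O and drawing

def overlay_iris_annotations(csv_path, image_dir, out_dir):
    """Draw the annotated iris circle on each 416x416 crop for visual inspection."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            img = cv2.imread(f"{image_dir}/{row['filename']}")
            if img is None:
                continue  # skip missing or unreadable files
            x_c, y_c, r = (int(float(row[k])) for k in ("x_c", "y_c", "r"))
            cv2.circle(img, (x_c, y_c), r, color=(0, 255, 0), thickness=1)   # iris boundary
            cv2.circle(img, (x_c, y_c), 2, color=(0, 0, 255), thickness=-1)  # iris center
            cv2.imwrite(f"{out_dir}/{row['filename']}", img)
```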
1.2. GazeTrack (Lab Benchmark, 2025)
This release features a high-precision, subject-diverse dataset for 3D gaze and pupil ellipse estimation (Yang, 27 Nov 2025):
- 47 participants (balanced gender, broad age, visual conditions) recorded in free-head scenarios with loose chin-rest.
- Monocular IR camera (DG3 system, 384×288 px @ 60 fps), IR-LED illumination (850 nm), and five-point laser calibration.
- For each subject: PNG frames, binary pupil masks, per-frame ellipse fit parameters (pupil center, semi-axes, and rotation angle), and ground-truth 3D gaze vectors in device and world coordinates.
- Gaze targets span the full field of view in pitch and yaw.
Each subject’s data are organized in per-session folders with raw frames, segmentation masks, ellipse parameter CSVs, and device/world gaze trajectories. Official splits: 32 train, 8 validation, 7 test.
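A loader for this layout would walk the per-session folders and pair frames with masks and the ellipse/gaze CSVs. The folder and file names below are hypothetical placeholders for illustration; the released benchmark may organize files differently:

```python
from pathlib import Path

def index_session(session_dir):
    """Pair raw frames with pupil masks for one recording session (hypothetical layout)."""
    session = Path(session_dir)
    frames = sorted((session / "frames").glob("*.png"))
    masks = sorted((session / "masks").glob("*.png"))
    ellipse_csv = session / "ellipse_params.csv"  # per-frame ellipse fit parameters
    gaze_csv = session / "gaze_device.csv"        # ground-truth 3D gaze, device coordinates
    assert len(frames) == len(masks), "frame/mask count mismatch"
    return list(zip(frames, masks)), ellipse_csv, gaze_csv
```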
2. Annotation Schema and Coordinate Conventions
2.1. Iris Circle Format (Ildar, 2021)
Each gaze annotation consists of:
- Iris center $(x_c, y_c)$ in image coordinates and radius $r$ in pixels, with the origin at the top-left corner.
- Normalized coordinates: $\hat{x} = x_c/416$, $\hat{y} = y_c/416$, $\hat{r} = r/416$.
- Boundary implicit equation: $(x - x_c)^2 + (y - y_c)^2 = r^2$.
- YOLO-style vector: (class_id, $\hat{x}$, $\hat{y}$, $\hat{w}$, $\hat{h}$), typically with class_id = 0 and the box width/height taken from the circle diameter (a conversion sketch follows this list).
- Practical for both bounding-regression and segmentation loss frameworks.
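Mapping the circle annotation to the YOLO-style line amounts to normalizing by the 416-px crop size and taking the circle's bounding box as the detection target. A minimal sketch under that assumption:

```python
IMG_SIZE = 416  # fixed crop size of the 2021 release

def circle_to_yolo(x_c, y_c, r, class_id=0):
    """Convert an iris circle (pixels) to a normalized YOLO-style detection line."""
    x_hat = x_c / IMG_SIZE
    y_hat = y_c / IMG_SIZE
    w_hat = h_hat = (2 * r) / IMG_SIZE  # the circle's bounding box is square
    return f"{class_id} {x_hat:.6f} {y_hat:.6f} {w_hat:.6f} {h_hat:.6f}"

# Example: iris centered at (208, 190) with a 29 px radius
print(circle_to_yolo(208, 190, 29))  # -> "0 0.500000 0.456731 0.139423 0.139423"
```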
2.2. Elliptical and Gaze Vectors (Yang, 27 Nov 2025)
- Each frame's ellipse fit is encoded as a six-parameter tuple.
- Pupil semantic masks (same spatial size as raw image) for pixelwise supervision.
- 3D gaze vectors: $\mathbf{g} = (g_x, g_y, g_z)$ in device coordinates and the corresponding pitch/yaw angles, calibrated to 0.3° accuracy.
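The relation between a unit gaze vector and pitch/yaw angles can be made concrete with a small conversion helper. This is a generic spherical-coordinate sketch; the benchmark's exact axis and sign conventions are assumed here, not taken from the release:

```python
import numpy as np

def vector_to_pitch_yaw(g):
    """Unit gaze vector (assumed z-forward, y-down convention) -> pitch/yaw in degrees."""
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)
    pitch = np.degrees(np.arcsin(-g[1]))      # elevation about the horizontal axis
    yaw = np.degrees(np.arctan2(g[0], g[2]))  # azimuth about the vertical axis
    return pitch, yaw

def pitch_yaw_to_vector(pitch_deg, yaw_deg):
    """Inverse mapping back to a unit 3D gaze vector under the same convention."""
    p, y = np.radians(pitch_deg), np.radians(yaw_deg)
    return np.array([np.cos(p) * np.sin(y), -np.sin(p), np.cos(p) * np.cos(y)])
```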
3. Preprocessing Pipeline and Data Normalization
3.1. Standard Procedures
- Input intensity normalization to $[0, 1]$, or mean subtraction and division by the standard deviation.
- Retain the fixed crop; apply identical geometric transforms (rotation, scaling, translation) to both the image and its annotations, with photometric jitter (brightness/contrast) applied to the image only.
- Recommended augmentations: random rotation over a small symmetric angle range; scaling in (0.9, 1.1); translation of ±10 px; brightness/contrast modification of ±20%. A combined image-and-annotation augmentation sketch follows this list.
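Geometric augmentations must move the annotation together with the image, so it is convenient to build one affine transform and apply it to both. A minimal sketch for the circle-annotated 2021 data, assuming OpenCV and pixel-space $(x_c, y_c, r)$ labels:

```python
import numpy as np
import cv2

def augment(img, x_c, y_c, r, angle_deg=5.0, scale=1.05, tx=4, ty=-3):
    """Apply one rotation/scale/translation jointly to the image and its circle annotation."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, scale)  # 2x3 affine matrix
    M[:, 2] += (tx, ty)                                            # append the translation
    img_aug = cv2.warpAffine(img, M, (w, h))
    cx, cy = M @ np.array([x_c, y_c, 1.0])                         # transform the circle center
    return img_aug, float(cx), float(cy), r * scale                # radius scales isotropically
```

Photometric jitter is left out of the transform because it does not affect the annotation.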
3.2. Advanced Normalization (“Paper Unfolding”)
In the 2025 benchmark (Yang, 27 Nov 2025), raw target coordinates are mapped into a canonical, subject-independent grid by a multi-point, region-wise transformation. The target plane is subdivided into eight regions, and each sample is mapped into the canonical grid via linear interpolation anchored at per-subject calibration points, aligning all gaze target constellations:
- Ensures all trials share standard spatial reference, facilitating cross-user generalization.
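A simplified illustration of such a region-wise mapping: for each region, solve the affine transform that sends the subject's calibration anchors onto their canonical positions, then apply it to every sample falling in that region. The anchor-triangle parameterization and the region-selection rule below are assumptions for illustration only; the actual "paper unfolding" procedure in (Yang, 27 Nov 2025) may differ.

```python
import numpy as np

def affine_from_triplets(src, dst):
    """Solve the 2D affine map sending three source anchors onto three canonical anchors."""
    A = np.hstack([src, np.ones((3, 1))])  # rows of [x, y, 1]
    return np.linalg.solve(A, dst)         # 3x2 affine coefficient matrix

def unfold_point(p, subject_anchors, canonical_anchors, region_triangles):
    """Map one raw gaze target into the canonical grid via its region's anchor triangle."""
    p = np.asarray(p, dtype=float)
    # Illustrative region choice: the triangle whose anchor centroid is nearest to the point.
    centroids = [subject_anchors[list(t)].mean(axis=0) for t in region_triangles]
    tri = list(region_triangles[int(np.argmin([np.linalg.norm(p - c) for c in centroids]))])
    coeffs = affine_from_triplets(subject_anchors[tri], canonical_anchors[tri])
    return np.append(p, 1.0) @ coeffs      # [x, y, 1] -> canonical (x, y)
```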
4. Model Training, Losses, and Evaluation Protocols
4.1. Lightweight Networks (Iris-Only)
Using the 416×416 dataset, compact CNNs can be trained on the single-class, iris-localization task:
- Suitable for real-time applications or low-overhead inference.
- Supports both regression losses (MSE on $(x_c, y_c, r)$) and YOLO-style detection heads.
- PyTorch example code snippet demonstrates data ingestion and overlay visualization (Ildar, 2021).
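In the spirit of the referenced data-ingestion snippet, a minimal PyTorch Dataset might look as follows; the CSV column names (`filename`, `x_c`, `y_c`, `r`) are hypothetical placeholders for the actual header:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class GazeTrackIris(Dataset):
    """416x416 iris crops paired with (x_c, y_c, r) regression targets scaled to [0, 1]."""

    def __init__(self, csv_path, image_dir):
        self.df = pd.read_csv(csv_path)
        self.image_dir = image_dir

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = read_image(f"{self.image_dir}/{row['filename']}").float() / 255.0  # CxHxW in [0, 1]
        target = torch.tensor([row["x_c"], row["y_c"], row["r"]], dtype=torch.float32) / 416.0
        return img, target
```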
4.2. Segmentation and Ellipse Regularization (Lab Benchmark)
The segmentation backbone (“U-ResAtt”) is a U-Net-style encoder/decoder with residual blocks, spatial self-attention, and a mask channel (Yang, 27 Nov 2025). Total loss:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{BCE}} + \lambda\,\mathcal{L}_{\text{EFE}},$$

where $\mathcal{L}_{\text{BCE}}$ is binary cross-entropy on the mask, $\mathcal{L}_{\text{EFE}}$ measures edge error between predicted and ground-truth ellipse boundaries, and $\lambda$ is a weighting factor. Optimization uses Adam (lr $= 10^{-4}$) with early stopping on validation IoU plus EFE.
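A sketch of this objective, where the ellipse fit error (EFE) is approximated as the mean squared distance between boundary points of the predicted and ground-truth ellipses sampled at matched angles; the five-parameter (center, semi-axes, rotation) ellipse representation, the exact EFE formulation, and the weighting $\lambda$ are assumptions here:

```python
import torch
import torch.nn.functional as F

def ellipse_boundary(params, n=64):
    """Sample n boundary points of an ellipse (x_c, y_c, a, b, theta) as an (n, 2) tensor."""
    x_c, y_c, a, b, theta = params
    t = torch.linspace(0, 2 * torch.pi, n)
    ct, st = torch.cos(theta), torch.sin(theta)
    x = x_c + a * torch.cos(t) * ct - b * torch.sin(t) * st
    y = y_c + a * torch.cos(t) * st + b * torch.sin(t) * ct
    return torch.stack([x, y], dim=-1)

def total_loss(mask_logits, mask_gt, ellipse_pred, ellipse_gt, lam=0.1):
    """Binary cross-entropy on the pupil mask plus a weighted ellipse edge-error term."""
    l_bce = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    l_efe = (ellipse_boundary(ellipse_pred) - ellipse_boundary(ellipse_gt)).pow(2).mean()
    return l_bce + lam * l_efe
```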
Gaze Vector Regression (GVnet)
- Input: Sliding window of ellipse parameters and transformed coordinates.
- Architecture: a self-attention layer, two fully-connected layers with LeakyReLU activations, dropout, and an output head projecting to an L2-normalized 3D gaze vector.
- Loss: MSE on gaze vectors; evaluation by angular error $\theta_{\text{err}} = \arccos(\hat{\mathbf{g}} \cdot \mathbf{g}_{\text{gt}})$ between predicted and ground-truth unit gaze vectors.
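A minimal sketch of a GVnet-style regressor under this description; the layer widths, window length, feature dimensionality, and attention configuration are assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class GVNetSketch(nn.Module):
    """Self-attention over a sliding window of ellipse/coordinate features -> unit 3D gaze."""

    def __init__(self, feat_dim=8, window=16, hidden=128, dropout=0.2):
        super().__init__()  # feat_dim: e.g., 6 ellipse params + 2 canonical coordinates
        self.attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=2, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(feat_dim * window, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 3),
        )

    def forward(self, x):                           # x: (batch, window, feat_dim)
        a, _ = self.attn(x, x, x)                   # temporal self-attention
        g = self.head(a.flatten(start_dim=1))       # project to a 3D gaze vector
        return nn.functional.normalize(g, dim=-1)   # L2-normalize the output

def angular_error_deg(g_pred, g_gt):
    """Mean angular error (degrees) between predicted and ground-truth unit gaze vectors."""
    cos = (g_pred * g_gt).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.arccos(cos)).mean()
```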
4.3. Benchmark Protocols
- Segmentation: detection rate at a 5-px pixel-error threshold on public testbeds (e.g., ExCuSe); a metric sketch follows this list.
- Gaze regression: Report mean angular error (degrees), ablation of coordinate transformation methods.
- Training on GazeTrack plus external corpora (e.g., LPW, ETH-XGaze), with the official splits used for cross-participant generalization.
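The 5-px criterion can be reported as a per-frame detection rate: a frame counts as correct when the predicted pupil center lies within 5 px of the ground truth. A minimal sketch of that metric:

```python
import numpy as np

def pixel_error_rate(pred_centers, gt_centers, threshold=5.0):
    """Fraction of frames whose predicted pupil center is within `threshold` px of ground truth."""
    pred = np.asarray(pred_centers, dtype=float)
    gt = np.asarray(gt_centers, dtype=float)
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists <= threshold).mean())
```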
5. Distribution, Access, and Licensing
| Dataset Version | Image Count / Files | Format(s) | Access | License |
|---|---|---|---|---|
| 416×416 GazeTrack | 10,000 images + CSV | JPEG + CSV (center, radius) | Kaggle, GitHub (code/scripts) | Citable; no formal license |
| Lab Benchmark GazeTrack | ~12 GB (47 subjects) | PNG, binary masks, CSV (ellipse, gaze) | GitHub upon publication | CC BY-NC-SA 4.0 |
The 416×416 GazeTrack dataset is suitable for lightweight CNN pre-training and rapid prototyping, serving as a critical resource for specialized eye-tracker model development outside the parameter-heavy regimes of YOLO or SSD (Ildar, 2021). The lab-grade benchmark supports advanced research on precise gaze regression, spatial normalization, and segmentation-based pipelines (Yang, 27 Nov 2025).
6. Comparative Context in Gaze Datasets Landscape
GazeTrack supplements both small-scale, iris-only benchmarks and more extensive, multimodal corpora (e.g. MoGaze (Kratzer et al., 2020), which emphasizes full-body kinematics plus synchronized eye-gaze in manipulation tasks with robotic instrumentation). Whereas MoGaze integrates gaze rays with scene geometry for intent recognition and motion planning, GazeTrack concentrates on pixel-level ocular annotation and explicit, high-precision vector regression applicable to spatial computing, VR/AR, and foundation model pre-training.
A plausible implication is that GazeTrack’s focus on well-controlled, densely annotated gaze data—especially its canonicalization protocol and regularized ellipse labeling—positions it as the de facto standard for evaluating new convolutional and attention-based gaze estimation architectures, especially those operating under hardware resource constraints, and for validating cross-domain transfer in open-world gaze tasks.
7. Impact and Applications
GazeTrack enables:
- Training/evaluation of lightweight high-precision CNN-based eye trackers.
- Benchmarking advanced pipelines for gaze regression and pupil segmentation.
- Bootstrapping personalized models for user-facing applications in AR/VR, where gaze accuracy requirements are stringent.
- Studying the effects of normalization and regularization protocols on generalization across subjects and lighting conditions.
- Comparative analysis with multimodal datasets (e.g., MoGaze) for hierarchical vision-to-action models in robotics.
GazeTrack fills a gap in the resource spectrum between minimal, annotation-light datasets and motion-capture-driven, multi-sensor frameworks, providing a flexible but rigorous foundation for both academic exploration and applied system prototyping (Ildar, 2021, Yang, 27 Nov 2025).