GazeTrack Dataset
- GazeTrack is a collection of public eye-tracking datasets featuring high-precision iris localization, pupil segmentation, and gaze vector regression benchmarks.
- It includes two distinct paradigms: a lightweight 416×416 iris dataset for compact CNN training and a lab benchmark with sub-degree gaze registration using advanced calibration protocols.
- The datasets support real-time applications, VR/AR, and cross-domain gaze estimation research through rigorous annotation, normalization protocols, and comprehensive evaluation metrics.
GazeTrack refers to a set of public datasets and benchmarks for eye and gaze-tracking, primarily aimed at enabling high-precision pupil localization, iris boundary estimation, and gaze vector regression in both constrained and unconstrained imaging scenarios. Two prominent releases under the GazeTrack name illustrate distinct paradigms: a lightweight, annotation-focused iris dataset suitable for training compact convolutional models (Ildar, 2021), and a high-resolution, multi-modal laboratory benchmark with precise sub-degree gaze registration and advanced spatial normalization (Yang, 27 Nov 2025). Both seek to address the lack of specialized, high-quality data for gaze estimation, particularly for training and evaluating lightweight CNNs or spatial computing gaze interfaces.
1. Dataset Structure and Data Acquisition
1.1. GazeTrack (416×416, Iris-Centric, 2021)
This variant consists of 10,000 annotated eye-region images, each 416×416 pixels, derived from the "4quant/eye-gaze" face-image corpus (Ildar, 2021). Landmark-based preprocessing using dlib’s 68-point detector isolates the peri-ocular region (landmarks 36–41), followed by isotropic rescaling to the fixed field of view. Iris boundaries are segmented by a thresholding routine, targeting an iris mask occupancy of ~14% of the crop, and fit with a single circle per image:
- For each image: a JPEG crop (416×416×3), annotated as (image_filename, $x_c$, $y_c$, $r$) with coordinates and radius in pixels.
- Annotation file (CSV) listing iris center coordinates and radius (an overlay-visualization sketch follows this list).
- Subjects are imaged in indoor scenes with only ambient daylight or office lighting, using consumer-grade webcams—no IR light sources or restraint hardware.
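The circle annotations can be sanity-checked by overlaying them on the crops, as referenced above. A minimal sketch, assuming a hypothetical `annotations.csv` with columns `filename,x_c,y_c,r` (the column names in the actual release may differ):

```python
import csv
import cv2  # OpenCV for image I/O and drawing

def overlay_iris_annotations(csv_path, image_dir, out_dir):
    """Draw the annotated iris circle on each 416x416 crop for visual inspection."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            img = cv2.imread(f"{image_dir}/{row['filename']}")
            if img is None:
                continue  # skip missing or unreadable files
            x_c, y_c, r = (int(float(row[k])) for k in ("x_c", "y_c", "r"))
            cv2.circle(img, (x_c, y_c), r, color=(0, 255, 0), thickness=1)   # iris boundary
            cv2.circle(img, (x_c, y_c), 2, color=(0, 0, 255), thickness=-1)  # iris center
            cv2.imwrite(f"{out_dir}/{row['filename']}", img)
```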
1.2. GazeTrack (Lab Benchmark, 2025)
This release features a high-precision, subject-diverse dataset for 3D gaze and pupil ellipse estimation (Yang, 27 Nov 2025):
- 47 participants (balanced gender, broad age, visual conditions) recorded in free-head scenarios with loose chin-rest.
- Monocular IR camera (DG3 system, 384×288 px @ 60 fps), IR-LED illumination (850 nm), and five-point laser calibration.
- For each subject: PNG frames, binary pupil masks, per-frame ellipse fit parameters (pupil center, semi-axes, and rotation angle), and ground-truth 3D gaze vectors in device and world coordinates.
- Gaze targets span the full field of view in pitch and yaw.
Each subject’s data are organized in per-session folders with raw frames, segmentation masks, ellipse parameter CSVs, and device/world gaze trajectories. Official splits: 32 train, 8 validation, 7 test.
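A loader for this layout would walk the per-session folders and pair frames with masks and the ellipse/gaze CSVs. The folder and file names below are hypothetical placeholders for illustration; the released benchmark may organize files differently:

```python
from pathlib import Path

def index_session(session_dir):
    """Pair raw frames with pupil masks for one recording session (hypothetical layout)."""
    session = Path(session_dir)
    frames = sorted((session / "frames").glob("*.png"))
    masks = sorted((session / "masks").glob("*.png"))
    ellipse_csv = session / "ellipse_params.csv"  # per-frame ellipse fit parameters
    gaze_csv = session / "gaze_device.csv"        # ground-truth 3D gaze, device coordinates
    assert len(frames) == len(masks), "frame/mask count mismatch"
    return list(zip(frames, masks)), ellipse_csv, gaze_csv
```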
2. Annotation Schema and Coordinate Conventions
2.1. Iris Circle Format (Ildar, 2021)
Each gaze annotation consists of:
- Iris center $(x_c, y_c)$ in image coordinates and radius $r$ in pixels, with the origin at the top-left corner.
- Normalized coordinates: $\hat{x} = x_c/416$, $\hat{y} = y_c/416$, $\hat{r} = r/416$.
- Boundary implicit equation: $(x - x_c)^2 + (y - y_c)^2 = r^2$.
- YOLO-style vector: (class_id, $\hat{x}$, $\hat{y}$, $\hat{w}$, $\hat{h}$), typically with class_id = 0 and the box width/height taken from the circle diameter (a conversion sketch follows this list).
- Practical for both bounding-regression and segmentation loss frameworks.
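Mapping the circle annotation to the YOLO-style line amounts to normalizing by the 416-px crop size and taking the circle's bounding box as the detection target. A minimal sketch under that assumption:

```python
IMG_SIZE = 416  # fixed crop size of the 2021 release

def circle_to_yolo(x_c, y_c, r, class_id=0):
    """Convert an iris circle (pixels) to a normalized YOLO-style detection line."""
    x_hat = x_c / IMG_SIZE
    y_hat = y_c / IMG_SIZE
    w_hat = h_hat = (2 * r) / IMG_SIZE  # the circle's bounding box is square
    return f"{class_id} {x_hat:.6f} {y_hat:.6f} {w_hat:.6f} {h_hat:.6f}"

# Example: iris centered at (208, 190) with a 29 px radius
print(circle_to_yolo(208, 190, 29))  # -> "0 0.500000 0.456731 0.139423 0.139423"
```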
2.2. Elliptical and Gaze Vectors (Yang, 27 Nov 2025)
- Each frame's ellipse fit is encoded as a six-parameter tuple.
- Pupil semantic masks (same spatial size as raw image) for pixelwise supervision.
- 3D gaze vectors: $\mathbf{g} = (g_x, g_y, g_z)$ in device coordinates and the corresponding pitch/yaw angles, calibrated to 0.3° accuracy.
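The relation between a unit gaze vector and pitch/yaw angles can be made concrete with a small conversion helper. This is a generic spherical-coordinate sketch; the benchmark's exact axis and sign conventions are assumed here, not taken from the release:

```python
import numpy as np

def vector_to_pitch_yaw(g):
    """Unit gaze vector (assumed z-forward, y-down convention) -> pitch/yaw in degrees."""
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)
    pitch = np.degrees(np.arcsin(-g[1]))      # elevation about the horizontal axis
    yaw = np.degrees(np.arctan2(g[0], g[2]))  # azimuth about the vertical axis
    return pitch, yaw

def pitch_yaw_to_vector(pitch_deg, yaw_deg):
    """Inverse mapping back to a unit 3D gaze vector under the same convention."""
    p, y = np.radians(pitch_deg), np.radians(yaw_deg)
    return np.array([np.cos(p) * np.sin(y), -np.sin(p), np.cos(p) * np.cos(y)])
```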
3. Preprocessing Pipeline and Data Normalization
3.1. Standard Procedures
- Input intensity normalization to $[0, 1]$, or mean subtraction and division by the standard deviation.
- Retain the fixed crop; apply identical geometric transforms (rotation, scaling, translation) to both the image and its annotations, with photometric jitter (brightness/contrast) applied to the image only.
- Recommended augmentations: random rotation over a small symmetric angle range; scaling in (0.9, 1.1); translation of ±10 px; brightness/contrast modification of ±20%. A combined image-and-annotation augmentation sketch follows this list.
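Geometric augmentations must move the annotation together with the image, so it is convenient to build one affine transform and apply it to both. A minimal sketch for the circle-annotated 2021 data, assuming OpenCV and pixel-space $(x_c, y_c, r)$ labels:

```python
import numpy as np
import cv2

def augment(img, x_c, y_c, r, angle_deg=5.0, scale=1.05, tx=4, ty=-3):
    """Apply one rotation/scale/translation jointly to the image and its circle annotation."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, scale)  # 2x3 affine matrix
    M[:, 2] += (tx, ty)                                            # append the translation
    img_aug = cv2.warpAffine(img, M, (w, h))
    cx, cy = M @ np.array([x_c, y_c, 1.0])                         # transform the circle center
    return img_aug, float(cx), float(cy), r * scale                # radius scales isotropically
```

Photometric jitter is left out of the transform because it does not affect the annotation.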
3.2. Advanced Normalization (“Paper Unfolding”)
In the 2025 benchmark (Yang, 27 Nov 2025), raw target coordinates are mapped into a canonical, subject-independent grid by a multi-point, region-wise transformation. The target plane is subdivided into eight regions, and each sample is mapped into the canonical grid via linear interpolation anchored at per-subject calibration points, aligning all gaze target constellations:
- Ensures all trials share standard spatial reference, facilitating cross-user generalization.
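A simplified illustration of such a region-wise mapping: for each region, solve the affine transform that sends the subject's calibration anchors onto their canonical positions, then apply it to every sample falling in that region. The anchor-triangle parameterization and the region-selection rule below are assumptions for illustration only; the actual "paper unfolding" procedure in (Yang, 27 Nov 2025) may differ.

```python
import numpy as np

def affine_from_triplets(src, dst):
    """Solve the 2D affine map sending three source anchors onto three canonical anchors."""
    A = np.hstack([src, np.ones((3, 1))])  # rows of [x, y, 1]
    return np.linalg.solve(A, dst)         # 3x2 affine coefficient matrix

def unfold_point(p, subject_anchors, canonical_anchors, region_triangles):
    """Map one raw gaze target into the canonical grid via its region's anchor triangle."""
    p = np.asarray(p, dtype=float)
    # Illustrative region choice: the triangle whose anchor centroid is nearest to the point.
    centroids = [subject_anchors[list(t)].mean(axis=0) for t in region_triangles]
    tri = list(region_triangles[int(np.argmin([np.linalg.norm(p - c) for c in centroids]))])
    coeffs = affine_from_triplets(subject_anchors[tri], canonical_anchors[tri])
    return np.append(p, 1.0) @ coeffs      # [x, y, 1] -> canonical (x, y)
```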
4. Model Training, Losses, and Evaluation Protocols
4.1. Lightweight Networks (Iris-Only)
Using the 416×416 dataset, compact CNNs can be trained on the single-class, iris-localization task:
- Suitable for real-time applications or low-overhead inference.
- Supports both regression losses (MSE on $(x_c, y_c, r)$) and YOLO-style detection heads.
- PyTorch example code snippet demonstrates data ingestion and overlay visualization (Ildar, 2021).
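In the spirit of the referenced data-ingestion snippet, a minimal PyTorch Dataset might look as follows; the CSV column names (`filename`, `x_c`, `y_c`, `r`) are hypothetical placeholders for the actual header:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class GazeTrackIris(Dataset):
    """416x416 iris crops paired with (x_c, y_c, r) regression targets scaled to [0, 1]."""

    def __init__(self, csv_path, image_dir):
        self.df = pd.read_csv(csv_path)
        self.image_dir = image_dir

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = read_image(f"{self.image_dir}/{row['filename']}").float() / 255.0  # CxHxW in [0, 1]
        target = torch.tensor([row["x_c"], row["y_c"], row["r"]], dtype=torch.float32) / 416.0
        return img, target
```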
4.2. Segmentation and Ellipse Regularization (Lab Benchmark)
The segmentation backbone (“U-ResAtt”) is a U-Net-style encoder/decoder with residual blocks, spatial self-attention, and a mask channel (Yang, 27 Nov 2025). Total loss:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{BCE}} + \lambda\,\mathcal{L}_{\text{EFE}},$$

where $\mathcal{L}_{\text{BCE}}$ is binary cross-entropy on the mask, $\mathcal{L}_{\text{EFE}}$ measures edge error between predicted and ground-truth ellipse boundaries, and $\lambda$ is a weighting factor. Optimization uses Adam (lr $= 10^{-4}$) with early stopping on validation IoU plus EFE.
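A sketch of this objective, where the ellipse fit error (EFE) is approximated as the mean squared distance between boundary points of the predicted and ground-truth ellipses sampled at matched angles; the five-parameter (center, semi-axes, rotation) ellipse representation, the exact EFE formulation, and the weighting $\lambda$ are assumptions here:

```python
import torch
import torch.nn.functional as F

def ellipse_boundary(params, n=64):
    """Sample n boundary points of an ellipse (x_c, y_c, a, b, theta) as an (n, 2) tensor."""
    x_c, y_c, a, b, theta = params
    t = torch.linspace(0, 2 * torch.pi, n)
    ct, st = torch.cos(theta), torch.sin(theta)
    x = x_c + a * torch.cos(t) * ct - b * torch.sin(t) * st
    y = y_c + a * torch.cos(t) * st + b * torch.sin(t) * ct
    return torch.stack([x, y], dim=-1)

def total_loss(mask_logits, mask_gt, ellipse_pred, ellipse_gt, lam=0.1):
    """Binary cross-entropy on the pupil mask plus a weighted ellipse edge-error term."""
    l_bce = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    l_efe = (ellipse_boundary(ellipse_pred) - ellipse_boundary(ellipse_gt)).pow(2).mean()
    return l_bce + lam * l_efe
```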
Gaze Vector Regression (GVnet)
- Input: Sliding window of ellipse parameters and transformed coordinates.
- Architecture: a self-attention layer, two fully-connected layers with LeakyReLU activations, dropout, and an output head projecting to an L2-normalized 3D gaze vector.
- Loss: MSE on gaze vectors; evaluation by angular error $\theta_{\text{err}} = \arccos(\hat{\mathbf{g}} \cdot \mathbf{g}_{\text{gt}})$ between predicted and ground-truth unit gaze vectors.
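A minimal sketch of a GVnet-style regressor under this description; the layer widths, window length, feature dimensionality, and attention configuration are assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class GVNetSketch(nn.Module):
    """Self-attention over a sliding window of ellipse/coordinate features -> unit 3D gaze."""

    def __init__(self, feat_dim=8, window=16, hidden=128, dropout=0.2):
        super().__init__()  # feat_dim: e.g., 6 ellipse params + 2 canonical coordinates
        self.attn = nn.MultiheadAttention(embed_dim=feat_dim, num_heads=2, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(feat_dim * window, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 3),
        )

    def forward(self, x):                           # x: (batch, window, feat_dim)
        a, _ = self.attn(x, x, x)                   # temporal self-attention
        g = self.head(a.flatten(start_dim=1))       # project to a 3D gaze vector
        return nn.functional.normalize(g, dim=-1)   # L2-normalize the output

def angular_error_deg(g_pred, g_gt):
    """Mean angular error (degrees) between predicted and ground-truth unit gaze vectors."""
    cos = (g_pred * g_gt).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.arccos(cos)).mean()
```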
4.3. Benchmark Protocols
- Segmentation: detection rate at a 5-px pixel-error threshold on public testbeds (e.g., ExCuSe); a metric sketch follows this list.
- Gaze regression: Report mean angular error (degrees), ablation of coordinate transformation methods.
- Training on GazeTrack plus external corpora (e.g., LPW, ETH-XGaze), with the official splits used for cross-participant generalization.
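The 5-px criterion can be reported as a per-frame detection rate: a frame counts as correct when the predicted pupil center lies within 5 px of the ground truth. A minimal sketch of that metric:

```python
import numpy as np

def pixel_error_rate(pred_centers, gt_centers, threshold=5.0):
    """Fraction of frames whose predicted pupil center is within `threshold` px of ground truth."""
    pred = np.asarray(pred_centers, dtype=float)
    gt = np.asarray(gt_centers, dtype=float)
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists <= threshold).mean())
```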
5. Distribution, Access, and Licensing
| Dataset Version | Image Count / Files | Format(s) | Access | License |
|---|---|---|---|---|
| 416×416 GazeTrack | 10,000 images + CSV | JPEG + CSV (center, radius) | Kaggle, GitHub (code/scripts) | Citable; no formal license |
| Lab Benchmark GazeTrack | ~12 GB (47 subjects) | PNG, binary masks, CSV (ellipse, gaze) | GitHub upon publication | CC BY-NC-SA 4.0 |
The 416×416 GazeTrack dataset is suitable for lightweight CNN pre-training and rapid prototyping, serving as a critical resource for specialized eye-tracker model development outside the parameter-heavy regimes of YOLO or SSD (Ildar, 2021). The lab-grade benchmark supports advanced research on precise gaze regression, spatial normalization, and segmentation-based pipelines (Yang, 27 Nov 2025).
6. Comparative Context in Gaze Datasets Landscape
GazeTrack supplements both small-scale, iris-only benchmarks and more extensive, multimodal corpora (e.g. MoGaze (Kratzer et al., 2020), which emphasizes full-body kinematics plus synchronized eye-gaze in manipulation tasks with robotic instrumentation). Whereas MoGaze integrates gaze rays with scene geometry for intent recognition and motion planning, GazeTrack concentrates on pixel-level ocular annotation and explicit, high-precision vector regression applicable to spatial computing, VR/AR, and foundation model pre-training.
A plausible implication is that GazeTrack’s focus on well-controlled, densely annotated gaze data—especially its canonicalization protocol and regularized ellipse labeling—positions it as the de facto standard for evaluating new convolutional and attention-based gaze estimation architectures, especially those operating under hardware resource constraints, and for validating cross-domain transfer in open-world gaze tasks.
7. Impact and Applications
GazeTrack enables:
- Training/evaluation of lightweight high-precision CNN-based eye trackers.
- Benchmarking advanced pipelines for gaze regression and pupil segmentation.
- Bootstrapping personalized models for user-facing applications in AR/VR, where gaze accuracy requirements are stringent.
- Studying the effects of normalization and regularization protocols on generalization across subjects and lighting conditions.
- Comparative analysis with multimodal datasets (e.g., MoGaze) for hierarchical vision-to-action models in robotics.
GazeTrack fills a gap in the resource spectrum between minimal, annotation-light datasets and motion-capture-driven, multi-sensor frameworks, providing a flexible but rigorous foundation for both academic exploration and applied system prototyping (Ildar, 2021, Yang, 27 Nov 2025).