GazeTrack Dataset

Updated 2 December 2025
  • GazeTrack is a collection of public eye-tracking datasets featuring high-precision iris localization, pupil segmentation, and gaze vector regression benchmarks.
  • It includes two distinct paradigms: a lightweight 416×416 iris dataset for compact CNN training and a lab benchmark with sub-degree gaze registration using advanced calibration protocols.
  • The datasets support real-time applications, VR/AR, and cross-domain gaze estimation research through rigorous annotation, normalization protocols, and comprehensive evaluation metrics.

GazeTrack refers to a set of public datasets and benchmarks for eye and gaze-tracking, primarily aimed at enabling high-precision pupil localization, iris boundary estimation, and gaze vector regression in both constrained and unconstrained imaging scenarios. Two prominent releases under the GazeTrack name illustrate distinct paradigms: a lightweight, annotation-focused iris dataset suitable for training compact convolutional models (Ildar, 2021), and a high-resolution, multi-modal laboratory benchmark with precise sub-degree gaze registration and advanced spatial normalization (Yang, 27 Nov 2025). Both seek to address the lack of specialized, high-quality data for gaze estimation, particularly for training and evaluating lightweight CNNs or spatial computing gaze interfaces.

1. Dataset Structure and Data Acquisition

1.1. GazeTrack (416×416, Iris-Centric, 2021)

This variant consists of 10,000 annotated eye-region images, each 416×416 pixels, derived from the "4quant/eye-gaze" face-image corpus (Ildar, 2021). Landmark-based preprocessing using dlib’s 68-point detector isolates the peri-ocular region (landmarks 36–41), followed by isotropic rescaling to the fixed field of view. Iris boundaries are segmented by a thresholding routine, targeting an iris mask occupancy of ~14% of the crop, and fit with a single circle per image:

  • For each image: a JPEG (416×416×3), annotated as (image_filename, $x_c$, $y_c$, $r$) in pixels.
  • An annotation file (CSV) listing iris center coordinates and radius (a loading sketch follows this list).
  • Subjects are imaged in indoor scenes with only ambient daylight or office lighting, using consumer-grade webcams—no IR light sources or restraint hardware.
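
As an illustration of how these files might be consumed, the sketch below loads the annotation CSV and overlays the labelled iris circle on a crop. The file name and column names (`image`, `x_c`, `y_c`, `r`) are assumptions for illustration, not the dataset's documented schema.

```python
import csv
import cv2  # OpenCV for image I/O and drawing

# Hypothetical file/column names; the actual CSV schema may differ.
ANNOTATION_CSV = "gazetrack_annotations.csv"

def load_annotations(path):
    """Read (filename, x_c, y_c, r) rows from the annotation CSV."""
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append((row["image"], float(row["x_c"]),
                         float(row["y_c"]), float(row["r"])))
    return rows

def overlay_iris(image_path, x_c, y_c, r, out_path):
    """Draw the annotated iris circle onto the 416x416 crop."""
    img = cv2.imread(image_path)  # BGR, 416x416x3
    cv2.circle(img, (round(x_c), round(y_c)), round(r), (0, 255, 0), 2)
    cv2.imwrite(out_path, img)

if __name__ == "__main__":
    for name, x_c, y_c, r in load_annotations(ANNOTATION_CSV)[:5]:
        overlay_iris(name, x_c, y_c, r, f"overlay_{name}")
```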

1.2. GazeTrack (Lab Benchmark, 2025)

This release features a high-precision, subject-diverse dataset for 3D gaze and pupil ellipse estimation (Yang, 27 Nov 2025):

  • 47 participants (balanced gender, broad age range, varied visual conditions) recorded in free-head scenarios with only a loose chin-rest.
  • Monocular IR camera (DG3 system, 384×288 px @ 60 fps), IR-LED illumination (850 nm), and five-point laser calibration.
  • For each subject: PNG images, binary pupil masks, per-frame ellipse fit parameters $(A, B, C, D, E, F)$ representing $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$, and ground-truth 3D gaze vectors (device and world coordinates).
  • Spans the full $\pm 30^\circ$ gaze field of view in pitch/yaw.

Each subject’s data are organized in per-session folders with raw frames, segmentation masks, ellipse parameter CSVs, and device/world gaze trajectories. Official splits: 32 train, 8 validation, 7 test.
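
A minimal loader for such a layout might look like the following sketch. The folder and file names (`frames/`, `masks/`, `ellipses.csv`, `gaze_world.csv`) are hypothetical placeholders, since the release defines its own naming and formats.

```python
from pathlib import Path
import csv

# Hypothetical per-session layout (actual names are defined by the release):
# subject_XX/
#   frames/*.png        raw IR frames
#   masks/*.png         binary pupil masks
#   ellipses.csv        per-frame conic coefficients A..F (assumed headerless)
#   gaze_world.csv      per-frame 3D gaze vectors (assumed headerless)

def load_session(session_dir):
    """Collect frames, masks, ellipse parameters, and gaze vectors for one session."""
    session = Path(session_dir)
    frames = sorted((session / "frames").glob("*.png"))
    masks = sorted((session / "masks").glob("*.png"))
    with open(session / "ellipses.csv", newline="") as f:
        ellipses = [tuple(float(v) for v in row) for row in csv.reader(f)]
    with open(session / "gaze_world.csv", newline="") as f:
        gaze = [tuple(float(v) for v in row) for row in csv.reader(f)]
    assert len(frames) == len(masks) == len(ellipses) == len(gaze)
    return frames, masks, ellipses, gaze
```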

2. Annotation Schema and Coordinate Conventions

For the 416×416 release (Ildar, 2021), each iris annotation consists of:

  • Pupil center $(x_c, y_c)$ in image coordinates, radius $r$ (pixels), origin at the top-left.
  • Normalized coordinates: $\tilde{x} = x_c / 416$, $\tilde{y} = y_c / 416$, $\tilde{r} = r / 416$.
  • Boundary implicit equation: $(x - x_c)^2 + (y - y_c)^2 = r^2$.
  • YOLO-style vector: $(\text{class\_id}, \tilde{x}, \tilde{y}, \tilde{r}, \tilde{r})$, typically with class_id = 0.
  • This format is practical for both bounding-box regression and segmentation loss frameworks.
For the lab benchmark (Yang, 27 Nov 2025), each frame additionally carries:

  • A six-tuple $[A, B, C, D, E, F]$ encoding the ellipse fit.
  • Pupil semantic masks (same spatial size as the raw image) for pixelwise supervision.
  • 3D gaze vectors $\mathbf{G}_\text{dev} = (g_x, g_y, g_z)$ (device coordinates) and angles $(\theta_\text{yaw}, \theta_\text{pitch})$, calibrated to $\pm 0.3^\circ$ accuracy (see the conversion sketch after this list).
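
The sketch below shows how the conic coefficients can be converted into an explicit ellipse (center, semi-axes, orientation) and how a unit gaze vector maps to yaw/pitch angles. The axis convention (+z forward, +x right, +y down) is an assumption, not the benchmark's documented frame.

```python
import numpy as np

def conic_to_ellipse(A, B, C, D, E, F):
    """Convert Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 into
    (center_x, center_y, semi_major, semi_minor, angle_rad)."""
    det = 4.0 * A * C - B * B          # > 0 for a real ellipse
    if det <= 0:
        raise ValueError("coefficients do not describe an ellipse")
    x_c = (B * E - 2.0 * C * D) / det
    y_c = (B * D - 2.0 * A * E) / det
    # Value of the conic at the center sets the scale of the quadratic form.
    f_c = F + 0.5 * (D * x_c + E * y_c)
    M = np.array([[A, B / 2.0], [B / 2.0, C]])
    eigvals, eigvecs = np.linalg.eigh(M)   # ascending eigenvalues
    semi_axes = np.sqrt(-f_c / eigvals)    # smaller eigenvalue <-> major axis
    angle = float(np.arctan2(eigvecs[1, 0], eigvecs[0, 0]))  # major-axis direction
    return x_c, y_c, float(semi_axes[0]), float(semi_axes[1]), angle

def gaze_to_yaw_pitch(g):
    """Convert a unit 3D gaze vector to (yaw, pitch) in degrees.
    Assumes a camera-style frame with +z forward, +x right, +y down;
    the benchmark's actual axis convention may differ."""
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)
    yaw = np.degrees(np.arctan2(g[0], g[2]))
    pitch = np.degrees(np.arcsin(-g[1]))
    return float(yaw), float(pitch)
```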

3. Preprocessing Pipeline and Data Normalization

3.1. Standard Procedures

  • Intensity normalization to $[0, 1]$, or mean subtraction followed by division by the standard deviation $\sigma$.
  • Retain the $416 \times 416$ crop; apply identical geometric transforms (rotation, scaling, translation, brightness/contrast jitter) to both the image and its annotations.
  • Recommended augmentations: random rotation in $[-15^\circ, +15^\circ]$; scale factor in $[0.9, 1.1]$; translation of $\pm 10$ px; brightness/contrast modification of $\pm 20\%$ (a paired augmentation sketch follows this list).
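
A hedged sketch of such a paired image/label augmentation is given below, using OpenCV. The parameter ranges mirror the recommendations above, but the function itself is illustrative and not part of the dataset tooling.

```python
import numpy as np
import cv2

def augment(img, x_c, y_c, r, rng=np.random.default_rng()):
    """Apply one random geometric + photometric augmentation to a 416x416
    crop and keep the circle annotation (x_c, y_c, r) consistent with it."""
    h, w = img.shape[:2]
    angle = rng.uniform(-15.0, 15.0)             # degrees
    scale = rng.uniform(0.9, 1.1)
    tx, ty = rng.uniform(-10.0, 10.0, size=2)    # pixels

    # Rotation + scaling about the image centre, then translation.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += (tx, ty)
    img_aug = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Apply the same affine map to the circle centre; the radius only scales.
    x_aug, y_aug = M @ np.array([x_c, y_c, 1.0])
    r_aug = r * scale

    # Brightness/contrast jitter (+-20%), clipped back to valid intensities.
    alpha = rng.uniform(0.8, 1.2)                # contrast
    beta = rng.uniform(-0.2, 0.2) * 255.0        # brightness
    img_aug = np.clip(img_aug.astype(np.float32) * alpha + beta, 0, 255)
    return img_aug.astype(np.uint8), float(x_aug), float(y_aug), float(r_aug)
```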

3.2. Advanced Normalization (“Paper Unfolding”)

In the 2025 benchmark (Yang, 27 Nov 2025), raw target coordinates are mapped into a canonical, subject-independent grid by a multi-point, region-wise transformation. The target plane is subdivided into eight regions, and each sample $(x, y)$ is mapped to $(x', y')$ via linear interpolation anchored at per-subject calibration points, aligning all gaze target constellations:

  • This ensures that all trials share a standard spatial reference, facilitating cross-user generalization (a rough sketch of such a mapping follows below).
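
The exact unfolding transform is defined by the benchmark's own tooling; the following is only a rough sketch of the idea, using SciPy's piecewise-linear interpolation (Delaunay-based, so the triangulated regions only approximate the eight-region subdivision described above) between a subject's measured calibration points and a shared canonical grid.

```python
import numpy as np
from scipy.interpolate import griddata

def unfold(points_xy, subject_calib_xy, canonical_calib_xy):
    """Map raw gaze-target samples into a canonical, subject-independent grid.

    points_xy          : (N, 2) raw samples for one subject
    subject_calib_xy   : (K, 2) that subject's measured calibration points
    canonical_calib_xy : (K, 2) shared canonical positions of those points
    """
    # Piecewise-linear interpolation over the Delaunay triangulation of the
    # calibration points; samples outside the calibration hull return NaN.
    x_new = griddata(subject_calib_xy, canonical_calib_xy[:, 0],
                     points_xy, method="linear")
    y_new = griddata(subject_calib_xy, canonical_calib_xy[:, 1],
                     points_xy, method="linear")
    return np.stack([x_new, y_new], axis=1)
```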

4. Model Training, Losses, and Evaluation Protocols

4.1. Lightweight Networks (Iris-Only)

Using the 416×416 dataset, compact CNNs can be trained on the single-class, iris-localization task:

  • Suitable for real-time applications or low-overhead inference.
  • Supports both a regression loss (MSE on $(x_c, y_c, r)$) and YOLO-style detection heads.
  • A PyTorch example code snippet in the original release demonstrates data ingestion and overlay visualization (Ildar, 2021); a hedged sketch of the regression formulation follows below.
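
As a separate illustration of the regression formulation, a compact model predicting the normalized $(\tilde{x}, \tilde{y}, \tilde{r})$ triple could look like the sketch below. The architecture is illustrative only and is not the published model.

```python
import torch
import torch.nn as nn

class TinyIrisNet(nn.Module):
    """Compact CNN regressing normalized iris centre and radius from a
    416x416 RGB crop. Illustrative architecture, not the published model."""
    def __init__(self):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.features = nn.Sequential(
            block(3, 16), block(16, 32), block(32, 64), block(64, 128),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(128, 3), nn.Sigmoid())  # (x~, y~, r~) in [0, 1]

    def forward(self, x):
        return self.head(self.features(x))

model = TinyIrisNet()
criterion = nn.MSELoss()              # regression loss on (x~, y~, r~)
images = torch.rand(4, 3, 416, 416)   # dummy batch
targets = torch.rand(4, 3)            # dummy normalized annotations
loss = criterion(model(images), targets)
loss.backward()
```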

4.2. Segmentation and Ellipse Regularization (Lab Benchmark)

The segmentation backbone (“U-ResAtt”) is a U-Net-style encoder/decoder with residual blocks, spatial self-attention, and a mask channel (Yang, 27 Nov 2025). Total loss:

$$L = \alpha L_{\text{BCE}} + \beta L_{\text{EFE}}$$

where $L_{\text{BCE}}$ is binary cross-entropy on the mask and $L_{\text{EFE}}$ measures the edge error between predicted and ground-truth ellipse boundaries. Optimization uses Adam (learning rate $10^{-4}$) with early stopping on validation IoU plus EFE.
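
The published EFE term follows the paper's own definition; as a hedged stand-in, the sketch below pairs the BCE term with a simple boundary-error proxy that evaluates the ground-truth conic on the predicted boundary pixels. This is an algebraic residual rather than a true geometric edge distance, and the hard threshold makes it non-differentiable, so it illustrates the idea rather than serving as a drop-in training loss.

```python
import torch
import torch.nn.functional as F

def boundary_pixels(mask):
    """Boundary of a binary mask: pixels removed by a 3x3 erosion."""
    eroded = -F.max_pool2d(-mask, kernel_size=3, stride=1, padding=1)
    return mask - eroded                      # 1 on the boundary, 0 elsewhere

def efe_proxy(pred_logits, conic, thresh=0.5):
    """Mean |Ax^2+Bxy+Cy^2+Dx+Ey+F| over predicted boundary pixels.
    Expects a single (1, 1, H, W) prediction; illustrative only."""
    A, B, C, D, E, Fc = conic                 # Fc avoids shadowing torch.nn.functional
    pred = (torch.sigmoid(pred_logits) > thresh).float()
    edge = boundary_pixels(pred)
    ys, xs = torch.nonzero(edge[0, 0], as_tuple=True)
    if xs.numel() == 0:
        return pred_logits.new_tensor(0.0)
    x, y = xs.float(), ys.float()
    residual = A*x*x + B*x*y + C*y*y + D*x + E*y + Fc
    return residual.abs().mean()

def total_loss(pred_logits, gt_mask, conic, alpha=1.0, beta=0.1):
    """alpha * BCE on the mask + beta * edge-error proxy (sketch)."""
    bce = F.binary_cross_entropy_with_logits(pred_logits, gt_mask)
    return alpha * bce + beta * efe_proxy(pred_logits, conic)
```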

Gaze Vector Regression (GVnet)

  • Input: Sliding window of ellipse parameters and transformed coordinates.
  • Architecture: a self-attention layer, dual fully-connected layers (LeakyReLU, $\alpha = 0.2$), dropout, and an output head projecting to an L2-normalized 3D gaze vector.
  • Loss: MSE on gaze vectors; evaluation by angular error $\Delta\theta = \arccos(\mathbf{G}_{\text{pred}} \cdot \mathbf{G}_{\text{true}})$ (a sketch of such a head follows this list).
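
A hedged PyTorch sketch of this kind of head is given below. Feature dimension, window length, hidden width, and dropout rate are placeholders; the published GVnet may differ in all of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeHead(nn.Module):
    """Sketch of a GVnet-style regressor: self-attention over a sliding window
    of per-frame features (ellipse parameters + transformed coordinates),
    followed by two fully-connected layers and an L2-normalized 3D output."""
    def __init__(self, feat_dim=8, window=16, hidden=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=2, batch_first=True)
        self.fc1 = nn.Linear(feat_dim * window, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.drop = nn.Dropout(0.3)
        self.out = nn.Linear(hidden, 3)

    def forward(self, x):                       # x: (batch, window, feat_dim)
        x, _ = self.attn(x, x, x)               # self-attention over the window
        x = x.flatten(1)
        x = self.drop(F.leaky_relu(self.fc1(x), 0.2))
        x = self.drop(F.leaky_relu(self.fc2(x), 0.2))
        return F.normalize(self.out(x), dim=-1)  # unit 3D gaze vector

def angular_error_deg(pred, true):
    """Mean angular error (degrees) between unit gaze vectors."""
    cos = (pred * true).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos)).mean()

model = GazeHead()
features = torch.rand(4, 16, 8)                 # dummy batch of feature windows
pred = model(features)
loss = F.mse_loss(pred, F.normalize(torch.rand(4, 3), dim=-1))
```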

4.3. Benchmark Protocols

  • Segmentation: detection rate at a 5-px pupil-center error threshold on public testbeds (e.g., ExCuSe).
  • Gaze regression: report mean angular error (degrees) and ablations of the coordinate transformation methods.
  • Training on GazeTrack plus external corpora (e.g., LPW, ETH-XGaze), using the official splits for cross-participant generalization (an evaluation sketch follows this list).
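
For concreteness, the two headline metrics could be computed as in the following hedged sketch; the threshold mirrors the protocol above, but the functions are illustrative rather than any official evaluation script.

```python
import numpy as np

def detection_rate(pred_centers, gt_centers, threshold_px=5.0):
    """Fraction of frames whose predicted pupil centre lies within
    `threshold_px` pixels of the ground truth (the 5-px protocol)."""
    err = np.linalg.norm(np.asarray(pred_centers) - np.asarray(gt_centers), axis=1)
    return float((err <= threshold_px).mean())

def mean_angular_error_deg(pred_gaze, gt_gaze):
    """Mean angular error in degrees between 3D gaze vectors."""
    p = pred_gaze / np.linalg.norm(pred_gaze, axis=1, keepdims=True)
    g = gt_gaze / np.linalg.norm(gt_gaze, axis=1, keepdims=True)
    cos = np.clip((p * g).sum(axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```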

5. Distribution, Access, and Licensing

  • 416×416 GazeTrack — 10,000 images plus a CSV annotation file; formats: JPEG images and CSV (center, radius); access: Kaggle and GitHub (code/scripts); license: citable, no formal license.
  • Lab Benchmark GazeTrack — ~12 GB across 47 subjects; formats: PNG frames, binary masks, CSVs (ellipse, gaze); access: GitHub upon publication; license: CC BY-NC-SA 4.0.

The 416×416 GazeTrack dataset is suitable for lightweight CNN pre-training and rapid prototyping, serving as a critical resource for specialized eye-tracker model development outside the parameter-heavy regimes of YOLO or SSD (Ildar, 2021). The lab-grade benchmark supports advanced research on precise gaze regression, spatial normalization, and segmentation-based pipelines (Yang, 27 Nov 2025).

6. Comparative Context in Gaze Datasets Landscape

GazeTrack supplements both small-scale, iris-only benchmarks and more extensive, multimodal corpora (e.g. MoGaze (Kratzer et al., 2020), which emphasizes full-body kinematics plus synchronized eye-gaze in manipulation tasks with robotic instrumentation). Whereas MoGaze integrates gaze rays with scene geometry for intent recognition and motion planning, GazeTrack concentrates on pixel-level ocular annotation and explicit, high-precision vector regression applicable to spatial computing, VR/AR, and foundation model pre-training.

A plausible implication is that GazeTrack’s focus on well-controlled, densely annotated gaze data—especially its canonicalization protocol and regularized ellipse labeling—positions it as the de facto standard for evaluating new convolutional and attention-based gaze estimation architectures, especially those operating under hardware resource constraints, and for validating cross-domain transfer in open-world gaze tasks.

7. Impact and Applications

GazeTrack enables:

  • Training/evaluation of lightweight high-precision CNN-based eye trackers.
  • Benchmarking advanced pipelines for gaze regression and pupil segmentation.
  • Bootstrapping personalized models for user-facing applications in AR/VR, where gaze accuracy requirements are stringent.
  • Studying the effects of normalization and regularization protocols on generalization across subjects and lighting conditions.
  • Comparative analysis with multimodal datasets (e.g., MoGaze) for hierarchical vision-to-action models in robotics.

GazeTrack fills a gap in the resource spectrum between minimal, annotation-light datasets and motion-capture-driven, multi-sensor frameworks, providing a flexible but rigorous foundation for both academic exploration and applied system prototyping (Ildar, 2021; Yang, 27 Nov 2025).
