- The paper introduces a novel modular framework that separates glint detection from constellation-based matching, enhancing interpretability in eye tracking.
- It employs geometric alignment techniques inspired by star tracking to robustly assign multi-LED glints with high identity-preserving accuracy and low localization error.
- Empirical evaluations on multiple datasets demonstrate consistent performance across varied hardware and pave the way for hybrid, learning-based enhancements.
Night Eyes: A Reproducible 2D Constellation-Based Framework for Corneal Reflection Matching
Motivation and Context
Achieving robust, generalizable, and reproducible corneal reflection (glint) detection remains a core challenge in P-CR (pupil–corneal reflection) eye tracking. Existing approaches primarily rely on heuristic-driven image processing pipelines or data-dependent deep networks. Heuristic systems are generally tailored to specific hardware and illumination configurations, with detection and glint–LED correspondence logic embedded in ad-hoc postprocessing, which inhibits standardization, comparison, and extension. Deep learning-based methods conflate detection and labeling, often require extensive dataset annotation, and offer little transparency into whether errors stem from detection or from correspondence. The lack of modular, openly reproducible pipelines impedes both benchmarking and methodological advances, especially for identity-preserving assignment across varied LED layouts and hardware.
Night Eyes introduces a modular, geometry-driven framework for multi-glint detection and matching, explicitly focusing on separating over-detection from identity resolution, and supporting transparent cross-dataset evaluation. The distinctive approach frames glint assignment as a geometric constellation alignment task inspired by the "lost-in-space" star tracking literature, thereby leveraging established geometric matching paradigms for robust, interpretable, and hardware-agnostic solutions.
Pipeline Overview and Methodology
The Night Eyes pipeline executes sequentially per image and is defined by intentional modularization across enhancement, detection, scoring, and template-based correspondence:
- Preprocessing and Enhancement: Images are converted to grayscale, denoised and contrast-adjusted, and enhanced via white top-hat filtering, difference-of-Gaussians (DoG), or high-pass filtering. Candidate region-of-interest (pupil-centered) extraction is optional. The focus is on amplifying plausible glint candidates without constraining the result to the anticipated count.
- Candidate Extraction and Scoring: Percentile thresholding and morphological operations are used to over-detect bright spots. Geometric and photometric features are aggregated with fixed-weight heuristics and, in optional modes, geometric support voting. This results in a candidate pool without imposing any LED-specific spatial constraints at this stage.
- Adaptive Fallback and Spatial Gating: To increase recall in sparse or noisy frames, the pipeline adaptively lowers thresholds or broadens kernels to generate additional candidates, pooling and deduplicating using proximity- and score-based merging. Exclusion zones can be imposed via pupil-based annuli or image borders.
- Template Construction: Templates, reflecting target LED layouts, are median-aggregated or Procrustes-aligned across labeled datasets, normalized in scale and centroid. Selection between canonical or template-bank-based matching is available.
- Constellation Matching with Similarity–Layout Alignment (SLA): Constellation assignment is formalized as a 2D similarity alignment. Candidate sets are evaluated via triplet-based hypothesis enumeration; valid similarity transforms are discovered using pairwise geometric consistency. Hypotheses are expanded by greedy candidate–template alignment, followed by iterative residual minimization and semantic plausibility checks, optionally penalizing mirror or structurally implausible permutations. Baseline matchers (RANSAC, star-voting) are included for ablation.
- Evaluation: Metrics include identity-preserving accuracy (label-level), identity-free accuracy (detection), per-glint localization error, and assignment confusion, computed under fixed pixel thresholds.
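The enhancement and over-detection stages above can be sketched as follows. Function names, kernel sizes, and thresholds here are illustrative assumptions, not the paper's actual configuration:

```python
# Sketch of white top-hat enhancement followed by deliberate
# over-detection via percentile thresholding (parameters assumed).
import numpy as np
from scipy import ndimage

def overdetect_glints(gray, tophat_size=7, percentile=99.5, min_area=2):
    """Return candidate glint centroids from a grayscale eye image."""
    # White top-hat keeps small bright structures (glints) and
    # suppresses larger smooth regions such as iris and sclera.
    enhanced = ndimage.white_tophat(gray.astype(float), size=tophat_size)

    # Percentile thresholding deliberately over-detects bright spots;
    # identity resolution is deferred to the matching stage.
    mask = enhanced >= np.percentile(enhanced, percentile)

    # Connected components -> candidate centroids, filtered by area.
    labels, n = ndimage.label(mask)
    candidates = []
    for region in range(1, n + 1):
        ys, xs = np.nonzero(labels == region)
        if ys.size >= min_area:
            candidates.append((xs.mean(), ys.mean()))
    return np.array(candidates)

# Synthetic frame: dark background with five bright 3x3 blobs.
img = np.zeros((64, 64))
for (x, y) in [(10, 10), (30, 12), (50, 14), (20, 40), (40, 42)]:
    img[y - 1:y + 2, x - 1:x + 2] = 255.0
cands = overdetect_glints(img)
print(len(cands))  # five candidates on this synthetic frame
```

Because the threshold is a percentile rather than an absolute value, the candidate pool adapts to frame brightness, which is one way the "no anticipated count" principle can be honored.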
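Template construction from labeled frames might look like the following minimal sketch; per-frame Procrustes alignment to a reference is omitted for brevity, so this is an assumed simplification of the paper's procedure:

```python
# Sketch: centroid-subtract and scale-normalize each labeled glint
# set, then take the per-LED median as a robust canonical template.
import numpy as np

def build_template(frames):
    """frames: (n_frames, n_leds, 2) labeled glint coordinates."""
    frames = np.asarray(frames, float)
    centered = frames - frames.mean(axis=1, keepdims=True)
    # Frobenius norm per frame normalizes overall scale.
    scale = np.linalg.norm(centered, axis=(1, 2), keepdims=True)
    normalized = centered / scale
    return np.median(normalized, axis=0)  # robust per-LED aggregate

layout = np.array([[0, 0], [2, 0], [4, 0], [1, 2], [3, 2]], float)
# Two frames: the same layout at different offsets and scales.
frames = [layout + 5.0, 2.0 * layout - 3.0]
tmpl = build_template(frames)
print(tmpl.shape)  # (5, 2)
```

The median, rather than the mean, keeps the aggregate stable when individual frames carry mislabeled or occluded glints.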
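The 2D similarity alignment at the heart of the SLA step can be illustrated with a closed-form least-squares estimate (Umeyama-style). This is an assumed stand-in for the paper's triplet-based hypothesis enumeration, but it shows the core operation: recover scale, rotation, and translation mapping the template onto candidates, reject mirror solutions, and score residuals:

```python
# Least-squares 2D similarity fit with reflection rejection (sketch).
import numpy as np

def fit_similarity(src, dst):
    """Estimate s, R, t minimizing ||dst - (s * R @ src + t)||."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    sc, dc = src - mu_s, dst - mu_d
    # Cross-covariance; SVD yields the optimal rotation.
    H = sc.T @ dc
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # reject mirror solutions
    D = np.diag([1.0, d])
    R = Vt.T @ D @ U.T
    s = np.trace(np.diag(S) @ D) / (sc ** 2).sum()
    t = mu_d - s * (R @ mu_s)
    return s, R, t

def residuals(src, dst, s, R, t):
    return np.linalg.norm(dst - (s * src @ R.T + t), axis=1)

# Template of a 5-LED layout; observations are a rotated/scaled copy.
template = np.array([[0, 0], [1, 0], [2, 0], [0.5, 1], [1.5, 1]], float)
theta = np.deg2rad(15)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
observed = 3.0 * template @ R_true.T + np.array([10.0, 5.0])
s, R, t = fit_similarity(template, observed)
print(round(s, 3))  # recovers the true scale of 3.0
```

In a full matcher, a transform like this would be hypothesized from candidate triplets, expanded greedily over remaining candidates, and accepted only if residuals and semantic plausibility checks pass.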
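The distinction between identity-preserving and identity-free accuracy can be made concrete with a small sketch; the threshold value and data are illustrative, and the label-agnostic matching is done here with a Hungarian assignment as one reasonable choice:

```python
# Identity-preserving: the predicted LED label must match AND lie
# within the pixel threshold. Identity-free: any detection within
# the threshold of a ground-truth glint counts, labels ignored.
import numpy as np
from scipy.optimize import linear_sum_assignment

def glint_metrics(pred, gt, thresh=3.0):
    """pred, gt: dicts mapping LED id -> (x, y); pred values may be None."""
    ids = sorted(gt)
    ip_hits = sum(
        1 for i in ids
        if pred.get(i) is not None
        and np.hypot(*np.subtract(pred[i], gt[i])) <= thresh
    )
    # Label-agnostic optimal matching of detections to ground truth.
    P = np.array([p for p in pred.values() if p is not None], float)
    G = np.array([gt[i] for i in ids], float)
    D = np.linalg.norm(P[:, None, :] - G[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(D)
    if_hits = int((D[rows, cols] <= thresh).sum())
    n = len(ids)
    return ip_hits / n, if_hits / n

gt = {0: (10, 10), 1: (30, 12), 2: (50, 14)}
pred = {0: (30, 12), 1: (10, 10), 2: (50.5, 14.2)}  # labels 0/1 swapped
ip, ifree = glint_metrics(pred, gt)
print(ip, ifree)  # identity-preserving 1/3, identity-free 3/3
```

A gap between the two metrics isolates assignment ambiguity; when they nearly coincide, as reported below, the dominant failure mode is missed detection.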
A notable design choice is the strict separation of candidate generation from geometric matching, which promotes interpretability and allows detection failures to be disambiguated from assignment failures.
Empirical Results and Analysis
On a public five-LED labeled dataset, the Night Eyes pipeline, using a frozen configuration, achieves identity-preserving accuracy of 0.74, precision of 0.81, and a median localization error of 1.41 px. The small gap between identity-preserving and identity-free metrics indicates that the principal source of error is missed detection, not assignment ambiguity. Typical failures arise from severe occlusion (especially for vertically arranged LEDs), and a few erroneous matches produce large outlier errors that inflate the mean localization error (10.37 px) relative to the median.
Cross-dataset validation is performed on OpenEDS 2019 and OpenEDS 2020. The pipeline exhibits consistent detection and correspondence patterns across setups with different illumination and camera noise distributions, requiring only dataset-specific template adaptation (no change in core logic or hyperparameters). Importantly, layout priors can be toggled to accommodate both known and unknown LED geometries.
Ablation studies confirm that geometric support voting and adaptive fallback are essential for handling missing glints, and that semantic priors effectively suppress mirrored configurations.
Theoretical and Practical Implications
Night Eyes demonstrates that combining intentional over-detection with 2D geometric constellation alignment establishes a reproducible and interpretable baseline for the P-CR correspondence problem. The design is inherently modular, supports augmentation with learning-based scoring, and is portable across heterogeneous hardware without the need for retraining or architecture redesign.
Releasing code, preset configurations, annotation tools, and curated correspondence labels for major public datasets furthers the reproducibility agenda, facilitating rigorous benchmarking and enabling the community to focus on improvements at the component level rather than reimplementation.
While the 2D framework demonstrates robustness, it does not model off-axis projection or severe warping, which can arise under extreme gaze angles or unconventional optics. Deep models may outperform it in very low-contrast conditions or under occlusion, but at the cost of interpretability and cross-hardware portability. Future directions include hybrid modular architectures that integrate deep appearance models within an interpretable geometric matching interface, and generalization to 3D settings or to non-rigidly deformed constellations.
Availability, Extensibility, and Reproducibility
All major pipeline components, annotated templates, and evaluation scripts are released under liberal open-source terms, with a user-friendly UI for annotation and batch evaluation. This enables rapid protocol adaptation and dataset extension by other research groups. Annotation metadata adheres to FAIR standards, and dataset licensing is clarified to comply with parent sources.
Conclusion
Night Eyes provides a transparent, reproducible, and modular framework for multi-glint detection and identity-preserving assignment in P-CR eye tracking. By framing glint correspondence as a constrained 2D constellation alignment task, this framework disambiguates the classic detection–assignment entanglement and supports thorough analysis of pipeline variants. Results confirm strong accuracy and generalization across multi-LED configurations, with extensibility toward alternative matching strategies and hybrid architectures. The open release of code, annotation tools, and labeled datasets is positioned to facilitate rigorous future comparison and evolution of both classical and learning-based approaches to glint-based gaze tracking.
Reference: "Night Eyes: A Reproducible Framework for Constellation-Based Corneal Reflection Matching" (2604.01909)