- The paper reviews innovative methodologies from top teams in the 2025 challenge, highlighting advancements in temporal and spatial modeling, data augmentation, and post-processing for improved event-based eye tracking accuracy.
- Key approaches include purpose-built neural architectures such as BRAT and TDTracker, designed to capture temporal dynamics while remaining computationally efficient.
- These advancements have significant practical implications for low-power, high-frequency eye tracking in AR/VR systems and potential applications in healthcare.
Event-Based Eye Tracking: Insights from the 2025 Event-Based Vision Workshop
The paper "Event-Based Eye Tracking" provides a comprehensive overview of research methodologies and findings from the 2025 Event-Based Eye Tracking Challenge held during the CVPR event-based vision workshop. This challenge aimed at predicting pupil center positions by processing data from event cameras used in eye movement tracking. The contribution of this paper lies in its examination of innovative solutions employed by top-ranking teams, offering insight into both hardware and algorithmic advancements.
Challenge Overview
Event-based eye tracking builds on Dynamic Vision Sensors (DVS), which report per-pixel brightness changes asynchronously, offering high temporal resolution and spatiotemporally sparse output. The challenge used the 3ET+ dataset, recorded with DVXplorer Mini cameras and labeled at 100 Hz, covering a range of eye-movement tasks performed by 13 subjects. Teams were ranked by pixel error, in contrast to the prior year's emphasis on p-accuracy, a change intended to separate high-performing models more clearly.
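To make the ranking metric concrete, here is a minimal sketch of a pixel-error computation, assuming predictions and labels are (N, 2) arrays of (x, y) pupil-center coordinates; the function name and array layout are illustrative, not taken from the challenge toolkit.

```python
import numpy as np

def mean_pixel_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Euclidean distance (in pixels) between predicted and
    ground-truth pupil centers.

    pred, gt: arrays of shape (N, 2) holding (x, y) coordinates
    for N frames sampled at the 100 Hz label rate.
    """
    # Per-frame Euclidean distance, averaged over the sequence.
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# Example: two predictions, each 3 px off along one axis -> error of 3.0
pred = np.array([[10.0, 20.0], [33.0, 40.0]])
gt   = np.array([[10.0, 23.0], [30.0, 40.0]])
print(mean_pixel_error(pred, gt))  # 3.0
```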
Methodological Insights
The paper describes several standout approaches from the challenge, showcasing innovations in model architecture, data preprocessing, and post-processing strategies to enhance tracking accuracy:
- Temporal and Spatial Modeling: USTCEventGroup implemented the BRAT network, which pairs a CNN for spatial feature extraction with a bidirectional GRU and attention for modeling short- and long-term temporal dependencies. Its bidirectional relative positional attention gives the model finer-grained handling of temporal dynamics (see the first sketch after this list).
- Data Augmentation Techniques: CherryChums applied temporal shifts, spatial flips, and random event deletion to harden the model against real-world perturbations such as motion jitter and sensor dropout (second sketch below).
- Post-processing Improvements: EyeTracking@SMU added motion-aware median filtering and optical-flow-based refinement after inference, substantially improving the continuity and spatial accuracy of predicted gaze trajectories and suppressing blink artifacts (third sketch below).
- Architectural Efficiency: HKUSTGZ's TDTracker combines 3D CNNs with Gated Recurrent Units (GRUs) in a modular design that balances accuracy against computational cost, meeting real-time processing demands with minimal overhead (final sketch below).
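The paper names BRAT's building blocks but not its exact configuration, so the PyTorch sketch below simply wires those blocks together: a CNN encodes each event frame, a bidirectional GRU models the sequence, and attention refines it. All layer sizes, the input format, and the use of standard multi-head attention in place of the team's bidirectional relative positional variant are assumptions for illustration.

```python
import torch
import torch.nn as nn

class BratLikeTracker(nn.Module):
    """CNN spatial encoder -> bidirectional GRU -> self-attention -> (x, y).

    Input: event frames of shape (batch, time, 1, H, W); all sizes
    below are illustrative, not the challenge entry's actual config.
    """
    def __init__(self, hidden: int = 128):
        super().__init__()
        # Per-frame spatial features from a small CNN.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Bidirectional GRU captures short- and long-range temporal context.
        self.gru = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        # Standard multi-head attention as a stand-in for the team's
        # bidirectional relative positional attention.
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(2 * hidden, 2)  # (x, y) pupil center

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.gru(feats)
        seq, _ = self.attn(seq, seq, seq)
        return self.head(seq)  # per-timestep (x, y) predictions

# Smoke test on random data: 2 clips of 30 frames at 64x64.
out = BratLikeTracker()(torch.randn(2, 30, 1, 64, 64))
print(out.shape)  # torch.Size([2, 30, 2])
```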
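Next, a minimal sketch of the three augmentations, assuming events arrive as parallel NumPy arrays of coordinates, timestamps, and polarities; the jitter range and drop rate are placeholders rather than CherryChums's published settings.

```python
import numpy as np

def augment_events(x, y, t, p, label_xy, width, rng):
    """Apply the three augmentations named above to one event clip.

    x, y: event pixel coordinates; t: timestamps (us); p: polarities;
    label_xy: (N, 2) pupil-center labels. All parameters (jitter range,
    drop rate) are illustrative, not the team's published values.
    """
    # 1. Temporal shift: jitter timestamps by up to +/-1 ms and re-sort,
    #    emulating timing noise such as motion jitter.
    t = t + rng.uniform(-1000, 1000, size=t.shape)
    order = np.argsort(t)
    x, y, t, p = x[order], y[order], t[order], p[order]

    # 2. Spatial flip: mirror horizontally, adjusting labels to match.
    if rng.random() < 0.5:
        x = width - 1 - x
        label_xy = label_xy.copy()
        label_xy[:, 0] = width - 1 - label_xy[:, 0]

    # 3. Random event deletion: drop ~10% of events to mimic sensor dropout.
    keep = rng.random(x.shape[0]) > 0.1
    return x[keep], y[keep], t[keep], p[keep], label_xy

rng = np.random.default_rng(0)
n = 1000
x, y = rng.integers(0, 640, n), rng.integers(0, 480, n)
t = np.sort(rng.uniform(0, 1e6, n))
p = rng.integers(0, 2, n)
labels = rng.uniform(0, 480, (100, 2))
aug = augment_events(x, y, t, p, labels, width=640, rng=rng)
print(aug[0].shape)  # roughly 900 events survive
```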
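The summary describes motion-aware median filtering only at a high level. One plausible reading, sketched below, is a sliding median applied only where the local trajectory is near-stationary, so blink-induced spikes are suppressed while fast saccades pass through unsmoothed; the window size and velocity threshold are assumptions.

```python
import numpy as np

def motion_aware_median(traj: np.ndarray, win: int = 5,
                        vel_thresh: float = 2.0) -> np.ndarray:
    """Median-filter a gaze trajectory only where the eye is nearly
    stationary, so genuine saccades are passed through unsmoothed.

    traj: (N, 2) predicted pupil centers; win: odd window length;
    vel_thresh: median per-frame speed (px) below which we filter.
    """
    out = traj.copy()
    half = win // 2
    # Per-frame speed, padded so speed[i] pairs with traj[i].
    speed = np.concatenate([[0.0],
                            np.linalg.norm(np.diff(traj, axis=0), axis=1)])
    for i in range(half, len(traj) - half):
        window = slice(i - half, i + half + 1)
        # A blink spike sits in a slow neighborhood (low median speed);
        # a saccade does not, so it is left untouched.
        if np.median(speed[window]) < vel_thresh:
            out[i] = np.median(traj[window], axis=0)
    return out

# A fixation at (32, 24) with one blink-induced spike at frame 5.
traj = np.tile([32.0, 24.0], (11, 1))
traj[5] = [32.0, 60.0]
print(motion_aware_median(traj)[5])  # spike pulled back to [32. 24.]
```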
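Finally, a compact sketch of the TDTracker pattern: 3D convolutions extract spatiotemporal features from a stacked event volume, and a GRU aggregates them over time. All dimensions are placeholders, not the team's actual configuration.

```python
import torch
import torch.nn as nn

class TDTrackerLike(nn.Module):
    """3D CNN over (time, H, W) event volumes, then a GRU over the
    resulting temporal features. Sizes are illustrative placeholders."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        # 3D convs mix nearby frames, capturing implicit motion cues;
        # spatial dims are strided down while time is preserved.
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(3, 5, 5), stride=(1, 2, 2),
                      padding=(1, 2, 2)), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=(3, 3, 3), stride=(1, 2, 2),
                      padding=(1, 1, 1)), nn.ReLU(),
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, vol: torch.Tensor) -> torch.Tensor:
        # vol: (batch, 1, time, H, W)
        f = self.conv3d(vol)                     # (b, 32, time, h, w)
        f = f.mean(dim=(3, 4)).transpose(1, 2)   # (b, time, 32)
        seq, _ = self.gru(f)
        return self.head(seq)                    # per-timestep (x, y)

out = TDTrackerLike()(torch.randn(2, 1, 30, 64, 64))
print(out.shape)  # torch.Size([2, 30, 2])
```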
Technical and Practical Implications
These methodological advances have direct practical implications for eye tracking on mobile and wearable devices. AR/VR systems demand high-frequency gaze sampling under tight power budgets, and event-based solutions are well suited to this regime because sparse event generation keeps both sensing and processing costs low. The same methods also show promise in healthcare, offering non-invasive tools for monitoring neurological conditions.
Future Directions
The paper argues for evaluation metrics beyond pixel error that capture computational efficiency, such as memory footprint and the sparsity of network activations. Such metrics will be pivotal in guiding hardware designs for edge computing in wearable devices. Continued algorithmic innovation, coupled with advances in event-driven hardware architectures, will be critical to more versatile and scalable eye-tracking solutions across applications.
In conclusion, the paper distills a set of techniques that chart the trajectory of event-based eye-tracking research. By examining the innovations in the challenge submissions, the community gains concrete guidance for building real-time, efficient, high-frequency gaze-tracking systems. As the field matures, configurable and adaptable systems are likely to play an increasingly central role in meeting diverse operational requirements and application-specific needs.