
The Visual Experience Dataset: Over 200 Recorded Hours of Integrated Eye Movement, Odometry, and Egocentric Video

Published 15 Feb 2024 in cs.CV and cs.HC | (2404.18934v2)

Abstract: We introduce the Visual Experience Dataset (VEDB), a compilation of over 240 hours of egocentric video combined with gaze- and head-tracking data that offers an unprecedented view of the visual world as experienced by human observers. The dataset consists of 717 sessions, recorded by 58 observers ranging from 6-49 years old. This paper outlines the data collection, processing, and labeling protocols undertaken to ensure a representative sample and discusses the potential sources of error or bias within the dataset. The VEDB's potential applications are vast, including improving gaze tracking methodologies, assessing spatiotemporal image statistics, and refining deep neural networks for scene and activity recognition. The VEDB is accessible through established open science platforms and is intended to be a living dataset with plans for expansion and community contributions. It is released with an emphasis on ethical considerations, such as participant privacy and the mitigation of potential biases. By providing a dataset grounded in real-world experiences and accompanied by extensive metadata and supporting code, the authors invite the research community to utilize and contribute to the VEDB, facilitating a richer understanding of visual perception and behavior in naturalistic settings.


Summary

  • The paper presents a comprehensive dataset integrating egocentric video, precise gaze, and head-tracking data for robust vision analysis.
  • It details rigorous methodologies using equipment like Pupil-Labs and Intel RealSense T265 to capture data across diverse indoor and outdoor contexts.
  • The dataset enables novel insights into head-eye coordination and context classification, advancing research in visual perception and AI applications.

Visual Experience Dataset: A Comprehensive Resource for Vision Research

The paper introduces the Visual Experience Dataset (VEDB), a significant addition to the resources available for research in visual perception and related fields. The dataset comprises over 240 hours of egocentric video, augmented with gaze- and head-tracking data, providing a detailed view of visual experiences as perceived by humans. It spans 717 sessions recorded by 58 participants ranging in age from 6 to 49 years, ensuring a diverse range of conditions and contexts.

Dataset Composition and Collection

The VEDB is characterized by its detailed data streams, including egocentric video, gaze-tracking, and head-tracking information. The dataset includes recordings from both indoor and outdoor settings, capturing a variety of tasks, from sedentary activities to dynamic outdoor scenarios. The consistent methodology across multiple environments and activities presents a rich opportunity for assessing visual perception and related behaviors in naturalistic settings.

The dataset was collected using advanced equipment and rigorous protocols. For instance, eye tracking employed the Pupil-Labs Core system, providing precise gaze data supplemented by a high-resolution world camera. Head movements were tracked with the Intel RealSense T265, a visual-inertial tracking camera that provides six-degree-of-freedom head odometry.
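Because the eye tracker and the tracking camera record at different rates, analyses that combine the two streams must first align them in time. The sketch below shows one common approach, nearest-neighbor matching on timestamps; the array layouts and field names here are illustrative assumptions, not the VEDB's actual file schema.

```python
import numpy as np

def align_streams(gaze_t, gaze_xy, odo_t, odo_pose):
    """Pair each gaze sample with its temporally closest odometry sample.

    gaze_t, odo_t : 1-D arrays of timestamps in seconds, assumed sorted.
    gaze_xy       : (N, 2) gaze positions (hypothetical layout).
    odo_pose      : (M, 7) position + quaternion per odometry sample
                    (hypothetical layout).
    """
    # Insertion points of gaze timestamps into the odometry timeline.
    idx = np.searchsorted(odo_t, gaze_t)
    idx = np.clip(idx, 1, len(odo_t) - 1)
    left, right = odo_t[idx - 1], odo_t[idx]
    # Step back one index wherever the left neighbor is strictly closer.
    idx -= (gaze_t - left) < (right - gaze_t)
    return gaze_xy, odo_pose[idx]
```

For higher-precision work, one would interpolate poses between odometry samples rather than snap to the nearest one, but nearest-neighbor matching is often adequate given the high sampling rates of both devices.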

Methodological Considerations

The authors address several essential methodological considerations to enhance the dataset's utility. They provide meticulous descriptions of hardware configurations, including custom modifications to improve tracking accuracy and participant comfort. Preprocessing and calibration procedures are outlined in detail, ensuring high-quality data representation across sessions.

Significant attention is given to potential sources of error and bias, such as recording omissions and calibration challenges. The paper discusses the substantial variability in gaze calibration success, with specific errors attributed to lighting conditions and participant movements. Strategies to mitigate privacy risks are also elaborated, including blurring sensitive video elements.
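Gaze calibration quality of the kind discussed above is conventionally reported as angular error: the angle between the predicted gaze direction and a reference direction toward a known calibration target. A minimal sketch of that metric, assuming unit-length 3-D gaze vectors (the function name and input layout are illustrative, not taken from the VEDB code):

```python
import numpy as np

def angular_error_deg(pred, ref):
    """Mean angular error in degrees between predicted and reference
    gaze direction vectors, each of shape (N, 3)."""
    # Normalize so the dot product equals the cosine of the angle.
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * ref, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```

Reporting error in degrees of visual angle, rather than in pixels, makes accuracy comparable across sessions with different camera resolutions and viewing distances.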

Implications and Use Cases

The VEDB opens avenues for investigating the spatiotemporal statistics of visual experience, a topic that prior literature has explored predominantly through static images. The dataset's integration of gaze- and head-tracking data allows for novel analyses of head-eye coordination and its effect on the orientation of attention. Such insights are critical for refining models of sensory processing and perception.
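A basic operation underlying such head-eye coordination analyses is expressing an eye-in-head gaze direction in world coordinates by rotating it through the head orientation (e.g., the quaternion reported by the odometry stream). A minimal sketch, assuming a (w, x, y, z) unit quaternion; the axis conventions here are illustrative, and the actual dataset documents its own:

```python
import numpy as np

def rotate_by_quat(q, v):
    """Rotate 3-D vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    # Standard quaternion-vector rotation identity.
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def gaze_in_world(head_quat, gaze_dir_head):
    """Express an eye-in-head gaze direction in world coordinates,
    given the head orientation from the odometry stream."""
    return rotate_by_quat(head_quat, gaze_dir_head)
```

With gaze expressed in a world frame, one can separate the contributions of eye rotation and head rotation to a gaze shift, which is exactly the decomposition that head-eye coordination studies require.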

On a practical level, the dataset's annotated scenes and tasks serve as a valuable resource for improving the accuracy of deep neural networks in context classification. The dataset can enhance training protocols, potentially reducing biases in scene and activity recognition systems, and offers a robust platform for developments in human-computer interaction and robotics.

Future Directions

The VEDB is structured as a living dataset, with ongoing plans for expansion and community contributions. The authors encourage collaborative growth to maximize the dataset's applicability across various domains. Future research might leverage this rich resource to further explore the dynamics of visual perception in an array of real-world contexts, potentially enhancing current understanding and informing innovative technologies.

The availability of VEDB through open science platforms, alongside comprehensive metadata and supporting code, underlines the authors' commitment to fostering accessibility and reproducibility in vision science research. Such initiatives are pivotal for advancing theoretical frameworks and practical applications within artificial intelligence and beyond.

Conclusion

The release of the VEDB marks an important step in the evolution of datasets available for studying visual perception. By integrating extensive metadata and ensuring meticulous data curation, this resource provides fertile ground for both current investigations and future innovations in the field. Researchers are provided with a versatile tool that not only helps in understanding naturalistic visual experiences but also in refining computational models that emulate human visual processing.
