
TurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking (1504.06755v2)

Published 25 Apr 2015 in cs.CV

Abstract: Traditional eye tracking requires specialized hardware, which means collecting gaze data from many observers is expensive, tedious and slow. Therefore, existing saliency prediction datasets are orders of magnitude smaller than typical datasets for other vision recognition tasks. The small size of these datasets limits the potential for training data-intensive algorithms, and causes overfitting in benchmark evaluation. To address this deficiency, this paper introduces a webcam-based gaze tracking system that supports large-scale, crowdsourced eye tracking deployed on Amazon Mechanical Turk (AMTurk). By a combination of careful algorithm and gaming protocol design, our system obtains eye tracking data for saliency prediction comparable to data gathered in a traditional lab setting, at lower cost and with less effort on the part of the researchers. Using this tool, we build a saliency dataset for a large number of natural images. We will open-source our tool and provide a web server where researchers can upload their images to get eye tracking results from AMTurk.

Citations (342)

Summary

  • The paper presents a novel webcam-based system that uses AMTurk to collect large-scale, high-quality eye-tracking data for saliency analysis.
  • The paper employs engaging game-based protocols to maintain participant focus and generate fixation data comparable to traditional lab settings.
  • The paper offers an open-source tool and validates its approach against state-of-the-art saliency models, demonstrating its potential for diverse vision research applications.

Overview of the TurkerGaze Paper

The paper "TurkerGaze: Crowdsourcing Saliency with Webcam-based Eye Tracking" presents an innovative approach for capturing eye tracking data to predict visual saliency in natural images, leveraging the crowdsourcing platform Amazon Mechanical Turk (AMTurk). The authors address the limitations of traditional eye tracking methodologies, which typically require expensive and specialized hardware in controlled laboratory environments. These limitations hinder the creation of large-scale datasets necessary for training robust saliency prediction models in computer vision.

Objectives and Contributions

The primary objective of this research is to develop a low-cost, scalable solution for collecting eye tracking data via webcams, thus facilitating large-scale saliency dataset creation. The salient contributions are:

  1. Webcam-based Eye Tracking System: A system that gathers eye tracking data with commodity webcams while maintaining quality comparable to that obtained in lab settings (a sketch of the underlying calibration idea follows this list).
  2. Data Collection via Crowdsourcing: Deployment on AMTurk allows for extensive data collection from a diverse participant pool, enabling the assembly of a substantial saliency dataset. This crowdsourcing method mitigates issues related to cost and scalability inherent in traditional approaches.
  3. Game-based Protocol Design: Engaging game mechanics motivate participants to provide high-quality gaze data; interfaces inspired by 'Angry Birds' and 'Whac-A-Mole' sustain attention and focus during eye tracking tasks.
  4. Open-source Tool and Web Server: The authors promise an open-source release of their tool alongside a web server, allowing other researchers to utilize this setup for their gaze data collection needs.
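
The paper's tracker itself runs in the browser in JavaScript; as a rough illustration of the calibration idea common to webcam gaze trackers of this kind, the sketch below fits a ridge regression from eye-appearance features to on-screen coordinates. The feature representation (flattened, downsampled eye patches), the data dimensions, and the regularization strength are placeholder assumptions for illustration, not the paper's exact settings.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical calibration data: one row per calibration fixation.
# X: flattened grayscale eye-patch pixels captured while the participant
#    looks at a known target; Y: that target's (x, y) screen position.
rng = np.random.default_rng(0)
X = rng.random((40, 120))                 # 40 calibration samples, 120 features
Y = rng.random((40, 2)) * [1920, 1080]    # targets on a 1920x1080 screen

# One regularized linear map from eye appearance to gaze position.
# alpha is a placeholder value, not the paper's setting.
model = Ridge(alpha=1.0).fit(X, Y)

# At test time, each webcam frame yields a feature vector of the same size;
# the model predicts where on the screen the participant is looking.
frame_features = rng.random((1, 120))
gaze_xy = model.predict(frame_features)[0]  # estimated (x, y) in pixels
```

Recalibrating whenever the participant's head moves is the usual way such trackers keep error low, which is also one reason the game protocols above are designed to keep workers still and attentive.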

Evaluation and Results

The system's efficacy is validated through comparisons against commercial eye tracking systems, demonstrating a median gaze prediction error of approximately $1.06\,^{\circ}$ of visual angle, in line with existing webcam-based methodologies. The authors report that their mean-shift clustering approach effectively extracts fixation points from noisy, subsampled gaze data. Furthermore, saliency maps generated from AMTurk data closely match those collected in a controlled laboratory setup, approaching the level of inter-subject agreement.
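
The paper's exact clustering parameters are not reproduced here, but the fixation-extraction step can be sketched with an off-the-shelf mean-shift implementation, assuming gaze samples arrive as (x, y) pixel coordinates and using a hypothetical bandwidth:

```python
import numpy as np
from sklearn.cluster import MeanShift

def extract_fixations(gaze_xy, bandwidth=50.0, min_samples=5):
    """Cluster noisy gaze samples into fixation points.

    gaze_xy: (N, 2) array of gaze estimates in screen pixels.
    bandwidth: mean-shift kernel bandwidth in pixels (an assumed
        value, not the paper's setting).
    min_samples: clusters smaller than this are discarded as
        saccades or tracker noise.
    Returns an (M, 2) array of fixation centers.
    """
    ms = MeanShift(bandwidth=bandwidth).fit(gaze_xy)
    counts = np.bincount(ms.labels_)
    return ms.cluster_centers_[counts >= min_samples]
```

The surviving cluster centers serve as the fixation points from which saliency maps are built.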

The paper also presents a comprehensive comparison of TurkerGaze's performance against several state-of-the-art saliency models, finding that AMTurk-based saliency maps yield predictive accuracy competitive with the top models in the field.
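
Such comparisons conventionally turn discrete fixations into a continuous saliency map by accumulating them into a fixation map and blurring with a Gaussian whose width corresponds to roughly one degree of visual angle. A minimal sketch, with the blur width as an assumed typical value rather than the paper's exact protocol:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixations, shape, sigma_px=35.0):
    """Build a saliency map from fixation points.

    fixations: (M, 2) array of (x, y) pixel coordinates.
    shape: (height, width) of the stimulus image.
    sigma_px: Gaussian width in pixels, roughly one degree of
        visual angle (a typical choice, assumed here).
    """
    fmap = np.zeros(shape)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < shape[0] and 0 <= xi < shape[1]:
            fmap[yi, xi] += 1.0
    smap = gaussian_filter(fmap, sigma_px)
    return smap / smap.max() if smap.max() > 0 else smap
```

The resulting map can then be scored against held-out human fixations with standard metrics such as AUC or normalized scanpath saliency.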

Implications and Speculation on Future Developments

This research has broad implications for computer vision, psychology, and human-computer interaction. By enabling large-scale eye tracking data collection, the approach opens up possibilities for more data-hungry machine learning models, potentially enhancing performance in applications such as autonomous driving, adaptive interfaces, and augmented reality systems.

Future work might explore the integration of more sophisticated computer vision techniques to refine gaze prediction algorithms further. Additionally, expanding the scope to video stimuli or augmenting the platform's adaptive capabilities based on real-time gaze analysis could yield deeper insights into temporal dynamics in visual attention. The open-source nature of the tool is likely to catalyze further research and development, facilitating community-driven enhancements and applications in diverse fields.

The TurkerGaze paper lays a foundation for continued advancements in scalable human attention modeling, suggesting a promising trajectory for future research in leveraging crowdsourced platforms for complex data collection tasks in computational settings.