GazeSAM: What You See is What You Segment (2304.13844v1)

Published 26 Apr 2023 in cs.CV

Abstract: This study investigates the potential of eye-tracking technology and the Segment Anything Model (SAM) to design a collaborative human-computer interaction system that automates medical image segmentation. We present the GazeSAM system to enable radiologists to collect segmentation masks by simply looking at the region of interest during image diagnosis. The proposed system tracks radiologists' eye movement and utilizes the eye-gaze data as the input prompt for SAM, which automatically generates the segmentation mask in real time. This study is the first work to leverage the power of eye-tracking technology and SAM to enhance the efficiency of daily clinical practice. Moreover, eye-gaze data coupled with image and corresponding segmentation labels can be easily recorded for further advanced eye-tracking research. The code is available at https://github.com/ukaukaaaa/GazeSAM.

Authors (4)
  1. Bin Wang (750 papers)
  2. Armstrong Aboah (29 papers)
  3. Zheyuan Zhang (61 papers)
  4. Ulas Bagci (154 papers)
Citations (17)

Summary

  • The paper presents a novel system that integrates eye-tracking with the Segment Anything Model for automated real-time medical image segmentation.
  • It employs a screen-based eye tracker to capture radiologists' gaze patterns, guiding segmentation for both 2D and 3D medical images.
  • The study highlights potential improvements in segmentation quality through domain-specific pre-training and reduced clinician cognitive load.

GazeSAM: Integrating Eye-Tracking with the Segment Anything Model for Enhanced Medical Image Segmentation

The paper "GazeSAM: What You See is What You Segment" by Wang et al. introduces the GazeSAM system, a novel approach that integrates eye-tracking technology with the Segment Anything Model (SAM) to facilitate real-time medical image segmentation. This research aims to address the inefficiencies associated with manual annotation processes commonly used in medical image segmentation, proposing an innovative system that leverages the natural gaze patterns of radiologists to prompt automated segmentation.

The GazeSAM system utilizes a screen-based eye tracker to capture the eye movements of radiologists as they examine medical images. These eye movements identify regions of interest, which are then used as input prompts for SAM to generate segmentation masks autonomously. This approach offers several advantages, including reduced cognitive load for radiologists and the potential to standardize the collection of eye-tracking data alongside segmentation masks for further research.
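To make the pipeline concrete, here is a minimal sketch of how a single gaze fixation can drive SAM through the official `segment_anything` package. The checkpoint filename, the `segment_from_gaze` helper, and the assumption that the gaze point arrives in image pixel coordinates are illustrative choices, not taken from the GazeSAM codebase.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM; the checkpoint path is an assumption for illustration.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def segment_from_gaze(image: np.ndarray, gaze_xy: tuple) -> np.ndarray:
    """Treat one gaze fixation (x, y) in pixel coordinates as a single
    foreground point prompt and return the highest-scoring SAM mask."""
    predictor.set_image(image)                       # HxWx3 uint8 RGB
    point_coords = np.array([gaze_xy])               # shape (1, 2)
    point_labels = np.array([1])                     # 1 = foreground point
    masks, scores, _ = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=True,                       # let SAM propose 3 masks
    )
    return masks[int(scores.argmax())]               # keep the best-scoring one
```

In practice, raw gaze streams are noisy, so a fixation filter (e.g., averaging samples within a dwell window) would typically sit between the eye tracker and a function like this.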

The system's capabilities are tailored to enhance radiologist efficiency through a collaborative human-computer interaction framework. Notably, GazeSAM supports both 2D images and 3D volumes, the two formats that dominate medical imaging workflows, so the same gaze-driven interaction carries over across clinical scenarios.
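This summary does not spell out the 3D pipeline, but one common way to extend a 2D point-prompt model to volumes is slice-wise propagation: segment the slice the radiologist gazed at, then seed each neighboring slice with the centroid of the previous mask. The sketch below illustrates that general technique, reusing `segment_from_gaze` from the previous snippet; it is not a claim about GazeSAM's actual implementation.

```python
import numpy as np

def segment_volume(volume: np.ndarray, gaze_xy, gaze_slice: int) -> np.ndarray:
    """volume: (D, H, W) array with intensities already windowed to 0-255.
    Returns a (D, H, W) boolean mask, propagating forward from gaze_slice;
    a full version would also sweep backward from the gazed slice."""
    out = np.zeros(volume.shape, dtype=bool)
    prompt = gaze_xy
    for z in range(gaze_slice, volume.shape[0]):
        rgb = np.stack([volume[z]] * 3, axis=-1).astype(np.uint8)
        mask = segment_from_gaze(rgb, prompt)        # 2D helper from above
        if not mask.any():
            break                                    # structure has ended
        out[z] = mask
        ys, xs = np.nonzero(mask)
        prompt = (int(xs.mean()), int(ys.mean()))    # centroid seeds next slice
    return out
```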

The experimental setup includes a user interface designed for seamless interaction with the eye tracker. The framework allows real-time visualization of gaze points, facilitating an interactive segmentation process. This interaction scheme is particularly beneficial in iterative refinement scenarios where the gaze data is used to amend initial segmentation masks, yielding more precise results.
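SAM natively supports this kind of refinement loop: its predictor accepts accumulated point prompts together with the low-resolution mask logits from its previous prediction. A minimal sketch, assuming the interface labels each new gaze fixation as foreground (1) or background (0):

```python
import numpy as np

def refine(predictor, points, labels, prev_logits=None):
    """Re-run SAM with all gaze points collected so far. `prev_logits` is
    the (256, 256) low-res logit map from the previous call, passed back
    via SAM's mask_input, which expects shape (1, 256, 256)."""
    masks, _, logits = predictor.predict(
        point_coords=np.array(points),
        point_labels=np.array(labels),
        mask_input=prev_logits[None] if prev_logits is not None else None,
        multimask_output=False,                      # one refined mask
    )
    return masks[0], logits[0]                       # keep logits for next round
```

Each additional fixation narrows SAM's hypothesis space, which is why a few corrective glances can often repair an under- or over-segmented first attempt.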

A crucial limitation discussed in the paper is that SAM is trained on natural images and does not transfer directly to medical images without adaptation. While GazeSAM improves efficiency through intuitive, gaze-based prompting, segmentation quality could be further improved by adapting SAM to the medical domain, which would likely require pre-training or fine-tuning on large medical imaging datasets to handle their distinct variability and complexity.
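As one illustration of what such adaptation could look like, the sketch below freezes the heavy image encoder of the `sam` model loaded earlier and fine-tunes only the lightweight mask decoder on (image, point prompt, ground-truth mask) triples, in the spirit of medical SAM adaptations such as MedSAM. The loss, batch shapes, and training loop are assumptions, not GazeSAM's procedure.

```python
import torch
import torch.nn.functional as F

# Freeze the encoders; only the mask decoder's parameters are updated.
for p in sam.image_encoder.parameters():
    p.requires_grad = False
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(sam.mask_decoder.parameters(), lr=1e-4)

def train_step(image, point_xy, gt_mask):
    """image: (1, 3, 1024, 1024) preprocessed tensor; point_xy: (1, 2) in the
    1024x1024 input frame; gt_mask: (1, 1, 256, 256) float mask in {0, 1}."""
    with torch.no_grad():
        embedding = sam.image_encoder(image)
        sparse, dense = sam.prompt_encoder(
            points=(point_xy.view(1, 1, 2), torch.ones(1, 1)),  # fg label
            boxes=None, masks=None,
        )
    low_res_logits, _ = sam.mask_decoder(
        image_embeddings=embedding,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse,
        dense_prompt_embeddings=dense,
        multimask_output=False,
    )
    loss = F.binary_cross_entropy_with_logits(low_res_logits, gt_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training only the decoder keeps the adaptation cheap, since SAM's image encoder holds the vast majority of its parameters.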

In terms of practical impact, GazeSAM holds promise for significantly accelerating the workflow of medical imaging professionals. By reducing the time spent on manual segmentation tasks, it allows radiologists to focus on higher-level diagnostic processes and decision-making. The system also opens up avenues for advanced research in eye-tracking applications, potentially fueling future advancements in both segmentation accuracy and input mechanisms for machine learning models in medical contexts.

Theoretically, GazeSAM illustrates the potential for integrating user-centric data collection techniques, such as eye-tracking, with powerful machine learning models. The fusion of these technologies suggests a pathway for the development of AI-driven solutions that adapt to and enhance human cognitive processes. Future developments could explore the integration of additional biometric data to further enhance interaction paradigms and segmentation accuracy.

Overall, the research presented in GazeSAM provides a compelling framework for augmenting human capabilities in medical imaging workflows, positioning it as a significant step toward the practical application of interactive AI systems in healthcare. The insights gained from this paper may inform further exploration and refinement of gaze-based input methodologies in other high-stakes domains where precision and efficiency are paramount.
