Interactive Segmentation Techniques
- Interactive segmentation is a framework that combines user inputs like clicks, scribbles, and contours with algorithmic updates to accurately delineate objects.
- It employs diverse interaction modalities and encoding methods such as binary maps and distance transforms to iteratively enhance segmentation masks.
- Applications span medical imaging, dataset creation, and image editing, reducing annotation effort while improving precision in challenging visual contexts.
Interactive segmentation is a paradigm in computer vision that integrates human input directly into the segmentation process, enabling precise, efficient extraction of object or region masks via structured human–machine collaboration. This approach targets scenarios where fully automatic methods struggle—such as ambiguous boundaries, novel object classes, or domain shifts—and seeks to minimize user effort while maximizing segmentation quality and controllability.
1. Foundational Concepts and Principles
Interactive segmentation (IS) operates by iterating between algorithmic prediction and user input—such as clicks, scribbles, or contours—to guide and correct segmentation boundaries. Unlike fully automatic segmentation, IS explicitly models the loop wherein each user action leads to a machine update, which the user then reviews and further refines if necessary. This collaborative structure underpins both classic methods (graph cuts, level sets) and modern learning-based approaches.
Guidance signals from the user are encoded in various formats depending on the method: as binary maps (containing positive/negative clicks), distance transforms, geodesic maps, contour masks, or input channels appended to the image. The IS workflow involves the following high-level loop:
- The model predicts a segmentation mask for the input image, optionally using prior user annotations or interactions.
- The user inspects the result, then provides additional input (e.g., correcting errors by placing positive clicks inside missed objects, negative clicks to erase false positives, or drawing scribbles/contours for ambiguous regions).
- The model incorporates this input, updates the segmentation, and presents the refined mask to the user for further assessment.
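The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any cited method: `predict` is a hypothetical stand-in for a trained model (here, a simple intensity threshold that trusts user corrections at clicked pixels), and `simulate_user_click` plays the role of the human reviewer by clicking one erroneous pixel per round.

```python
import numpy as np

def predict(image, clicks):
    """Hypothetical model: threshold intensity, then honor user clicks."""
    mask = image > 0.5
    for (y, x), positive in clicks:
        mask[y, x] = positive  # trust user corrections at clicked pixels
    return mask

def simulate_user_click(mask, target):
    """Pick one wrong pixel; a positive click where the object was missed."""
    errors = np.argwhere(mask != target)
    if len(errors) == 0:
        return None  # user is satisfied
    y, x = errors[0]
    return ((y, x), bool(target[y, x]))

def interactive_loop(image, target, max_rounds=20):
    """Alternate model prediction and simulated user correction."""
    clicks = []
    for _ in range(max_rounds):
        mask = predict(image, clicks)
        click = simulate_user_click(mask, target)
        if click is None:
            break
        clicks.append(click)  # model incorporates the new correction
    return mask, len(clicks)
```

In practice the model is a deep network and the clicks are encoded as input channels (Section 2), but the predict–review–correct structure is exactly this loop.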
2. User Interaction Modalities and Encodings
A central design aspect of IS lies in the nature and encoding of user input. Research has demonstrated that users can guide segmentation via multiple interaction modes:
- Clicks: Discrete points labeled as object (positive) or background (negative), typically encoded as Gaussian heatmaps or disk masks. Click-based approaches are standard due to their simplicity but may be laborious for small or complex objects.
- Scribbles: Freeform strokes providing broader cues, often more expressive and efficient for complex regions (UI-Net: Interactive Artificial Neural Networks for Iterative Image Segmentation Based on a User Model, 2017).
- Contours: Loose or tight closed curves encapsulating a region, enabling rapid selection of single or multiple objects (Contour-based Interactive Segmentation, 2023). Contour input reduces effort by matching the accuracy of many clicks with a single gesture.
- Multi-gesture and Context-Free Interfaces: Recent works support mixtures of clicks, scribbles, lassos, or rectangles without requiring the user to specify intent explicitly (Interactive Segmentation for Diverse Gesture Types Without Context, 2023).
Interaction signals are transformed into input tensors for the network. For example, click maps are appended as extra channels alongside the RGB image for deep models. In feedback control theory-inspired methods, user corrections modify the system as impulsive controls, providing formal stability guarantees (Interactive Image Segmentation From A Feedback Control Perspective, 2016).
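The click-map encoding described above can be sketched as follows. This is an illustrative numpy implementation, assuming the common Gaussian-heatmap convention with separate positive and negative channels; the function name `click_channels` and the `sigma` default are choices made here, not from any cited paper.

```python
import numpy as np

def click_channels(shape, clicks, sigma=2.0):
    """Encode clicks as two Gaussian heatmap channels (positive, negative)."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    channels = np.zeros((2, h, w), dtype=np.float32)
    for (y, x), positive in clicks:
        heat = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        c = 0 if positive else 1
        channels[c] = np.maximum(channels[c], heat)  # overlay multiple clicks
    return channels

# Append the click channels to an RGB image: a 5-channel network input.
image = np.random.rand(3, 32, 32).astype(np.float32)
clicks = [((10, 12), True), ((25, 5), False)]
net_input = np.concatenate([image, click_channels((32, 32), clicks)], axis=0)
```

Disk masks and distance transforms follow the same pattern, differing only in how each click is rasterized into its channel.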
3. Algorithmic Frameworks and Model Architectures
The development of IS algorithms spans from graph-based to deep learning methods, with recent approaches integrating transformers and continual learning. Key frameworks include:
- Graph-based and Label Propagation Methods: Early and robust IS techniques model the image as a graph (pixels, superpixels) with labels propagated from user seeds via random walks or networks with the small-world property, yielding high accuracy from sparse scribbles while remaining computationally efficient (Interactive Image Segmentation using Label Propagation through Complex Networks, 2019).
- Attributed Relational Graphs: Modeling both the image and the user-specified region of interest as attributed relational graphs, with segmentation cast as a graph matching problem that seeks to minimize structure-preserving deformation cost (A New Algorithm for Interactive Structural Image Segmentation, 2008).
- CNN/FCN-based Approaches: Fully convolutional networks integrate user input as auxiliary channels, iteratively refining segmentation as more corrections arrive (UI-Net: Interactive Artificial Neural Networks for Iterative Image Segmentation Based on a User Model, 2017). User-model-based iterative training further aligns model behavior with actual user correction patterns (Iteratively Trained Interactive Segmentation, 2018).
- Vision Transformers: Transformer backbones, due to their ability to model long-range dependencies, can encode both image and interaction information effectively. Some models transfer guidance from exemplars (already segmented objects) to speed up multi-object segmentation in the same image (Learning from Exemplars for Interactive Image Segmentation, 17 Jun 2024).
- Gaussian Process Classification: Treating IS as GP-based pixel-wise binary classification allows explicit, theoretically-guaranteed label propagation, making predictions at user clicks correct by construction and enabling efficient linear-time inference (Interactive Segmentation as Gaussian Process Classification, 2023).
- Quasi-Conformal Mapping and Topology Preservation: Ensures that interactive corrections preserve desired topology in segmentation results, crucial for medical or scientific imaging (QIS : Interactive Segmentation via Quasi-Conformal Mappings, 22 Feb 2024).
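The graph-based paradigm at the head of this list can be illustrated with a minimal random-walker-style label propagation on a pixel grid: seeds fix boundary values, and unlabeled pixels get the harmonic solution of the graph Laplacian (a Grady-style Dirichlet problem). The edge-weight constant `10.0` and the helper name `propagate_labels` are illustrative choices, and the dense solve is only practical for tiny images.

```python
import numpy as np

def propagate_labels(image, seeds):
    """Random-walk label propagation on a 4-connected pixel grid.

    seeds: {(y, x): label in {0, 1}}. Edge weights favor similar intensities;
    solves the Dirichlet problem for the foreground probability at free pixels.
    """
    h, w = image.shape
    n = h * w
    idx = lambda y, x: y * w + x
    W = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    wgt = np.exp(-10.0 * (image[y, x] - image[yy, xx]) ** 2)
                    W[idx(y, x), idx(yy, xx)] = W[idx(yy, xx), idx(y, x)] = wgt
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    seeded = np.array([idx(y, x) for (y, x) in seeds])
    free = np.setdiff1d(np.arange(n), seeded)
    b = np.array([float(lbl) for lbl in seeds.values()])
    p = np.zeros(n)
    p[seeded] = b
    # L_uu p_u = -L_us p_s  (harmonic interpolation of the seed labels)
    p[free] = np.linalg.solve(L[np.ix_(free, free)], -L[np.ix_(free, seeded)] @ b)
    return p.reshape(h, w) > 0.5
```

Two scribble seeds suffice to split a two-region image, which is the appeal of this family: sparse input, dense output.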
4. Robustness, Adaptation, and Efficiency
IS must remain robust to variable user input and domain shifts:
- Test-Time Adaptation (TTA): Methods such as DC-TTA partition user clicks into coherent subsets and adapt individual model replicas per subset, merging their specialized knowledge for improved handling of complex (e.g., camouflaged or multi-part) objects (DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation, 29 Jun 2025).
- Continual Learning: Models like RAIS maintain a stable global backbone while quickly adapting local parameters in response to user feedback and domain changes, balancing robustness with plasticity (RAIS: Robust and Accurate Interactive Segmentation via Continual Learning, 2022).
- Robustness Evaluation: Recent benchmarks (e.g., TETRIS) explicitly measure sensitivity to user click location by generating both optimal and adversarial click sequences using white-box attacks, emphasizing that high benchmark scores may not translate into real-world robustness (TETRIS: Towards Exploring the Robustness of Interactive Segmentation, 9 Feb 2024).
Efficiency advances include real-time architectures (e.g., InterFormer) that decouple heavy image encoding, performed once offline or server-side, from lightweight online fusion of user input, enabling deployment on low-power devices (InterFormer: Real-time Interactive Image Segmentation, 2023). User interaction is minimized through strategies such as exemplar transfer for multi-object IS, contour-based interfaces, and diversified seed proposals for swipe gestures on touch devices (SwipeCut: Interactive Segmentation with Diversified Seed Proposals, 2018).
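The encode-once, fuse-per-click pattern behind such real-time systems can be sketched abstractly. This is a structural illustration only, not InterFormer's actual architecture: `heavy_encode` and `light_fuse` are hypothetical stand-ins for an expensive backbone and a cheap interaction head.

```python
import numpy as np

class DecoupledSegmenter:
    """Cache expensive image features once; update cheaply per interaction."""

    def __init__(self, image):
        # Run the heavy backbone a single time (offline or server-side).
        self.features = self.heavy_encode(image)

    @staticmethod
    def heavy_encode(image):
        # Placeholder for an expensive encoder forward pass.
        return image - image.mean()

    def light_fuse(self, clicks):
        # Cheap per-click update that reuses the cached features.
        mask = self.features > 0
        for (y, x), positive in clicks:
            mask[y, x] = positive
        return mask
```

Each new click then costs only the light fusion step, which is what makes interactive latency feasible on low-power devices.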
5. Evaluation, Benchmarks, and Practical Applications
IS methods are assessed by metrics such as:
- NoC@IoU: Average number of user actions (clicks, scribbles) required to reach a specified IoU threshold, typically 85% or 90%.
- Dice coefficient (DSC), mean IoU (mIoU): Overlap metrics commonly reported.
- Robustness metrics: Difference between area under best- and worst-case IoU–click curves (TETRIS: Towards Exploring the Robustness of Interactive Segmentation, 9 Feb 2024).
- RICE: A relative improvement metric that quantifies correction rather than just overlap (Interactive Segmentation for Diverse Gesture Types Without Context, 2023).
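The headline NoC@IoU metric is simple to compute from a per-click IoU trace. The helper names below are illustrative; the clipping at `max_clicks` (commonly 20) when the threshold is never reached follows the standard evaluation convention.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def noc_at_iou(iou_per_click, threshold=0.90, max_clicks=20):
    """Number of Clicks to reach an IoU threshold.

    iou_per_click[k] is the IoU after k+1 clicks; returns max_clicks
    if the threshold is never reached within the click budget.
    """
    for k, v in enumerate(iou_per_click[:max_clicks]):
        if v >= threshold:
            return k + 1
    return max_clicks
```

Averaging `noc_at_iou` over a dataset yields the reported NoC@85 or NoC@90 scores.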
Benchmarks span natural images (COCO, Pascal VOC, SBD, Berkeley, GrabCut), multi-object and video segmentation datasets, medical imaging (BraTS, CT/MRI), and noisier or specialized domains (e.g., camouflaged objects).
Applications include:
- Medical annotation: Reducing expert effort for organ, tumor, or lesion marking (Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy, 2023).
- Large-scale dataset creation: Enabling rapid, flexible annotation for supervised learning.
- Image/video editing: Assisting in precise cut-out and compositing tasks.
- Mobile and fieldwork settings: Touchscreen-optimized interfaces and low-compute pipelines.
6. Ongoing Challenges and Future Directions
The field remains dynamic, with several prominent challenges and research avenues:
- Domain Adaptation and Generalization: Ensuring out-of-domain performance with minimum user correction, leveraging continual/adaptive or prompt-based architectures (RAIS: Robust and Accurate Interactive Segmentation via Continual Learning, 2022).
- Multi-gesture, Context-Agnostic Interaction: Developing universal models robust to various gesture types with or without explicit user intent signals (Interactive Segmentation for Diverse Gesture Types Without Context, 2023).
- Multi-object and Exemplar Transfer: Transferring knowledge between objects within the same category to minimize repetition and user effort (Learning from Exemplars for Interactive Image Segmentation, 17 Jun 2024).
- Standardization and Benchmarking: There is a recognized need for unified protocols, metrics (including user-centric and robustness measures), and public baselines, particularly in specialized domains like medical imaging (Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy, 2023).
- Annotation-minimal and Unsupervised Learning: Exploiting self-supervised features and simulated region hierarchies to train IS models without any manually annotated masks, while still achieving strong results (Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation, 2023).
7. Summary Table: Key Interactive Segmentation Paradigms
| Paradigm | Core Mechanism | Notable Properties |
|---|---|---|
| Graph-based propagation | Label diffusion on graphs, often in two stages | Minimal input; efficient; scalable; topology-agnostic |
| CNN/Transformer-based iterative refinement | Deep learning with auxiliary user-input channels | High accuracy; adaptable; supports complex cues |
| Exemplar-based transfer for multi-object IS | Transfers knowledge from one mask to related objects | Saves labor when segmenting similar objects |
| Test-time adaptation and divide-and-conquer | Online optimization per subset of user cues | Handles complex or conflicting corrections robustly |
| Robustness-evaluated/adversarial IS | Stress-testing with adversarial and user-like inputs | Directly quantifies real-world reliability |
| Unsupervised/self-supervised IS | Trains IS on simulated regions, no manual labels | High label efficiency; rapid deployment |
References
- Noma et al., A New Algorithm for Interactive Structural Image Segmentation, 2008.
- Duan et al., TETRIS: Towards Exploring the Robustness of Interactive Segmentation, 2024.
- Myers-Dean et al., Interactive Segmentation for Diverse Gesture Types Without Context, 2023.
- Shi, Liu et al., Learning from Exemplars for Interactive Image Segmentation, 2024.
- Huang et al., InterFormer: Real-time Interactive Image Segmentation, 2023.
- Qi et al., QIS: Interactive Segmentation via Quasi-Conformal Mappings, 2024.
- Yang et al., IFSENet: Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence, 2024.
- Interactive Segmentation as Gaussian Process Classification, 2023.
- Shi et al., Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation, 2023.
- Reinke et al., Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy, 2023.
- Zhang et al., RAIS: Robust and Accurate Interactive Segmentation via Continual Learning, 2022.
- Mahadevan et al., Iteratively Trained Interactive Segmentation, 2018.
- Li et al., DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation, 2025.
This corpus of research demonstrates both the rapid evolution of IS and the continued centrality of user–algorithm synergies in real-world segmentation workflows.