
Highly Accurate Dichotomous Image Segmentation (2203.03041v4)

Published 6 Mar 2022 in cs.CV

Abstract: We present a systematic study on a new task called dichotomous image segmentation (DIS), which aims to segment highly accurate objects from natural images. To this end, we collected the first large-scale DIS dataset, called DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images covering camouflaged, salient, or meticulous objects in various backgrounds. DIS is annotated with extremely fine-grained labels. Besides, we introduce a simple intermediate supervision baseline (IS-Net) using both feature-level and mask-level guidance for DIS model training. IS-Net outperforms various cutting-edge baselines on the proposed DIS5K, making it a general self-learned supervision network that can facilitate future research in DIS. Further, we design a new metric called human correction efforts (HCE) which approximates the number of mouse clicking operations required to correct the false positives and false negatives. HCE is utilized to measure the gap between models and real-world applications and thus can complement existing metrics. Finally, we conduct the largest-scale benchmark, evaluating 16 representative segmentation models, providing a more insightful discussion regarding object complexities, and showing several potential applications (e.g., background removal, art design, 3D reconstruction). We hope these efforts can open up promising directions for both academia and industry. Project page: https://xuebinqin.github.io/dis/index.html.


Summary

The paper introduces the DIS5K dataset, the IS-Net baseline, and the HCE metric for highly accurate dichotomous image segmentation.
IS-Net applies intermediate supervision at both the feature level and the mask level to combine global and local context, and it outperforms the baselines evaluated on DIS5K.
The experimental results point to potential applications in AR, medical imaging, and precise object manipulation.

Highly Accurate Dichotomous Image Segmentation

The paper "Highly Accurate Dichotomous Image Segmentation" studies the task of dichotomous image segmentation (DIS), which aims to segment objects from natural images with very high accuracy. It makes three main contributions: a large-scale dataset, DIS5K; a new baseline, IS-Net; and a new evaluation metric, Human Correction Efforts (HCE).

Dataset and Task Specifics

The DIS5K dataset is a cornerstone of this research. It comprises 5,470 high-resolution images (e.g., 2K, 4K or larger) annotated with extremely fine-grained labels. The images cover a diverse range of objects, including camouflaged, salient, and meticulous items, against various backgrounds. This diversity and resolution address common limitations of existing datasets, such as low resolution and limited object complexity.

Dichotomous image segmentation differs from traditional multi-class segmentation in that it is a two-class problem (object versus background) that ignores object categories. This focus is driven by applications that require precise delineation, such as augmented reality, medical imaging, and object manipulation.

IS-Net: A Novel Segmentation Approach

IS-Net is introduced as a simple baseline that uses intermediate supervision to improve DIS model training. It applies both feature-level and mask-level guidance during training, and it has been shown to outperform existing segmentation models on the DIS5K dataset. The paper describes this as self-learned supervision: intermediate stages of the network are supervised as well as the final output, which helps the model combine global and local contextual information.
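The idea of combining mask-level and feature-level guidance can be sketched as a single training loss. This is a minimal illustration, not the paper's implementation: the function names, the use of binary cross-entropy for masks and mean squared error for features, the `feat_weight` parameter, and the existence of reference features to match are all assumptions made for the example. Masks and features are flattened to plain lists to keep the sketch dependency-free.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two flat lists of values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def mse(a, b):
    """Mean squared error between two flat lists of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def intermediate_supervision_loss(side_masks, gt_mask, feats, ref_feats,
                                  feat_weight=1.0):
    """Hypothetical combined loss (an assumption, not the paper's formula).

    side_masks: mask predictions from several intermediate stages
    gt_mask:    ground-truth mask (0/1 values), same length as each side mask
    feats:      intermediate feature vectors from the segmentation model
    ref_feats:  reference feature vectors the model is guided toward
    """
    # Mask-level guidance: every intermediate mask is compared with the GT.
    mask_term = sum(bce(m, gt_mask) for m in side_masks)
    # Feature-level guidance: intermediate features are pulled toward references.
    feat_term = sum(mse(f, r) for f, r in zip(feats, ref_feats))
    return mask_term + feat_weight * feat_term
```

Summing one term per stage means that every intermediate output receives its own gradient signal, rather than relying only on the error at the final prediction.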

Human Correction Efforts Metric

Beyond conventional evaluation metrics, the paper proposes the Human Correction Efforts (HCE) metric to quantify the gap between model predictions and the needs of real-world applications. HCE approximates the number of mouse-click operations a person would need to correct the false positives and false negatives in a predicted mask. It is most relevant for applications that require high precision, and it complements existing metrics with a practical view of model performance.
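To make the intuition concrete, the sketch below counts connected false-positive and false-negative regions in a predicted mask, treating each region as one correction. This is a deliberately simplified proxy and not the HCE definition from the paper; the function names and the one-click-per-region assumption are hypothetical. It expects non-empty rectangular masks given as nested lists.

```python
from collections import deque


def count_regions(grid):
    """Count 4-connected regions of True cells in a 2D list of booleans."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] and not seen[r][c]:
                regions += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                # Breadth-first flood fill marks the whole region as visited.
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


def correction_effort_proxy(pred, gt):
    """Simplified stand-in for HCE: number of separate FP and FN regions.

    Assumes (hypothetically) that each connected error region costs one
    correction, which ignores region shape and boundary complexity.
    """
    fp = [[bool(p) and not bool(g) for p, g in zip(pr, gr)]
          for pr, gr in zip(pred, gt)]
    fn = [[bool(g) and not bool(p) for p, g in zip(pr, gr)]
          for pr, gr in zip(pred, gt)]
    return count_regions(fp) + count_regions(fn)
```

Unlike pixel-wise metrics, a count of this kind penalizes many small scattered errors more heavily than one compact error of the same total area, which is closer to how manual clean-up effort scales.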

Experimental Results and Implications

The experimental results show IS-Net outperforming the compared models on DIS5K. It achieves better scores on traditional metrics such as F-measure and mean absolute error, and it also yields lower HCE values, which suggests that its outputs would need less manual correction in practice.
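For reference, the two traditional metrics named above can be computed for a single image roughly as follows. This is a minimal sketch under stated assumptions: the fixed binarization `threshold` of 0.5 and the default `beta_sq` of 0.3 (a weighting common in the salient object detection literature) are not taken from the source, and the paper's evaluation protocol may differ.

```python
def mean_absolute_error(pred, gt):
    """Mean absolute per-pixel difference.

    pred: 2D list of probabilities in [0, 1]; gt: 2D list of 0/1 labels.
    Lower is better.
    """
    n = sum(len(row) for row in gt)
    total = sum(abs(p - g)
                for pr, gr in zip(pred, gt)
                for p, g in zip(pr, gr))
    return total / n


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of the thresholded prediction against the ground truth.

    threshold and beta_sq are assumed defaults, not values from the source.
    Higher is better.
    """
    tp = fp = fn = 0
    for pr, gr in zip(pred, gt):
        for p, g in zip(pr, gr):
            is_fg = p >= threshold
            if is_fg and g:
                tp += 1
            elif is_fg and not g:
                fp += 1
            elif not is_fg and g:
                fn += 1
    if tp == 0:
        return 0.0  # no correct foreground pixels: precision/recall undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

Note the opposite directions of the two scores: a better model has a higher F-measure but a lower mean absolute error.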

The paper's contributions support future work on applications that demand high-precision segmentation. DIS5K provides a foundation for further research, offering higher resolution and greater object complexity than earlier datasets. The HCE metric, in turn, evaluates models by the practical cost of correcting their outputs rather than by purely statistical measures.

Future Perspectives

Looking forward, the paper implies several directions for future work: adding more diverse categories to the DIS5K dataset, developing models that remain robust across varying object complexities, and refining the HCE metric, for example by making it cheaper to compute. The research is a step toward segmentation systems that stay precise across a wider range of domains.