Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution (2212.01579v1)

Published 3 Dec 2022 in cs.CV

Abstract: In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods. The code is available at: https://github.com/LiWentomng/boxlevelset.

PDF Abstract

Analysis of "Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution"

The paper "Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution" introduces an innovative method for instance segmentation tasks circumventing the constraints of pixel-wise supervision by leveraging bounding box annotations. The core proposition of Box2Mask is a strategic integration of the level-set evolution method into deep neural networks, specifically tailored to address the challenge of deriving accurate segmentation masks with minimal supervision.

At the heart of the Box2Mask approach lies a meticulously crafted level-set evolution model which operates within a deep learning framework. By employing energy-based level-set variational methods, Box2Mask capitalizes on both the inherent low-level features of images and the richer semantic features extracted by neural networks. This dual input strategy aids in reinforcing the robustness and precision of the resulting object boundary predictions. Significantly, the evolutionary mechanism of Box2Mask is initialized within predicted object boundaries framed by bounding boxes, effectively minimizing initialization sensitivities typically associated with standard level-set approaches.

Box2Mask distinguishes itself through its architecture, comprising two single-stage frameworks: a CNN-based and a transformer-based framework. Both are designed with components that facilitate instance-wise decoding and box-level matching assignment, crucial for aligning predicted masks to their respective bounding boxes. In particular, the transformer-based framework stands out, demonstrating superior performance on various benchmarks, including general object datasets like COCO and Pascal VOC, as well as domain-specific datasets such as iSAID (remote sensing) and LiTS (medical imaging).

The empirical evaluation on multiple challenging benchmarks illustrates the efficacy of Box2Mask. For instance, the model achieves 42.4% mask AP on the COCO dataset with a Swin-Transformer large backbone, a performance level comparable to that of leading fully supervised methods. Such results underscore the model's capacity to yield competitive results while relying solely on bounding box annotations. Furthermore, the experimentation across diverse datasets further attests to the model's versatility in handling different instance segmentation scenarios, ranging from complex, cluttered scenes to specialized domains like medical imaging.

Box2Mask also pioneers in the methodology of box-supervised segmentation by its introduction of an affinity-based local consistency module. This module mitigates the inhomogeneity challenges associated with region-based level-set methods by fostering local affinity consistency, enhancing both the segmentation's quality and its robustness to variable conditions.

In looking towards future implications, Box2Mask paves the way for more accessible and scalable segmentation methodologies applicable across a broader spectrum of categories and environments. The potential reduction in annotation costs, coupled with its methodological innovations, suggests Box2Mask could serve as a springboard for further inquiries into weakly supervised learning paradigms, with broader applications in AI-driven image and video analysis. This leads to the anticipation of more adaptive and refined segmentation models that effectively balance performance with computational and data efficiency within real-world applications.

PDF Markdown Bookmark Chat (Pro)

Authors (7)

Wentong Li (25 papers)
Wenyu Liu (146 papers)
Jianke Zhu (68 papers)
Miaomiao Cui (27 papers)
Risheng Yu (1 paper)
Xiansheng Hua (26 papers)
Lei Zhang (1689 papers)

Citations (21)

View on Semantic Scholar

Related Papers

Find Related Papers

GitHub

GitHub - LiWentomng/boxlevelset: The code for "Box-supervised Instance Segmentation with Level Set Evolution(ECCV2022)" (188 stars)