BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation (2210.05174v2)

Published 11 Oct 2022 in cs.CV

Abstract: Labeling objects with pixel-wise segmentation requires a huge amount of human labor compared to bounding boxes. Most existing methods for weakly supervised instance segmentation focus on designing heuristic losses with priors from bounding boxes. However, we find that box-supervised methods can produce some fine segmentation masks, and we wonder whether the detectors could learn from these fine masks while ignoring low-quality masks. To answer this question, we present BoxTeacher, an efficient and end-to-end training framework for high-performance weakly supervised instance segmentation, which leverages a sophisticated teacher to generate high-quality masks as pseudo labels. Considering that the massive noisy masks hurt the training, we present a mask-aware confidence score to estimate the quality of pseudo masks and propose the noise-aware pixel loss and noise-reduced affinity loss to adaptively optimize the student with pseudo masks. Extensive experiments demonstrate the effectiveness of the proposed BoxTeacher. Without bells and whistles, BoxTeacher achieves 35.0 mask AP and 36.5 mask AP with ResNet-50 and ResNet-101 respectively on the challenging COCO dataset, which outperforms the previous state-of-the-art methods by a significant margin and bridges the gap between box-supervised and mask-supervised methods. The code and models will be available at https://github.com/hustvl/BoxTeacher.

Authors (5)

Tianheng Cheng (31 papers)
Xinggang Wang (163 papers)
Shaoyu Chen (26 papers)
Qian Zhang (308 papers)
Wenyu Liu (146 papers)

Citations (34)

Summary

The paper "BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation" introduces BoxTeacher, an end-to-end training framework for instance segmentation that is supervised by bounding boxes alone. A teacher model generates pseudo masks, and a student is trained on them with losses that account for how reliable each mask is. The goal is to narrow the performance gap between box-supervised and mask-supervised methods.

In standard instance segmentation, pixel-wise instance annotations are labor-intensive to produce, and that cost limits how far training sets can scale. Weak supervision, primarily from bounding boxes, reduces the annotation cost. Existing box-supervised methods typically rely on heuristic losses built on priors from the boxes, or on low-level image cues, to learn masks. BoxTeacher starts from the observation that such methods already produce some fine masks: it uses those masks as pseudo labels for training while suppressing the influence of the low-quality ones.

Core Contributions

Framework Design: BoxTeacher is an end-to-end teacher-student framework. The teacher generates pseudo masks, and the student is trained on them with losses that account for label noise. A mask-aware confidence score estimates the quality of each pseudo mask, so that reliable masks guide the student's training more than noisy ones do.
Loss Optimization: Two losses, the noise-aware pixel loss and the noise-reduced affinity loss, adaptively optimize the student on the pseudo masks. They are designed to limit the harm that noisy pseudo masks do to training, so the student benefits from the accurate masks without any manually annotated masks.
Empirical Evaluation: BoxTeacher is evaluated on COCO, PASCAL VOC, and Cityscapes, where it outperforms previous state-of-the-art box-supervised methods. On COCO it reaches 35.0 mask AP with ResNet-50 and 36.5 mask AP with ResNet-101, narrowing the gap to fully supervised methods.
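
The idea behind the first two contributions can be illustrated with a minimal sketch. This summary does not give the paper's formulas, so everything here is an assumption made for illustration: the confidence score is taken to be the mean foreground probability inside the box, the pixel loss is a binary cross-entropy scaled by that score, and the teacher is assumed to track the student through an exponential moving average. The function names (`mask_confidence`, `binarize`, `weighted_pixel_loss`, `ema_update`) are hypothetical and are not taken from the BoxTeacher code base.

```python
import math


def mask_confidence(probs, box, threshold=0.5):
    """Hypothetical quality score for one pseudo mask (not the paper's formula).

    probs: 2D list of teacher mask probabilities in [0, 1].
    box:   (x0, y0, x1, y1) with exclusive upper bounds.
    Returns the mean probability of pixels inside the box that the
    teacher marks as foreground, or 0.0 if there are none.
    """
    x0, y0, x1, y1 = box
    foreground = [
        probs[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
        if probs[y][x] > threshold
    ]
    return sum(foreground) / len(foreground) if foreground else 0.0


def binarize(probs, threshold=0.5):
    """Turn teacher probabilities into a hard 0/1 pseudo mask."""
    return [[1 if p > threshold else 0 for p in row] for row in probs]


def weighted_pixel_loss(student_probs, pseudo_mask, confidence, eps=1e-7):
    """Mean binary cross-entropy against the pseudo mask, scaled by confidence.

    A low-confidence pseudo mask contributes little to the loss, which is
    the assumed intuition behind a noise-aware pixel loss.
    """
    total, count = 0.0, 0
    for s_row, m_row in zip(student_probs, pseudo_mask):
        for s, m in zip(s_row, m_row):
            s = min(max(s, eps), 1.0 - eps)  # keep log() finite
            total += -(m * math.log(s) + (1 - m) * math.log(1.0 - s))
            count += 1
    return confidence * total / count


def ema_update(teacher_weights, student_weights, momentum=0.999):
    """Assumed teacher update: exponential moving average of student weights."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_weights, student_weights)
    ]
```

In this sketch, a training step would call `mask_confidence` and `binarize` on the teacher's output, pass the results to `weighted_pixel_loss` together with the student's prediction, and then refresh the teacher with `ema_update`. Scaling the loss by a per-mask score is only one way to down-weight noisy labels; a hard threshold that discards low-scoring masks would be another.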

Implications and Future Directions

By training the student on high-confidence pseudo labels, BoxTeacher sets a new state of the art among box-supervised instance segmentation methods. The approach is not tied to standard architectures and also generalizes to stronger backbones such as the Swin Transformer.

Practically, BoxTeacher can significantly reduce annotation costs in large-scale object segmentation projects, since only boxes need to be labeled. The approach is also relevant to future work on semi-supervised and unsupervised learning, where making use of available but imperfect labels is key. Further improvements could refine the quality assessment of pseudo masks or integrate cross-modal inputs to obtain richer supervisory signals.

Overall, BoxTeacher shows that a box-supervised detector can learn from its own best masks when the noisy ones are down-weighted, which reduces the dependence of deployed computer vision systems on costly mask annotations. This is a promising direction for systems that must perform complex visual recognition tasks with less human annotation effort.

GitHub

GitHub - hustvl/BoxTeacher: [CVPR 2023] Exploring High-Quality Pseudo Masks for Weakly Supervised Instance Segmentation (76 stars)