
Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation (1502.02734v3)

Published 9 Feb 2015 in cs.CV

Abstract: Deep convolutional neural networks (DCNNs) trained on a large number of images with strong pixel-level annotations have recently significantly pushed the state-of-art in semantic image segmentation. We study the more challenging problem of learning DCNNs for semantic image segmentation from either (1) weakly annotated training data such as bounding boxes or image-level labels or (2) a combination of few strongly labeled and many weakly labeled images, sourced from one or multiple datasets. We develop Expectation-Maximization (EM) methods for semantic image segmentation model training under these weakly supervised and semi-supervised settings. Extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark, while requiring significantly less annotation effort. We share source code implementing the proposed system at https://bitbucket.org/deeplab/deeplab-public.

Authors (4)
  1. George Papandreou (16 papers)
  2. Liang-Chieh Chen (66 papers)
  3. Kevin Murphy (87 papers)
  4. Alan L. Yuille (72 papers)
Citations (894)

Summary

Overview of Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

This research paper, authored by George Papandreou, Liang-Chieh Chen, Kevin Murphy, and Alan L. Yuille, addresses the challenge of training Deep Convolutional Neural Networks (DCNNs) for semantic image segmentation using weakly and semi-supervised learning techniques. The emphasis is on minimizing the annotation effort typically required for such tasks by leveraging weakly annotated data like bounding boxes, image-level labels, or a combination of a few strongly labeled images and many weakly labeled ones.

Key Contributions

  1. EM Algorithms for Training with Weak and Semi-Supervised Annotations: The authors develop Expectation-Maximization (EM) methods tailored for training DCNN models under weak and semi-supervised settings. These algorithms iteratively estimate the latent pixel labels and optimize the DCNN parameters.
  2. Performance Evaluation with Reduced Annotation Effort: The paper demonstrates that competitive segmentation results can be achieved on the PASCAL VOC 2012 benchmark while significantly reducing the annotation workload. In particular, leveraging a combination of pixel-level annotated images with weak annotations can nearly match the performance of fully-supervised systems.
  3. Cross-Dataset Utilization: By combining annotations from PASCAL VOC 2012 with additional weak or strong annotations from the MS-COCO dataset, further improvements in segmentation performance are observed, achieving 73.9% mean IOU on the PASCAL VOC 2012 benchmark.
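The alternating structure of the EM training described above can be sketched as follows. This is a simplified illustration, not the released DeepLab code: the fixed score `bias` and the hard label assignment loosely mirror the paper's EM-Fixed variant, and all function names are hypothetical.

```python
import numpy as np

def e_step(scores, image_labels, bias=5.0):
    """E-step: estimate latent pixel labels from current network scores.

    scores: (H, W, C) array of per-pixel class scores from the DCNN.
    image_labels: set of class indices known to be present (weak label).
    bias: fixed score boost for classes present in the image.
    """
    biased = scores.copy()
    for c in image_labels:
        biased[:, :, c] += bias           # encourage present classes
    mask = np.full(scores.shape[2], -np.inf)
    mask[list(image_labels) + [0]] = 0.0  # background (index 0) always allowed
    biased = biased + mask                # forbid absent classes entirely
    return biased.argmax(axis=2)          # hard per-pixel label estimate

def em_train(model_scores_fn, update_fn, images, weak_labels, iters=10):
    """Alternate E-step (label estimation) and M-step (parameter update)."""
    for _ in range(iters):
        pseudo = [e_step(model_scores_fn(img), lab)
                  for img, lab in zip(images, weak_labels)]
        update_fn(images, pseudo)         # M-step: SGD on the pseudo-labels
```

In practice the M-step is one or a few SGD passes on the current pseudo-labels rather than a full optimization, which is what makes the scheme tractable for deep networks.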

Detailed Findings

Evaluation of Weak and Semi-Supervised Learning

The authors utilize multiple variants of the EM algorithm, specifically:

  • EM-Fixed: Fixed biases boost the scores of classes known to be present in the image during label estimation.
  • EM-Adapt: The biases are adapted during training so that a sufficient fraction of the image is assigned to foreground objects.
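A toy illustration of the adaptive idea: choose a per-class bias just large enough that a minimum fraction of pixels prefer that class over background. The paper's actual EM-Adapt scheme derives its constraints from the image-level labels; the function below is a simplified stand-in with an illustrative name.

```python
import numpy as np

def adaptive_bias(fg_score, bg_score, min_fg_fraction=0.2):
    """Pick a boost for a foreground class so that at least
    `min_fg_fraction` of pixels score at least as high as background.

    fg_score, bg_score: (H, W) per-pixel scores.
    """
    margin = bg_score - fg_score              # boost needed at each pixel
    k = int(np.ceil(min_fg_fraction * margin.size))
    # the k-th smallest margin flips exactly the k easiest pixels
    return np.partition(margin.ravel(), k - 1)[k - 1]
```

Because the bias is recomputed per image, an image where the network is already confident needs a small boost, while a hard image gets a larger one, which is what keeps the foreground from collapsing to nothing early in training.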

In the weakly-supervised setting, EM-Adapt achieved a 38.2% mean IOU, significantly outperforming EM-Fixed (20.8%). When strong annotations are introduced in a semi-supervised setting, performance improves markedly: with 1,464 pixel-level annotated images combined with weak image-level labels for the remaining images, the model achieved a mean IOU of 64.6%.
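For reference, the mean IOU figures quoted throughout follow the standard PASCAL VOC definition: per-class intersection-over-union averaged across classes. A minimal implementation:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes (PASCAL VOC metric)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```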

Comparison with State-of-the-Art Methods

EM-Adapt compared favorably with other multiple instance learning (MIL) techniques. Specifically:

  • EM-Adapt achieved 39.6% mean IOU with weak image-level labels alone.
  • This performance is competitive with MIL-sppxl and MIL-seg by Pinheiro et al., which utilize complex objectness and segmentation modules for weakly-supervised training.

Bounding Box Annotations

Training with bounding box annotations also showed promising results. When using only bounding box annotations, the Bbox-Seg method attained a mean IOU of 60.6%. In a semi-supervised scenario with 1,464 strong annotations and 9,118 bounding box annotations, the performance reached 65.1%, closely approaching the fully-supervised result of 67.6%.
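The simplest box-to-label conversion rasterizes each box directly into a pseudo-label map (the paper's Bbox-Seg goes further, refining such estimates with a dense CRF before training). A hedged sketch of the naive variant, with an illustrative function name:

```python
import numpy as np

def boxes_to_pseudo_labels(shape, boxes):
    """Rasterize bounding boxes into a pixel pseudo-label map.

    shape: (H, W); boxes: list of (class_id, y0, x0, y1, x1).
    Pixels covered by no box are background (0). Later boxes simply
    overwrite earlier ones here; the CRF refinement used by Bbox-Seg
    is omitted for brevity.
    """
    labels = np.zeros(shape, dtype=np.int64)
    for cls, y0, x0, y1, x1 in boxes:
        labels[y0:y1, x0:x1] = cls
    return labels
```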

Cross-Dataset Augmentation

By incorporating the MS-COCO dataset:

  • Cross-Pretrain (Strong): Pretraining on MS-COCO and fine-tuning on PASCAL VOC yielded a mean IOU of 71.0%.
  • Cross-Joint (Strong): Joint training on both datasets improved performance to 71.7%.
  • Cross-Joint (Semi): Even with only 5,000 strong MS-COCO annotations and 118,287 weak annotations, the method achieved 70.0%, highlighting the efficacy of semi-supervised learning.

Implications and Future Directions

This work marks a significant advance in reducing the annotation effort required for semantic image segmentation, backing the viability of weakly and semi-supervised methods with strong numerical results. Practically, this approach can substantially lower the barrier to large-scale image segmentation, particularly in domains where detailed annotations are cumbersome or expensive to obtain.

Future research could explore:

  • End-to-end training of DCNN and CRF parameters to maximize the benefits of integrated optimization.
  • Extending these methods to other complex datasets and evaluating their robustness across different domains.
  • Further refinement of the EM algorithms to handle more diverse weak annotations effectively, potentially boosting performance in minimal annotation scenarios.

In conclusion, this paper thoroughly documents the development and effectiveness of novel training techniques for semantic segmentation, showing that weakly and semi-supervised learning can substantially bridge the gap to fully-supervised methods, thereby providing practical, efficient alternatives for training robust segmentation models.