Fully Convolutional Multi-Class Multiple Instance Learning (1412.7144v4)

Published 22 Dec 2014 in cs.CV, cs.LG, and cs.NE

Abstract: Multiple instance learning (MIL) can reduce the need for costly annotation in tasks such as semantic segmentation by weakening the required degree of supervision. We propose a novel MIL formulation of multi-class semantic segmentation learning by a fully convolutional network. In this setting, we seek to learn a semantic segmentation model from just weak image-level labels. The model is trained end-to-end to jointly optimize the representation while disambiguating the pixel-image label assignment. Fully convolutional training accepts inputs of any size, does not need object proposal pre-processing, and offers a pixelwise loss map for selecting latent instances. Our multi-class MIL loss exploits the further supervision given by images with multiple labels. We evaluate this approach through preliminary experiments on the PASCAL VOC segmentation challenge.

Citations (304)

Summary

  • The paper introduces a novel MIL formulation integrated with fully convolutional networks that streamlines training by avoiding object proposal preprocessing.
  • The paper achieves a mean IU of 25.66% on VOC 2012, marking a significant improvement over the baseline of 13.09% in weakly supervised segmentation.
  • The paper employs a multi-class pixel-level loss with inter-class competition to enhance segmentation accuracy while minimizing reliance on pixel-level annotations.

Fully Convolutional Multi-Class Multiple Instance Learning

The paper "Fully Convolutional Multi-Class Multiple Instance Learning" addresses a crucial challenge in the field of semantic segmentation within computer vision: the dependence on laborious and costly annotation processes. By leveraging the framework of multiple instance learning (MIL) integrated with fully convolutional networks (FCNs), the authors aim to alleviate this burden by learning from weak image-level labels rather than pixel-level ones.

Core Contributions

The primary contribution is a novel MIL formulation tailored to multi-class semantic segmentation with a fully convolutional network. The approach dispenses with the object proposal preprocessing of earlier methods and instead uses a pixelwise loss map to select latent instances, so that training requires only image-level labels rather than pixel-level or bounding-box annotations.

  1. End-to-end FCN Training: The model is trained fully convolutionally, accepting inputs of any size and requiring no object proposal preprocessing, which streamlines and accelerates training.
  2. Multi-Class MIL Loss: The paper introduces a multi-class pixel-level loss that generalizes the binary MIL setting: each image-level class is credited to its highest-scoring pixel, while a per-pixel softmax induces inter-class competition that sharpens the predictions (see the sketch after this list).
  3. Weakly Supervised Segmentation: Training on whole images rather than cropped proposals preserves spatial context, and the pixelwise loss map lets the network disambiguate which pixels account for each image-level label.
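
To make the formulation concrete, here is a minimal sketch in PyTorch. This is our illustration, not the authors' code: the paper specifies the loss mathematically and builds on a much deeper FCN, whereas `toy_fcn` and `multiclass_mil_loss` are hypothetical names for a toy setup. The two key points it demonstrates are that a network with only convolutional layers accepts any input size, and that the loss back-propagates only through the max-scoring pixel of each image-level class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy fully convolutional head: with no fully connected layers, any
# H x W input yields a (C, H', W') score map. Illustrative only.
# 21 = 20 PASCAL VOC object classes + background.
toy_fcn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 21, kernel_size=1),
)

def multiclass_mil_loss(scores: torch.Tensor, image_labels: list[int]) -> torch.Tensor:
    """Sketch of a multi-class MIL loss.

    scores: (C, H, W) per-pixel class scores for one image.
    image_labels: indices of the classes present at the image level.
    """
    # Per-pixel softmax over classes: raising one class's probability
    # at a pixel suppresses the others (inter-class competition).
    probs = F.softmax(scores, dim=0)  # (C, H, W)
    # Each image-level class's latent instance is the pixel where that
    # class scores highest; maximize its log-probability.
    picked = torch.stack([probs[c].max() for c in image_labels])
    return -torch.log(picked + 1e-8).mean()

# Usage on a single image of arbitrary size:
x = torch.randn(1, 3, 160, 224)                        # one RGB image
scores = toy_fcn(x)[0]                                 # (21, 160, 224)
loss = multiclass_mil_loss(scores, image_labels=[0, 8, 12])
loss.backward()
```

At inference, a per-pixel argmax over the score map yields the segmentation, which can optionally be smoothed by the CRF or superpixel post-processing discussed below.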

Experiments and Results

The authors evaluate their approach on the PASCAL VOC segmentation challenge. Notably, the model achieves a mean intersection over union (IU) of 25.66% on the VOC 2012 test set, a significant improvement over the baseline's 13.09%. Although the performance remains below the fully supervised state-of-the-art level, it indicates substantial progress in weakly supervised segmentation.
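
For reference, mean IU is computed from the class confusion matrix accumulated over the test set. A standard implementation (ours, matching the usual VOC definition) looks like this:

```python
import numpy as np

def mean_iu(conf: np.ndarray) -> float:
    """Mean intersection over union from a C x C confusion matrix,
    where conf[i, j] counts pixels of ground-truth class i that were
    predicted as class j."""
    inter = np.diag(conf).astype(float)                  # true positives
    union = conf.sum(axis=1) + conf.sum(axis=0) - inter  # TP + FP + FN
    valid = union > 0                                    # skip absent classes
    return float((inter[valid] / union[valid]).mean())
```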

Implications and Future Directions

This paper highlights the potential of integrating MIL with FCNs for semantic segmentation under weak supervision. The methodology provides a pathway to reduce annotation costs, offering a scalable route to training segmentation models on large datasets. It further opens avenues for enhancements such as conditional random field regularization and superpixel projection to improve segmentation precision.

The introduction of techniques to manipulate the loss map suggests broader implications across tasks such as co-segmentation and hard negative mining. Future work could focus on optimizing these processes, potentially leveraging advancements in representation learning to further refine segmentation accuracy.

In conclusion, this paper marks a step toward reducing the heavy annotation burden of segmentation models. By bridging MIL with FCNs for weak supervision, it lays foundational work that can guide future innovations in efficient and adaptive computer vision solutions.