Exploiting Saliency for Object Segmentation from Image Level Labels
The paper "Exploiting Saliency for Object Segmentation from Image Level Labels" presents an advanced methodology for weakly supervised semantic segmentation, addressing the challenges posed by the reliance on large-scale pixel-level annotations in traditional fully supervised approaches. The authors introduce a novel technique that leverages saliency information to improve object segmentation using only image-level labels, achieving significant performance gains.
The core contribution is the integration of saliency maps with seed information derived from image-level annotations to improve the accuracy of semantic segmentation models. State-of-the-art weakly supervised techniques typically reach about 75% of the accuracy of fully supervised models; the authors propose a method that raises this to roughly 80% without requiring additional user input or data beyond image-level labels.
The methodological framework is built around what the authors term the "Guided Segmentation" architecture, which consists of two primary modules: a "guide labeller" and a segmenter. The guide labeller generates an indicative segmentation mask by combining cues from seeding and saliency models; this mask is then used to train a segmentation convolutional neural network (CNN) in a supervised fashion.
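To make the two-module design concrete, the following is a minimal PyTorch sketch of the training setup. The Segmenter, the placeholder guide_labeller, and all shapes and thresholds are illustrative assumptions rather than the authors' implementation (the paper trains a DeepLab-style network); the point is the decoupling: generated guide labels stand in for ground-truth masks in an otherwise standard supervised loss.

```python
# Minimal sketch of the "Guided Segmentation" training setup.
# Segmenter and guide_labeller are hypothetical stand-ins for the
# paper's components, not the authors' actual code.
import torch
import torch.nn as nn

NUM_CLASSES = 21    # Pascal VOC: 20 object classes + background
IGNORE_INDEX = 255  # pixels the guide labeller leaves unlabelled

class Segmenter(nn.Module):
    """A toy fully convolutional segmenter."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        return self.head(self.backbone(x))

def guide_labeller(seeds, saliency):
    """Combine seed and saliency cues into a per-pixel training mask.
    Placeholder rule; the paper's combination strategies are discussed below."""
    mask = torch.full(saliency.shape, IGNORE_INDEX, dtype=torch.long)
    mask[saliency < 0.5] = 0              # non-salient: confident background
    mask[seeds >= 0] = seeds[seeds >= 0]  # seed pixels keep their class
    return mask                           # salient-but-unseeded stays ignored

segmenter = Segmenter()
optimizer = torch.optim.SGD(segmenter.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss(ignore_index=IGNORE_INDEX)

image = torch.randn(1, 3, 64, 64)                      # stand-in input image
seeds = torch.full((1, 64, 64), -1, dtype=torch.long)  # -1 = no seed
seeds[0, 20:30, 20:30] = 15                            # e.g. a "person" seed
saliency = torch.rand(1, 64, 64)                       # stand-in saliency map

# The segmenter is trained in a supervised fashion, but against the
# generated guide labels instead of ground-truth annotations.
guide = guide_labeller(seeds, saliency)
optimizer.zero_grad()
loss = criterion(segmenter(image), guide)
loss.backward()
optimizer.step()
```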
Technical Contributions
- Seed Generation: The paper explores several methods for generating seeds, which indicate discriminative object regions, from image-level classifiers, including variants of Global Average Pooling (GAP) and various backpropagation techniques. In empirical evaluation, GAP-based approaches, specifically the GAP-HighRes model, provide the best seeds in terms of precision-recall; a sketch of this mechanism follows the list.
- Saliency Utilization: The integration of class-agnostic saliency maps is a key innovation. The saliency model, trained on a separate dataset to avoid bias toward the target classes, detects prominent objects regardless of category. This saliency information is crucial for recovering the full extent of objects, which seeds alone typically miss because classifiers respond most strongly to the most discriminative object parts.
- Guide Labelling Strategies: Various strategies for combining seeds and saliency maps were developed and tested. The strategy labelled G2, which uses seeds to assign classes to saliency-identified object regions, produced the most effective training masks, significantly improving performance over using seeds or saliency alone (see the second sketch after this list).
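To illustrate how GAP-based seeding works, the sketch below follows the class activation mapping (CAM) recipe underlying the GAP variants discussed above: the classifier's linear weights re-weight the final feature maps into per-class heatmaps, whose strongest responses become seeds. The tiny classifier, the top-20% threshold, and the class indices are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of GAP-based seed generation via class activation maps
# (CAM). The tiny classifier and the 20% threshold are illustrative
# assumptions; the paper evaluates several GAP variants (e.g. GAP-HighRes).
import torch
import torch.nn as nn

class GapClassifier(nn.Module):
    """Image-level classifier ending in global average pooling (GAP)."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64, num_classes)  # weights double as CAM weights

    def forward(self, x):
        f = self.features(x)                  # B x 64 x H x W feature maps
        logits = self.fc(f.mean(dim=(2, 3)))  # GAP, then linear classifier
        return logits, f

@torch.no_grad()
def cam_seeds(model, image, image_labels, top_frac=0.2):
    """Threshold each present class's CAM to obtain sparse seed pixels."""
    _, f = model(image)
    B, _, H, W = f.shape
    # CAM_c = sum_k w_{c,k} * f_k, using the GAP classifier's linear weights
    cams = torch.einsum('nc,bchw->bnhw', model.fc.weight, f)
    seeds = torch.full((B, H, W), -1, dtype=torch.long)  # -1 = no seed
    for b in range(B):
        for c in image_labels[b]:  # only classes present in the image
            cam = cams[b, c]
            keep = cam >= torch.quantile(cam, 1 - top_frac)  # top responses
            seeds[b][keep] = c
    return seeds

model = GapClassifier()
image = torch.randn(1, 3, 64, 64)
seeds = cam_seeds(model, image, image_labels=[[14]])  # e.g. class 14 present
print((seeds == 14).sum().item(), "seed pixels for class 14")
```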
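The G2-style combination itself can be sketched in a few lines: saliency proposes object regions, and the seeds inside each region determine its class, while non-salient pixels become background. The saliency threshold and the majority-vote tie-breaking below are assumptions for illustration, not the paper's exact rule.

```python
# Illustrative numpy/scipy sketch of a G2-style guide labelling rule:
# seeds assign classes to saliency-identified object regions. Details
# (thresholds, tie-breaking) are assumptions, not the paper's exact rule.
import numpy as np
from scipy import ndimage

IGNORE = 255  # pixels left unlabelled for the segmenter's loss

def g2_guide_labels(seeds, saliency, sal_thresh=0.5):
    """seeds: H x W int array, -1 where no seed, else a class id (1..20).
    saliency: H x W float array in [0, 1]. Returns H x W guide labels."""
    salient = saliency >= sal_thresh
    guide = np.zeros(seeds.shape, dtype=np.int64)  # default: background
    regions, n = ndimage.label(salient)            # connected components
    for r in range(1, n + 1):
        region = regions == r
        classes, counts = np.unique(seeds[region & (seeds >= 0)],
                                    return_counts=True)
        if len(classes) == 0:
            guide[region] = IGNORE  # salient but no seed: leave unlabelled
        else:
            guide[region] = classes[np.argmax(counts)]  # majority seed class
    return guide

# Toy example: one salient blob overlapped by a single class-7 seed pixel;
# the whole blob inherits class 7, everything else becomes background.
seeds = np.full((8, 8), -1); seeds[3, 3] = 7
saliency = np.zeros((8, 8)); saliency[2:6, 2:6] = 0.9
print(g2_guide_labels(seeds, saliency))
```

Note how the division of labour plays out: the seed contributes only the class identity, while the saliency region supplies the spatial extent that the seed alone could not.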
Empirical Results
The authors validate their approach on the Pascal VOC 2012 dataset. The proposed method with combination strategy G2 achieves a mean Intersection over Union (mIoU) of 55.7% on the validation set, improving over previous weakly supervised methods and underscoring the effectiveness of incorporating saliency.
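For reference, mIoU averages the per-class intersection-over-union over all classes (20 object classes plus background on Pascal VOC). A minimal numpy sketch with toy inputs:

```python
# Minimal sketch of mean Intersection over Union (mIoU), the evaluation
# metric used on Pascal VOC; the inputs below are hypothetical examples.
import numpy as np

def mean_iou(pred, gt, num_classes=21):
    """Per-class IoU = |pred ∩ gt| / |pred ∪ gt|, averaged over classes
    that appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1], [0, 1, 1]])
gt   = np.array([[0, 1, 1], [0, 1, 1]])
print(f"mIoU: {mean_iou(pred, gt):.3f}")  # (2/3 + 3/4) / 2 ≈ 0.708
```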
Conclusion and Future Work
This research succeeds in enhancing weakly supervised semantic segmentation by effectively exploiting saliency information, closing a significant portion of the performance gap between weakly and fully supervised methods. Future directions may include improving saliency model accuracy and applying the framework to more diverse datasets, potentially extending to dynamic environments where segmentation must occur in real time. Additionally, exploring the interplay between saliency and more complex visual cues, such as depth or motion, might yield further advances.
Overall, the paper provides a comprehensive methodology that builds on existing work in saliency estimation and seed generation while practically advancing semantic segmentation under limited annotations.