Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
The paper "Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation" introduces an approach to weakly supervised semantic segmentation (WSSS), which aims to learn segmentation models without densely labeled data. Because conventional semantic segmentation requires labor-intensive pixel-level annotations, weakly supervised methods remain a compelling area of research.
Key Contributions
- AuxSegNet Framework: The proposed framework, AuxSegNet, is designed for weakly supervised multi-task learning, integrating auxiliary tasks such as saliency detection and multi-label image classification to enhance the primary task of semantic segmentation. This framework operates with only image-level ground-truth labels, thus reducing the label dependency considerably.
- Cross-Task Global Pixel-Level Affinity Map: By learning a cross-task global pixel-level affinity map from the saliency and segmentation task representations, the network can refine saliency predictions and propagate class activation maps (CAMs) to generate improved pseudo labels. This use of pixel affinities exploits the semantics shared between related tasks to produce better segmentation outcomes.
- Iterative Improvement via Mutual Learning: One of the core innovations of this paper is the mutual enhancement between pseudo label updating and cross-task affinity learning. This setup allows for iterative improvements in the segmentation task, continuously refining outputs by taking advantage of the evolving pseudo labels and affinity learning.
- State-of-the-Art Performance: AuxSegNet achieves superior weakly supervised segmentation results when benchmarked on established datasets such as PASCAL VOC 2012 and MS COCO. The paper presents empirical evidence for the effectiveness of both the multi-task auxiliary learning strategy and the cross-task affinity learning.
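The shared-backbone, three-head design described in the first bullet can be illustrated schematically. The following is a minimal NumPy sketch, not the authors' implementation: the class name, weight shapes, and linear "heads" are all illustrative stand-ins for the real convolutional backbone and task branches.

```python
import numpy as np

rng = np.random.default_rng(0)

class AuxSegNetSketch:
    """Illustrative sketch of a shared-backbone multi-task network.

    One backbone produces features consumed by three heads: semantic
    segmentation, saliency detection, and image-level classification.
    All layers here are plain linear maps for brevity.
    """

    def __init__(self, in_ch=3, feat_ch=16, num_classes=21):
        self.w_backbone = rng.standard_normal((in_ch, feat_ch)) * 0.1
        self.w_seg = rng.standard_normal((feat_ch, num_classes)) * 0.1  # segmentation head
        self.w_sal = rng.standard_normal((feat_ch, 1)) * 0.1            # saliency head
        self.w_cls = rng.standard_normal((feat_ch, num_classes)) * 0.1  # classification head

    def forward(self, x):
        # x: (H, W, in_ch) image tensor.
        feat = np.maximum(x @ self.w_backbone, 0.0)   # shared features (ReLU)
        seg = feat @ self.w_seg                       # per-pixel class scores (H, W, C)
        sal = feat @ self.w_sal                       # per-pixel saliency (H, W, 1)
        cls = feat.mean(axis=(0, 1)) @ self.w_cls     # global pooling -> image-level logits (C,)
        return seg, sal, cls

net = AuxSegNetSketch()
seg, sal, cls = net.forward(rng.standard_normal((8, 8, 3)))
```

Only the image-level classification output needs ground-truth labels here; the segmentation and saliency heads are supervised by pseudo labels, which is what keeps the overall setup weakly supervised.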
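The affinity-based CAM propagation in the second bullet amounts to mixing each pixel's class scores with those of similar pixels. Below is a simplified sketch of that idea, assuming a precomputed non-negative affinity matrix; it is not the paper's exact operator, and the function name is hypothetical.

```python
import numpy as np

def refine_cam_with_affinity(cam, affinity, n_iters=2):
    """Propagate a class activation map using a pixel-level affinity matrix.

    cam:      (H, W, C) class activation scores.
    affinity: (H*W, H*W) non-negative pairwise pixel similarities.
    Each row of the affinity matrix is normalized so every pixel's refined
    score is a convex combination of all pixels' scores.
    """
    h, w, c = cam.shape
    A = affinity / affinity.sum(axis=1, keepdims=True)  # row-stochastic
    x = cam.reshape(h * w, c)
    for _ in range(n_iters):
        x = A @ x  # affinity-weighted mixing of class scores
    return x.reshape(h, w, c)

# Toy usage: with a uniform affinity, every pixel converges to the channel mean.
cam = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
uniform = np.ones((4, 4))
out = refine_cam_with_affinity(cam, uniform)
```

In the real method the affinity is learned jointly from the saliency and segmentation features rather than given, which is precisely what makes it "cross-task."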
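The iterative mutual learning in the third bullet alternates between training on current pseudo labels and regenerating those labels from affinity-refined CAMs. The loop below is a toy, hypothetical illustration of that alternation: `train_stage` merely fakes a CAM biased toward the current labels, and the affinity is a fixed placeholder rather than a learned one.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_stage(pseudo_labels):
    """Stand-in for training on the current pseudo labels: returns a noisy
    3-class CAM biased toward those labels (purely illustrative)."""
    h, w = pseudo_labels.shape
    cam = rng.random((h, w, 3)) * 0.1
    cam[np.arange(h)[:, None], np.arange(w)[None, :], pseudo_labels] += 1.0
    return cam

def refine(cam, affinity):
    """Affinity-weighted propagation of CAM scores (row-normalized)."""
    h, w, c = cam.shape
    A = affinity / affinity.sum(axis=1, keepdims=True)
    return (A @ cam.reshape(h * w, c)).reshape(h, w, c)

h = w = 4
pseudo = rng.integers(0, 3, size=(h, w))      # initial pseudo labels
affinity = np.eye(h * w) + 0.1                # placeholder; learned in the real method

# Alternate: train on pseudo labels -> refine CAM with affinity -> update labels.
for stage in range(3):
    cam = train_stage(pseudo)
    refined = refine(cam, affinity)
    pseudo = refined.argmax(axis=-1)          # pseudo labels for the next stage
```

The key point the loop conveys is the mutual dependency: better pseudo labels yield better task features, which yield a better affinity, which in turn yields better pseudo labels.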
Experimental Results
Extensive experiments show that AuxSegNet outperforms several state-of-the-art weakly supervised segmentation methods on both the PASCAL VOC 2012 and MS COCO datasets. Its ability to iteratively refine pseudo labels and progressively boost segmentation accuracy underscores its practical applicability to large-scale image analysis.
Theoretical and Practical Implications
From a theoretical perspective, this research advances the understanding of cross-task learning interactions and of how auxiliary tasks can be harnessed to aid weakly supervised settings. In practical terms, AuxSegNet minimizes the need for exhaustive pixel-level labeling, potentially accelerating the deployment of semantic segmentation in real-world applications such as autonomous driving and scene understanding.
Future Directions
Looking forward, the potential adaptations of AuxSegNet could involve exploring additional auxiliary tasks or refining the cross-task affinity learning process for more diverse and complex datasets. Additionally, advancements in models akin to AuxSegNet may further narrow the performance gap between fully supervised and weakly supervised methodologies, expanding applications across various fields reliant on image segmentation technology.
In summary, this paper makes a case for a multi-faceted approach to weakly supervised semantic segmentation, intelligently fusing related tasks through affinity learning to produce more coherent and accurate segmentation outcomes.