Causal Intervention for Weakly-Supervised Semantic Segmentation

Published 26 Sep 2020 in cs.CV | (2009.12547v2)

Abstract: We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS). Specifically, we aim to generate better pixel-level pseudo-masks by using only image-level labels, the most crucial step in WSSS. We attribute the cause of the ambiguous boundaries of pseudo-masks to the confounding context, e.g., the correct image-level classification of "horse" and "person" may be due not only to the recognition of each instance, but also to their co-occurrence context, making it hard for model inspection (e.g., CAM) to distinguish between the boundaries. Inspired by this, we propose a structural causal model to analyze the causalities among images, contexts, and class labels. Based on it, we develop a new method: Context Adjustment (CONTA), to remove the confounding bias in image-level classification and thus provide better pseudo-masks as ground-truth for the subsequent segmentation model. On PASCAL VOC 2012 and MS-COCO, we show that CONTA boosts various popular WSSS methods to new state-of-the-art results.

Citations (410)

Summary

The paper presents a causality-aware approach that identifies context as a confounder in weakly-supervised semantic segmentation and removes its effect.
It proposes the CONTA framework, which uses backdoor adjustment to improve Class Activation Maps and produce higher-quality pseudo-masks.
Experiments on the PASCAL VOC 2012 and MS-COCO benchmarks show that CONTA improves the mIoU of existing WSSS methods.

The paper presents a novel approach that uses causal inference to address ambiguities in Weakly-Supervised Semantic Segmentation (WSSS). The central challenge in this setting is generating high-quality pixel-level pseudo-masks from image-level labels alone; such pseudo-masks often have imprecise boundaries because of confounding context. The authors propose a structural causal model of the relationships among images, contexts, and class labels, with the goal of removing the confounding bias that affects current image-level classification models. This is realized in a framework called Context Adjustment (CONTA), which uses backdoor adjustment to produce better pseudo-masks and, in turn, better segmentation.

Problem Definition and Model Formulation

The ambiguity of pseudo-masks in WSSS shows up in three main ways:

Object Ambiguity - Objects that frequently co-occur in a typical context, such as "horse" and "person", are not cleanly separated in the pseudo-mask.
Incomplete Background - Unlabeled background pixels are misidentified as foreground.
Incomplete Foreground - Parts of the foreground object are missing from the mask because of context-dependent variation.

The paper argues that these problems stem from the context prior in the dataset: context acts as a confounder that influences both the image and its label, so the classifier learns spurious correlations between them. Using the backdoor adjustment from causal inference, the authors propose to estimate the causal effect $P(Y|do(X))$ instead of the observational likelihood $P(Y|X)$ to remove this bias.
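For a discrete confounder $C$, the backdoor adjustment replaces the observational weighting $P(c|X)$ with the prior $P(c)$:

$$P(Y|do(X)) = \sum_{c} P(Y|X, c)\, P(c) \qquad \text{versus} \qquad P(Y|X) = \sum_{c} P(Y|X, c)\, P(c|X).$$

The toy computation below illustrates the gap on a single binary context variable. All probabilities and names are invented for illustration and do not come from the paper; the point is only that when a context nearly always accompanies the evidence, the observational estimate absorbs the context's contribution.

```python
# Toy binary example: C = context (e.g., a "riding scene"), X = image evidence, Y = label.
# All probabilities are invented for illustration; none come from the paper.
P_C = {0: 0.5, 1: 0.5}              # prior P(C=c)
P_X1_GIVEN_C = {0: 0.1, 1: 0.9}     # P(X=1 | C=c): context strongly drives the evidence
P_Y1_GIVEN_X1_C = {0: 0.3, 1: 0.9}  # P(Y=1 | X=1, C=c): context also drives the label


def observational(p_c, p_x1_given_c, p_y1_given_x1_c):
    """P(Y=1 | X=1) = sum_c P(Y=1 | X=1, c) * P(c | X=1), with P(c | X=1) from Bayes' rule."""
    joint = {c: p_x1_given_c[c] * p_c[c] for c in p_c}
    norm = sum(joint.values())
    return sum(p_y1_given_x1_c[c] * joint[c] / norm for c in p_c)


def interventional(p_c, p_y1_given_x1_c):
    """Backdoor adjustment: P(Y=1 | do(X=1)) = sum_c P(Y=1 | X=1, c) * P(c)."""
    return sum(p_y1_given_x1_c[c] * p_c[c] for c in p_c)


print(round(observational(P_C, P_X1_GIVEN_C, P_Y1_GIVEN_X1_C), 3))  # 0.84
print(round(interventional(P_C, P_Y1_GIVEN_X1_C), 3))               # 0.6
```

Conditioning on $X=1$ makes the common context look far more likely ($P(C=1|X=1)=0.9$ against a prior of $0.5$), which inflates the observational estimate; the adjusted estimate weights each context by its prior instead.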

Methodology: Context Adjustment (CONTA)

CONTA is an iterative pipeline designed to refine pseudo-mask quality:

First, a multi-label image classifier is trained on the image-level labels using the causality-aware (backdoor-adjusted) formulation.
Next, Class Activation Maps (CAMs) are extracted from this classifier and processed into seed areas, which are turned into pseudo-masks.
Then, a segmentation model is trained with these pseudo-masks serving as ground truth.
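The CAM-to-seed step can be sketched with the standard CAM formulation: each class map is a weighted sum of the last convolutional feature maps using that class's classifier weights, $\mathrm{CAM}_k(h,w) = \max\big(0, \sum_d w_{k,d} F_d(h,w)\big)$, rescaled to peak at 1. This is a minimal sketch, assuming a global-average-pooling classifier; the helper names, the fixed foreground threshold of 0.3, and the label convention (0 for background, $k+1$ for class $k$) are illustrative assumptions. In practice the WSSS method that CONTA is plugged into performs its own seed refinement.

```python
import numpy as np


def class_activation_maps(features, fc_weights):
    """CAM_k(h, w) = ReLU(sum_d W[k, d] * F[d, h, w]), rescaled so each map peaks at 1.

    features: (D, H, W) last-layer feature map; fc_weights: (K, D) classifier weights
    applied after global average pooling.
    """
    cams = np.einsum("kd,dhw->khw", fc_weights, features)
    cams = np.maximum(cams, 0.0)                           # keep positive evidence only
    peaks = cams.reshape(cams.shape[0], -1).max(axis=1)    # per-class maximum
    return cams / np.maximum(peaks, 1e-8)[:, None, None]   # avoid division by zero


def seed_mask(cams, present_classes, fg_threshold=0.3):
    """Hypothetical seed rule: 0 = background, k + 1 = class k.

    A pixel takes the strongest image-level class whose normalized CAM exceeds
    fg_threshold; otherwise (including ties) it stays background.
    """
    num_classes, height, width = cams.shape
    scores = np.zeros((num_classes + 1, height, width))
    scores[0] = fg_threshold                               # background acts as a constant score
    for k in present_classes:                              # only classes in the image-level label
        scores[k + 1] = cams[k]
    return scores.argmax(axis=0)


feats = np.array([[[1.0, 0.0], [0.0, 0.0]],
                  [[0.0, 0.0], [0.0, 2.0]]])              # D=2, H=W=2
weights = np.eye(2)                                         # K=2 classes, identity for clarity
print(seed_mask(class_activation_maps(feats, weights), present_classes=[0, 1]).tolist())
# [[1, 0], [0, 2]]
```

Restricting `present_classes` to the image-level label is what makes this weakly supervised: the classifier's labels decide which maps may claim pixels at all.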

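Because the context confounder is unobserved, a critical component of CONTA is approximating it with a confounder set built from class-wise average masks computed over the training data; this makes the backdoor adjustment for $P(Y|do(X))$ tractable in practice. The masks come from the segmentation model, so the confounder set is updated after each round and the steps above are repeated.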
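The confounder-set idea can be sketched as follows. This is not the paper's code: we assume each confounder $c_i$ is the average, over training images containing class $i$, of the feature vector pooled inside that class's mask; we take $P(c_i)$ as uniform; and we approximate the expectation over $C$ with an attention-style weighting $\lambda_i = \mathrm{softmax}_i\big((W_q x)^\top (W_k c_i)/\sqrt{m}\big)$. The names `build_confounder_set`, `adjusted_context`, `w_q`, and `w_k` are hypothetical, and the random projection matrices stand in for learned parameters.

```python
import numpy as np


def build_confounder_set(feature_maps, masks, num_classes):
    """Hypothetical helper: one prototype per class, averaged over images.

    feature_maps: list of (D, H, W) arrays; masks: list of (H, W) integer
    label maps where 0 is background and i + 1 marks class i.
    Returns an (num_classes, D) array; classes never seen stay all-zero.
    """
    dim = feature_maps[0].shape[0]
    sums = np.zeros((num_classes, dim))
    counts = np.zeros(num_classes)
    for feats, mask in zip(feature_maps, masks):
        for i in range(num_classes):
            region = mask == i + 1
            if region.any():
                # Masked average pooling: mean feature vector inside class i's mask.
                sums[i] += feats[:, region].mean(axis=1)
                counts[i] += 1
    return sums / np.maximum(counts, 1)[:, None]


def adjusted_context(x, confounders, prior, w_q, w_k):
    """Approximate sum_i lambda_i * P(c_i) * c_i for a pooled image feature x.

    x: (D,), confounders: (n, D), prior: (n,) summing to 1,
    w_q and w_k: (m, D) projections (random stand-ins for learned weights).
    """
    q = w_q @ x                                # query from the image, shape (m,)
    k = confounders @ w_k.T                    # keys from the confounders, shape (n, m)
    logits = k @ q / np.sqrt(q.shape[0])       # scaled dot-product scores, shape (n,)
    lam = np.exp(logits - logits.max())        # numerically stable softmax
    lam /= lam.sum()
    return ((lam * prior)[:, None] * confounders).sum(axis=0)  # shape (D,)


# Usage with random toy data: 2 images, D=8 channels, 4x4 maps, 3 classes.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(8, 4, 4)) for _ in range(2)]
masks = [rng.integers(0, 4, size=(4, 4)) for _ in range(2)]
C = build_confounder_set(feats, masks, num_classes=3)
z = adjusted_context(feats[0].mean(axis=(1, 2)), C, np.full(3, 1 / 3),
                     rng.normal(size=(16, 8)), rng.normal(size=(16, 8)))
print(C.shape, z.shape)  # (3, 8) (8,)
```

Rebuilding `C` from fresh segmentation masks at the end of each round is what would close the iterative loop; a downstream classifier would then combine `x` with the adjusted context vector `z`.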

Experimental Results

CONTA is evaluated on the PASCAL VOC 2012 and MS-COCO benchmarks, where it improves mean Intersection over Union (mIoU) over existing techniques:

PASCAL VOC 2012: 66.1% mIoU on the validation set and 66.7% on the test set, surpassing previous state-of-the-art methods.
MS-COCO: CONTA likewise raises the mIoU of the WSSS methods it is combined with.
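The mIoU figures follow the standard definition: per-class intersection over union, $\mathrm{IoU}_k = TP_k / (TP_k + FP_k + FN_k)$, averaged over classes, with the confusion matrix accumulated over the whole evaluation set rather than averaged per image. A minimal NumPy sketch, assuming integer label maps and the PASCAL VOC convention of 255 for ignored pixels; the function names are illustrative.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Accumulate a (num_classes, num_classes) matrix: rows = ground truth, cols = prediction."""
    valid = gt != ignore_index                 # skip unlabeled pixels (VOC marks them 255)
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    """mIoU = mean over classes of TP / (TP + FP + FN), skipping classes absent from both pred and gt."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())


gt = np.array([[0, 0], [1, 255]])
pred = np.array([[0, 1], [1, 1]])
print(round(mean_iou(confusion_matrix(pred, gt, num_classes=2)), 4))  # 0.5
```

For a full dataset, sum the per-image confusion matrices first and call `mean_iou` once on the total.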

The results underscore the efficacy of causal intervention in WSSS, with marked improvements in handling context-induced ambiguities.

Implications and Future Directions

The adoption of causal reasoning offers a principled way to address dataset bias and improve semantic segmentation when supervision is limited. The paper suggests that future research could build on this by developing more advanced strategies for discovering the confounder set, or by embedding expert knowledge directly into the model to further reduce bias.

Overall, this work contributes significantly to the field by introducing a robust causal framework that not only enhances the training of WSSS models but also points toward a broader understanding of incorporating causal reasoning in machine learning settings. The proposed CONTA framework could inspire further exploration into causal methods for other weakly-supervised learning problems within computer vision and beyond.
