ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning (2007.07936v2)

Published 15 Jul 2020 in cs.CV

Abstract: The state of the art in semantic segmentation is steadily increasing in performance, resulting in more precise and reliable segmentations in many different applications. However, progress is limited by the cost of generating labels for training, which sometimes requires hours of manual labor for a single image. Because of this, semi-supervised methods have been applied to this task, with varying degrees of success. A key challenge is that common augmentations used in semi-supervised classification are less effective for semantic segmentation. We propose a novel data augmentation mechanism called ClassMix, which generates augmentations by mixing unlabelled samples, by leveraging on the network's predictions for respecting object boundaries. We evaluate this augmentation technique on two common semi-supervised semantic segmentation benchmarks, showing that it attains state-of-the-art results. Lastly, we also provide extensive ablation studies comparing different design decisions and training regimes.

Authors (4)

Viktor Olsson (3 papers)
Wilhelm Tranheden (2 papers)
Juliano Pinto (7 papers)
Lennart Svensson (81 papers)

Citations (290)


Summary

Analysis of "ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning"

The paper introduces ClassMix, a data augmentation technique for semi-supervised semantic segmentation that targets the scarcity of labeled training data. Semantic segmentation assigns a class label to every pixel and plays a critical role in domains such as autonomous driving and medical imaging. Because exhaustive manual annotation is a bottleneck, semi-supervised learning has been studied as a cost-effective alternative.

Key to the ClassMix approach is a data augmentation strategy that mixes pairs of unlabeled images along semantic class regions, using the segmentation network's own predictions to define which pixels are copied from one image onto the other. This differs from methods such as CutMix, which pastes rectangular regions and therefore does not follow object boundaries. Because the mixing mask follows the predicted class boundaries, ClassMix synthesizes more coherent and contextually appropriate training samples from unlabeled data.
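The mixing step described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `classmix`, the array layouts, and the rule of selecting half of the classes predicted in the first image are assumptions made for this example.

```python
import numpy as np

def classmix(img_a, img_b, logits_a, logits_b, rng):
    """Mix two unlabeled images along predicted class regions.

    img_a, img_b:       (H, W, C) image arrays.
    logits_a, logits_b: (K, H, W) per-class scores from the network.
    rng:                a numpy.random.Generator.
    """
    # Hard per-pixel predictions for both images.
    pred_a = logits_a.argmax(axis=0)  # (H, W)
    pred_b = logits_b.argmax(axis=0)  # (H, W)

    # Assumption: pick half of the classes predicted in image A at random.
    classes = np.unique(pred_a)
    n_pick = len(classes) // 2
    picked = rng.choice(classes, size=n_pick, replace=False)

    # Binary mask: True where image A shows one of the picked classes.
    mask = np.isin(pred_a, picked)

    # Paste the masked pixels of A onto B; mix the predictions the same way.
    mixed_img = np.where(mask[..., None], img_a, img_b)
    mixed_label = np.where(mask, pred_a, pred_b)
    return mixed_img, mixed_label, mask
```

Because the mask is built from `pred_a`, its edges follow the predicted object boundaries rather than an arbitrary rectangle, which is the property the summary contrasts with CutMix.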

Empirically, ClassMix performs well on standard benchmarks for semi-supervised semantic segmentation. On the Cityscapes dataset, known for its complexity and for the many classes present in each image, ClassMix outperforms competing baselines in mean intersection-over-union (mIoU). The summary attributes this to the method's capacity for generating diverse and semantically meaningful augmented samples, which is helped by the nature of urban scenes in Cityscapes: classes tend to occupy consistent image regions (for example, road near the bottom and sky near the top), so pasted objects usually land in plausible positions.
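For readers unfamiliar with the mIoU metric mentioned above, the following is a minimal sketch of how it can be computed from a confusion matrix. The function name `mean_iou` and the default `ignore_index=255` are assumptions for illustration; benchmark evaluations typically accumulate the confusion matrix over the whole validation set rather than a single image.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over the classes that occur.

    pred, target: integer arrays of identical shape with class ids.
    Pixels whose target equals ignore_index are excluded (assumed convention).
    """
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are true classes, columns are predicted classes.
    conf = np.bincount(
        num_classes * target + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    inter = np.diag(conf)                          # true positives per class
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0                            # skip classes absent from both
    return float((inter[present] / union[present]).mean())
```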

In addition to its core mechanism, the method employs pseudo-labelling and consistency regularization to learn from the augmented data. Pseudo-labelling trains the network on its own hard (argmax) predictions for unlabeled data, which pushes it toward confident, low-entropy outputs and lets the model supervise itself. The method is integrated with the Mean Teacher framework, in which the teacher network's parameters are an exponential moving average of the student's, giving more consistent training targets and more stable convergence.
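The two ingredients named above can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the helper names, the dictionary-of-arrays parameter representation, and the decay value `alpha=0.99` are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits, axis=0):
    # Numerically stable softmax along the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_labels(teacher_logits):
    """Hard pseudo-labels and their confidence from (K, H, W) teacher logits."""
    probs = softmax(teacher_logits, axis=0)
    return probs.argmax(axis=0), probs.max(axis=0)

def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher update: teacher <- alpha * teacher + (1 - alpha) * student.

    teacher, student: dicts mapping parameter names to arrays (assumed layout).
    """
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}
```

In a training loop of this kind, the teacher would produce the pseudo-labels for the unlabeled images, the student would be trained on them by gradient descent, and `ema_update` would be called after each student step.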

For datasets like Pascal VOC 2012, where classes are less densely packed and their spatial distributions less predictable, the ClassMix approach still yields competitive results. However, challenges arise due to limited class occurrence per image, leading to lower diversity in augmentation masks and, consequently, less significant performance improvements compared to Cityscapes.

The paper also presents an ablation study that examines the impact of individual components of the ClassMix framework. It explores variations such as different weighting strategies for the unsupervised loss and the addition of further augmentations, showing that ClassMix can be tuned and that its advantages are robust, although they depend on dataset characteristics.
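To make "unsupervised loss weighting strategies" concrete, here is a sketch of three schemes of the kind commonly used in semi-supervised learning. This summary does not state which schemes the paper compares, so the function names, the threshold of 0.9, and the ramp-up shape are all assumptions for illustration only.

```python
import numpy as np

def constant_weight(value=1.0):
    # The unsupervised loss always receives the same weight.
    return value

def confident_fraction_weight(confidence, threshold=0.9):
    # Weight equals the fraction of pixels whose pseudo-label confidence
    # exceeds the threshold (threshold value is an arbitrary assumption).
    return float((confidence > threshold).mean())

def sigmoid_rampup_weight(step, rampup_steps, max_weight=1.0):
    # Weight grows smoothly from near zero to max_weight over rampup_steps,
    # following the exp(-5 * (1 - t)^2) schedule used with Mean Teacher.
    t = np.clip(step / rampup_steps, 0.0, 1.0)
    return max_weight * float(np.exp(-5.0 * (1.0 - t) ** 2))
```

In each case the returned scalar would multiply the loss computed on the mixed unlabeled samples before it is added to the supervised loss.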

Implications from this research are substantial for the field of computer vision, particularly for expanding the utility of semi-supervised learning methods in real-world applications beset by label scarcity. The ability to successfully extract utility from unlabeled datasets without compromising segmentation quality heralds broader applications in scenarios where full datasets are impractical to gather or maintain.

In sum, the methodological insights and empirical evidence presented in this paper affirm ClassMix as a potent tool for advancing semi-supervised learning in semantic segmentation. Future explorations could further optimize its efficacy across varied datasets, potentially integrating within larger frameworks to address segmentation across multimodal data streams.
