CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (1905.04899v2)

Published 13 May 2019 in cs.CV and cs.LG

Abstract: Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as opposed to head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout remove informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms the state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained ImageNet classifier, when used as a pretrained model, results in consistent performance gains in Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix improves the model robustness against input corruptions and its out-of-distribution detection performances. Source code and pretrained models are available at https://github.com/clovaai/CutMix-PyTorch .

Authors (6)
  1. Sangdoo Yun (71 papers)
  2. Dongyoon Han (50 papers)
  3. Seong Joon Oh (60 papers)
  4. Sanghyuk Chun (49 papers)
  5. Junsuk Choe (20 papers)
  6. YoungJoon Yoo (31 papers)
Citations (4,308)

Summary

  • The paper introduces CutMix, a strategy that mixes image patches to retain informative pixels and improve model localization.
  • It achieves significant performance gains on datasets like CIFAR and ImageNet, reducing top-1 error rates compared to prior methods.
  • CutMix also enhances robustness and transfer learning, with improvements observed in object detection and image captioning tasks.

CutMix: A Regularization Strategy to Train Classifiers with Localizable Features

The paper "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features" by Sangdoo Yun et al. introduces an innovative data augmentation method aimed at improving both the generalization and localization capabilities of convolutional neural network (CNN) classifiers. This method, named CutMix, combines the strengths of existing strategies such as Mixup and Cutout while addressing their respective limitations.

Introduction

CNNs have demonstrated substantial efficacy in a variety of computer vision tasks including image classification, object detection, and semantic segmentation. Traditional data augmentation methods like Cutout and Mixup have shown promise in enhancing model generalization by either erasing random image regions or interpolating between images, respectively. However, Cutout often leads to a significant reduction in useful training pixels, while Mixup tends to generate unnatural images that might confuse the model.

The CutMix Strategy

CutMix operates by cutting and pasting patches among training images. Specifically, a rectangular patch is extracted from one image and pasted onto another, with the corresponding labels being mixed in proportion to the area of the patches. This technique ensures that all training pixels remain informative, unlike Cutout which zeroes out regions of images.

Formally, given two training samples $(x_A, y_A)$ and $(x_B, y_B)$, the new training sample $(\tilde{x}, \tilde{y})$ is generated as follows:

$\tilde{x} = \mathbf{M} \odot x_A + (\mathbf{1} - \mathbf{M}) \odot x_B, \qquad \tilde{y} = \lambda y_A + (1 - \lambda) y_B,$

where $\mathbf{M}$ is a binary mask, $\mathbf{1}$ is a mask of the same shape filled with ones, and $\lambda$ is sampled from the beta distribution $\text{Beta}(\alpha, \alpha)$, with $\alpha = 1$ in their experiments. The binary mask $\mathbf{M}$ determines the region to be replaced and is defined by the coordinates of a bounding box.
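The mixing procedure can be sketched in a few lines of NumPy. This is a minimal standalone version, not the authors' official PyTorch implementation (available at the linked repository); it assumes `(H, W, C)` image arrays and one-hot label vectors, and follows the paper's box sampling with cut ratio $\sqrt{1 - \lambda}$:

```python
import numpy as np

def rand_bbox(height, width, lam):
    """Sample a box covering roughly a (1 - lam) fraction of the image.

    Following the paper, the cut ratio per side is sqrt(1 - lam), so the
    box area is (1 - lam) * H * W before clipping at the image borders.
    """
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)
    # Uniformly sample the box centre, then clip the box to the image.
    cy = np.random.randint(height)
    cx = np.random.randint(width)
    y1 = np.clip(cy - cut_h // 2, 0, height)
    y2 = np.clip(cy + cut_h // 2, 0, height)
    x1 = np.clip(cx - cut_w // 2, 0, width)
    x2 = np.clip(cx + cut_w // 2, 0, width)
    return y1, y2, x1, x2

def cutmix(x_a, y_a, x_b, y_b, alpha=1.0):
    """Combine two (H, W, C) images and their one-hot labels via CutMix."""
    lam = np.random.beta(alpha, alpha)
    h, w = x_a.shape[:2]
    y1, y2, x1, x2 = rand_bbox(h, w, lam)
    x_tilde = x_a.copy()
    x_tilde[y1:y2, x1:x2] = x_b[y1:y2, x1:x2]   # paste the patch from x_B
    # Recompute lambda from the exact pasted area (clipping can shrink it).
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    y_tilde = lam * y_a + (1.0 - lam) * y_b     # mix labels proportionally
    return x_tilde, y_tilde
```

Note the final recomputation of $\lambda$: because the sampled box is clipped at the image borders, the actual pasted area can be smaller than intended, and the label weights should reflect the true pixel fraction.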

Experimental Evaluation

The efficacy of the CutMix strategy is validated across various datasets and tasks. Key findings from the experiments are summarized below:

CIFAR and ImageNet Classification

  • CIFAR-100: CutMix significantly outperforms other augmentation methods, achieving a state-of-the-art top-1 error of 14.47% with a PyramidNet-200 model.
  • ImageNet: Applying CutMix to ResNet-50 and ResNet-101 models reduced top-1 error by 2.28 and 1.70 percentage points, respectively, over the baselines.

Weakly Supervised Object Localization (WSOL)

CutMix demonstrated superior performance over Mixup and Cutout in weakly supervised object localization tasks, evaluated on CUB200-2011 and ImageNet datasets. Notably, CutMix achieved significant enhancements in localization accuracy, making it comparable to specialized state-of-the-art WSOL methods.

Transfer Learning

The paper further explores the impact of CutMix in transfer learning scenarios. When using CutMix-pretrained ImageNet models as initialization for downstream tasks such as object detection (Pascal VOC) and image captioning (MS-COCO), substantial improvements were observed:

  • Object Detection: Models pretrained with CutMix showed consistent gains when fine-tuned with SSD and Faster R-CNN detectors, achieving roughly +1 mAP on Pascal VOC.
  • Image Captioning: CutMix-pretrained models improved BLEU scores by about 2 points on the MS-COCO dataset.

Robustness and Uncertainty

CutMix also enhances model robustness against input corruptions and improves out-of-distribution (OOD) detection performance. For example, the CutMix-trained models were more robust to adversarial attacks generated using Fast Gradient Sign Method (FGSM) and various forms of occlusion. Moreover, CutMix significantly alleviated the over-confidence issues noted in other augmentation methods.
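To make the FGSM evaluation concrete, here is a minimal sketch of the attack itself. The paper applies FGSM to CNN classifiers; for a self-contained illustration this sketch uses a toy logistic-regression model (an assumption, not the paper's setup), where the input gradient of the cross-entropy loss has the closed form $(p - y)\,w$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, b, eps=0.03):
    """Fast Gradient Sign Method against p = sigmoid(w . x + b).

    FGSM takes one step of size eps in the sign of the gradient of the
    loss with respect to the input; for logistic regression with
    cross-entropy loss that gradient is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)   # keep the input in valid pixel range
```

A robustness evaluation then compares model accuracy on `x` versus `fgsm_attack(x, ...)`; CutMix-trained models lose less accuracy under this perturbation than baselines.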

Discussion and Implications

CutMix proves to be a highly effective and computationally efficient data augmentation strategy. Its ability to improve both classification accuracy and localization performance while maintaining robustness against adversarial attacks makes it a compelling choice for training CNNs. By retaining all informative pixels during training, CutMix addresses the conceptual limitations inherent in existing techniques like Cutout.

Looking ahead, the implications of CutMix are manifold. It can help in developing more resilient and accurate models for real-world applications where data variability and model robustness are critical. Future work could explore extensions of CutMix to other domains and tasks within machine learning, including natural language processing and applied machine learning scenarios such as medical imaging where labeled data might be scarce.

In conclusion, CutMix is a versatile and powerful addition to the suite of data augmentation strategies, fostering the development of robust, generalizable, and highly accurate models across a range of vision tasks.
