GridMask Data Augmentation (2001.04086v3)

Published 13 Jan 2020 in cs.CV

Abstract: We propose a novel data augmentation method 'GridMask' in this paper. It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks. We analyze the requirement of information dropping. Then we show limitation of existing information dropping algorithms and propose our structured method, which is simple and yet very effective. It is based on the deletion of regions of the input image. Our extensive experiments show that our method outperforms the latest AutoAugment, which is way more computationally expensive due to the use of reinforcement learning to find the best policies. On the ImageNet dataset for recognition, COCO2017 object detection, and on Cityscapes dataset for semantic segmentation, our method all notably improves performance over baselines. The extensive experiments manifest the effectiveness and generality of the new method.

Summary

The paper introduces GridMask, a novel data augmentation technique that uses structured removal of image regions to improve model generalization.
It uses four parameters ($r$, $d$, $\delta_x$, $\delta_y$) to lay out evenly spaced masked squares, balancing information retention against removal.
Experimental results show improved top-1 accuracy on ImageNet and higher mAP on COCO2017, demonstrating its effectiveness relative to existing methods.

An Essay on GridMask Data Augmentation

The paper "GridMask Data Augmentation" introduces a novel data augmentation technique, GridMask, that makes strategic use of information removal to improve the performance of computer vision models. The method is positioned as a superior alternative to existing augmentation methods, particularly in its ability to consistently deliver state-of-the-art results across a range of computer vision tasks, including image classification, object detection, and semantic segmentation.

Core Contributions and Methodology

GridMask's central contribution lies in its structured approach to information removal from input images. Unlike earlier methods such as Cutout or Random Erasing, which remove one or more continuous blocks from an image, GridMask deletes spatially distributed square regions laid out on a regular grid. The aim is a reasonable balance between information retention and removal: enough is dropped to push the model toward better generalization, but enough is kept that the remaining image does not degrade into noise.
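For contrast with the grid layout described above, the following is a minimal sketch of a single-block mask in the style of Cutout. It is not taken from the paper; the function name `cutout_mask`, the `size` parameter, and the choice to clip the block at the image border are assumptions made for illustration.

```python
import numpy as np


def cutout_mask(height, width, size, rng):
    """Return a (height, width) mask with one dropped block: 1 = keep, 0 = drop.

    Assumption: the block centre is sampled uniformly over the image and the
    block is clipped where it crosses the border, so it may be smaller than
    size x size near the edges.
    """
    cy = int(rng.integers(0, height))
    cx = int(rng.integers(0, width))
    half = size // 2
    y0, y1 = max(cy - half, 0), min(cy + half, height)
    x0, x1 = max(cx - half, 0), min(cx + half, width)

    mask = np.ones((height, width), dtype=np.float32)
    mask[y0:y1, x0:x1] = 0.0  # a single contiguous region is removed
    return mask


rng = np.random.default_rng(0)
mask = cutout_mask(8, 8, size=4, rng=rng)
print(mask)
```

Because all of the removed pixels sit in one place, such a block can cover an entire small object or miss the object altogether, which is the limitation a structured grid is meant to reduce.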

The GridMask technique defines a mask using four parameters, $r$, $d$, $\delta_x$, and $\delta_y$, which dictate the size, distribution, and offset of the eliminated regions within each image. GridMask is easy to implement, integrates with any existing Convolutional Neural Network (CNN) architecture, and avoids the policy-search cost of computationally demanding methods such as AutoAugment.
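The mask construction can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation, and the parameter semantics are assumptions: $d$ is taken as the side length of one grid unit in pixels, $r$ as the fraction of each unit's side that is kept (so the dropped square has side about $d(1-r)$), and $\delta_x$, $\delta_y$ as pixel offsets of the grid from the image origin. The function names `grid_mask` and `apply_grid_mask`, and the placement of the dropped square at the start of each unit, are hypothetical choices.

```python
import numpy as np


def grid_mask(height, width, d, r, delta_x=0, delta_y=0):
    """Return a (height, width) mask: 1 = keep pixel, 0 = drop pixel.

    Assumed convention: the image is tiled with d x d units, and in every
    unit a square of side round(d * (1 - r)) is dropped.
    """
    if d <= 0 or not 0.0 <= r <= 1.0:
        raise ValueError("d must be positive and r must lie in [0, 1]")
    drop_len = int(round(d * (1.0 - r)))

    # Position of each row/column inside its grid unit, after the offset.
    rows = (np.arange(height) - delta_y) % d
    cols = (np.arange(width) - delta_x) % d

    # A pixel is dropped only if both its row and column fall in the
    # dropped interval, which produces evenly spaced squares.
    dropped = (rows < drop_len)[:, None] & (cols < drop_len)[None, :]
    return (~dropped).astype(np.float32)


def apply_grid_mask(image, d, r, rng):
    """Multiply an (H, W) or (H, W, C) image by a randomly offset grid mask."""
    h, w = image.shape[:2]
    # Assumption: offsets are drawn uniformly from [0, d - 1].
    delta_x = int(rng.integers(0, d))
    delta_y = int(rng.integers(0, d))
    mask = grid_mask(h, w, d, r, delta_x, delta_y)
    if image.ndim == 3:
        mask = mask[:, :, None]  # broadcast over channels
    return image * mask


mask = grid_mask(8, 8, d=4, r=0.5)
print(mask)
print("kept fraction:", mask.mean())
```

Under these assumptions the kept fraction of a full unit is $1 - (1 - r)^2$, so $r$ directly controls how much information survives, while $d$ controls how finely the dropped squares are spread across the image.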

Experimental Results

A broad set of experiments supports the effectiveness and general applicability of GridMask. For image classification on ImageNet, GridMask improved the top-1 accuracy of widely used models such as ResNet50 (+1.4%) and ResNet152 (+1.4%). For object detection on the COCO2017 dataset, employing GridMask with the Faster-RCNN framework increased the mean Average Precision (mAP) from 37.4% to 39.2%. For semantic segmentation on the Cityscapes dataset, GridMask applied to the PSPNet model increased the mean Intersection over Union (mIoU) from 77.3% to 78.1%.

Comparative Evaluation

The paper compares GridMask with several contemporary data augmentation methods. Cutout and HaS (Hide-and-Seek) also rely on information dropping, but they do so in a less structured way and do not deliver the same consistency. AutoAugment, which uses reinforcement learning to search for augmentation policies, is noted as being far more computationally expensive. GridMask reports better results despite its simplicity and lower computational cost, which underscores its practicality for real-world model training.

Implications and Future Directions

GridMask challenges the convention of purely random information dropping by making the dropped regions structured. This may open the way to similar structured approaches in other subfields of machine learning, encouraging a shift from randomness toward deliberate removal strategies that improve learning and robustness in CNN models. In addition, GridMask's simplicity suggests it could be combined with policy-search methods like AutoAugment, possibly yielding hybrid strategies that pair structured information dropping with learned augmentation policies.

The strong empirical results make GridMask a compelling default choice for data augmentation. Its appeal lies in combining low computational cost with consistent accuracy gains, which offers a fresh perspective on the role of augmentation strategies in CNN training.

In conclusion, the GridMask method is a testament to the power of simple yet clever algorithmic innovation, promising significant implications for future research and application in deep learning and computer vision tasks.