Learning Data Augmentation Strategies for Object Detection (1906.11172v1)

Published 26 Jun 2019 in cs.CV and cs.LG

Abstract: Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost for annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this work, we study the impact of data augmentation on object detection. We first demonstrate that data augmentation operations borrowed from image classification may be helpful for training detection models, but the improvement is limited. Thus, we investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allow a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified with COCO improves a strong baseline on PASCAL-VOC by +2.7 mAP. Our results also reveal that a learned augmentation policy is superior to state-of-the-art architecture regularization methods for object detection, even when considering strong baselines. Code for training with the learned policy is available online at https://github.com/tensorflow/tpu/tree/master/models/official/detection

Learning Data Augmentation Strategies for Object Detection

The paper "Learning Data Augmentation Strategies for Object Detection" examines how data augmentation affects the training of object detection models. Because annotating images for detection costs more than labeling them for classification, augmentation that gets more out of existing labeled data may matter even more for this task.

Overview

The authors explore the application of learned data augmentation strategies tailored for object detection. Traditional approaches often borrow augmentation techniques from image classification, such as horizontal flipping and random translations. However, these methods offer limited improvements for detection tasks. Therefore, this paper investigates whether specialized data augmentation policies, which are learned and applied during training, can outperform conventional methods.

Methodology

The paper introduces a framework to optimize data augmentation policies through a systematic search. This framework leverages a reinforcement learning approach, using a recurrent neural network (RNN) to explore discrete combinations of augmentation operations. Three categories of operations are defined:

Color Transformations: Operations that alter color aspects without affecting bounding box positions.
Geometric Transformations: Operations that change both the image and bounding box geometry.
Bounding Box Specific Transformations: Operations that apply changes only within bounding boxes.
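
The practical difference between the three categories is what happens to the box annotations. The sketch below is a minimal illustration, not the paper's implementation: the function names, the nested-list image representation, and the `(x_min, y_min, x_max, y_max)` box convention with exclusive maxima are all assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values in 0..255
Box = Tuple[int, int, int, int]  # assumed convention: (x_min, y_min, x_max, y_max), max exclusive


def brighten(image: Image, boxes: List[Box], delta: int) -> Tuple[Image, List[Box]]:
    """Color transformation: pixel values change, boxes are returned untouched."""
    new_image = [[min(255, max(0, p + delta)) for p in row] for row in image]
    return new_image, list(boxes)


def flip_horizontal(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Geometric transformation: the image and every box are mirrored together."""
    width = len(image[0])
    new_image = [row[::-1] for row in image]
    new_boxes = [(width - x1, y0, width - x0, y1) for (x0, y0, x1, y1) in boxes]
    return new_image, new_boxes


def flip_inside_boxes(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Bounding-box-specific transformation: only pixels inside each box are mirrored."""
    new_image = [row[:] for row in image]
    for (x0, y0, x1, y1) in boxes:
        for y in range(y0, y1):
            new_image[y][x0:x1] = new_image[y][x0:x1][::-1]
    return new_image, list(boxes)


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 10, 20, 30, 0, 0],
    [0, 40, 50, 60, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
boxes = [(1, 1, 4, 3)]  # covers the 3x2 block of non-zero pixels

bright_image, bright_boxes = brighten(image, boxes, 100)
flipped_image, flipped_boxes = flip_horizontal(image, boxes)
local_image, local_boxes = flip_inside_boxes(image, boxes)
```

Only `flip_horizontal` returns different boxes. The other two leave the annotations as they were, which is why color and box-local operations can be applied without any label bookkeeping.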

A learned augmentation policy is treated as a set of sub-policies. Each sub-policy is a short sequence of image transformations, and each transformation carries two parameters: the probability that it is applied and the magnitude with which it is applied. The search is run on the COCO dataset, and the resulting policy is then evaluated across several detection models and datasets.
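
The policy structure described above can be sketched in a few lines. This is a hedged illustration under stated assumptions: the operation names, probabilities, and magnitudes are made up, the operations are stand-ins that only log what they would do, and picking one sub-policy uniformly at random per training example is an assumption of this sketch rather than something the summary states.

```python
import random
from typing import Callable, Dict, List, Tuple

# One step of a sub-policy: (operation name, probability of applying it, magnitude).
Step = Tuple[str, float, float]
SubPolicy = List[Step]
Op = Callable[[List[str], float], List[str]]


def make_op(name: str) -> Op:
    """Hypothetical stand-in for a real image operation: it only records the call."""
    def op(sample: List[str], magnitude: float) -> List[str]:
        return sample + [f"{name}(magnitude={magnitude})"]
    return op


OPS: Dict[str, Op] = {
    name: make_op(name) for name in ("color_shift", "rotate", "bbox_only_flip")
}

# Illustrative values only; this is NOT the policy reported in the paper.
POLICY: List[SubPolicy] = [
    [("color_shift", 0.6, 4.0), ("rotate", 0.8, 2.0)],
    [("bbox_only_flip", 0.2, 6.0), ("color_shift", 1.0, 8.0)],
]


def apply_policy(sample: List[str], policy: List[SubPolicy], rng: random.Random) -> List[str]:
    """Pick one sub-policy at random, then run its steps in order.

    Each step fires independently with its own probability, so a sub-policy
    with two steps may apply zero, one, or both operations.
    """
    sub_policy = rng.choice(policy)
    for name, probability, magnitude in sub_policy:
        if rng.random() < probability:
            sample = OPS[name](sample, magnitude)
    return sample


rng = random.Random(0)
augmented = [apply_policy([], POLICY, rng) for _ in range(1000)]
```

Because both the choice of sub-policy and each step are stochastic, the same training image can be augmented differently on every epoch, while the set of possible transformations stays fixed by the policy.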

Results

The optimized augmentation policy yields measurable gains. On the COCO dataset, it improves detection accuracy by more than +2.3 mAP and allows a single inference model to reach a state-of-the-art 50.7 mAP. The policy also transfers: applied unchanged, it improves a strong baseline on PASCAL-VOC by +2.7 mAP.

Additionally, the learned policy outperforms state-of-the-art architecture regularization methods for object detection, even when compared against strong baselines. The gains hold across datasets and are reported to be most pronounced on smaller datasets and on small objects.

Implications and Future Work

The approach has several practical implications:

Transferability: The ability of learned augmentation policies to transfer across models and datasets underscores their utility in diverse applications.
Regularization: The augmentation serves as an effective regularizer, potentially reducing the need for additional regularization techniques.
Efficiency: The approach demonstrates computational efficiency by learning policies from a reduced subset of data, making it feasible to scale to larger datasets and complex models.

Potential future directions include extending this framework to domains like semantic segmentation and 3D data analysis, where augmentation of labeled data poses significant challenges. As automated data augmentation continues to evolve, its integration into more sophisticated vision tasks promises enhanced model robustness and efficiency.

In short, the paper shows that augmentation designed for classification is not enough for object detection, and it offers a learned policy that improves accuracy across detection models and datasets without changing the model at inference time.

Authors

Barret Zoph
Ekin D. Cubuk
Golnaz Ghiasi
Tsung-Yi Lin
Jonathon Shlens
Quoc V. Le
