Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data (1909.09148v2)

Published 19 Sep 2019 in cs.LG, cs.CV, and stat.ML

Abstract: Data augmentation has been widely applied as an effective methodology for improving generalization, particularly when training deep neural networks. Recently, researchers have proposed several intensive data augmentation techniques that indeed improve accuracy, yet we observe that the data they produce exhibit a considerable distribution gap relative to clean data. In this paper, we revisit this problem from an analytical perspective, bounding the expected risk from above by the sum of two terms: the empirical risk and the generalization error. This analysis frames data augmentation as a form of regularization: it significantly reduces the generalization error but, at the same time, leads to a slightly higher empirical risk. Under the assumption that data augmentation helps the model converge to a better region, the model can then benefit from the lower empirical risk achieved by a simple method, namely, using less-augmented data to refine a model trained on fully-augmented data. Our approach achieves consistent accuracy gains on several standard image classification benchmarks, and the gains transfer to object detection.
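The bound the abstract appeals to is the standard expected-risk decomposition; the sketch below uses our own notation (the symbols R, R-hat, and the hypothesis class F are assumptions, not notation taken from the paper):

```latex
% Expected risk is bounded by empirical risk plus a uniform generalization gap.
% Heavy augmentation shrinks the second term (acting as a regularizer)
% while slightly inflating the first, which motivates refining the model
% on less-augmented data afterwards.
\underbrace{R(f)}_{\text{expected risk}}
  \;\le\;
\underbrace{\hat{R}_n(f)}_{\text{empirical risk}}
  \;+\;
\underbrace{\sup_{f' \in \mathcal{F}} \bigl| R(f') - \hat{R}_n(f') \bigr|}_{\text{generalization error}}
```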
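A minimal sketch of the two-stage refinement recipe the abstract describes, assuming a PyTorch/torchvision setup; the model, dataset, augmentation pipelines, epoch counts, and learning rates are illustrative assumptions, not the paper's reported settings:

```python
# Sketch: train on fully-augmented data, then refine the same model on
# less-augmented data to recover the lower empirical risk on clean inputs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# "Fully-augmented" pipeline (strength chosen for illustration only).
heavy_aug = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.ToTensor(),
])
# "Less-augmented" pipeline used for the refinement stage.
light_aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def run_epochs(model, loader, optimizer, epochs, device):
    """Plain supervised training loop with cross-entropy loss."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(num_classes=10).to(device)

# Stage 1: fully-augmented training (drives down the generalization error).
heavy_data = datasets.CIFAR10("data", train=True, download=True, transform=heavy_aug)
run_epochs(model, DataLoader(heavy_data, batch_size=128, shuffle=True),
           torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9),
           epochs=90, device=device)

# Stage 2: brief refinement on less-augmented data (drives down the
# empirical risk), reusing the stage-1 weights.
light_data = datasets.CIFAR10("data", train=True, download=True, transform=light_aug)
run_epochs(model, DataLoader(light_data, batch_size=128, shuffle=True),
           torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
           epochs=10, device=device)
```

The design point is that stage 2 reuses the weights from stage 1, so the refinement only has to close the small empirical-risk gap on near-clean inputs rather than retrain from scratch.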

Authors (6)
  1. Zhuoxun He (1 paper)
  2. Lingxi Xie (137 papers)
  3. Xin Chen (457 papers)
  4. Ya Zhang (222 papers)
  5. Yanfeng Wang (211 papers)
  6. Qi Tian (314 papers)
Citations (61)
