AdaAugment: A Tuning-Free and Adaptive Approach to Enhance Data Augmentation (2405.11467v2)

Published 19 May 2024 in cs.CV

Abstract: Data augmentation (DA) is widely employed to improve the generalization performance of deep models. However, most existing DA methods use augmentation operations with random magnitudes throughout training. While this fosters diversity, it can also inevitably introduce uncontrolled variability in augmented data, which may cause misalignment with the evolving training status of the target models. Both theoretical and empirical findings suggest that this misalignment increases the risks of underfitting and overfitting. To address these limitations, we propose AdaAugment, an innovative and tuning-free Adaptive Augmentation method that utilizes reinforcement learning to dynamically adjust augmentation magnitudes for individual training samples based on real-time feedback from the target network. Specifically, AdaAugment features a dual-model architecture consisting of a policy network and a target network, which are jointly optimized to effectively adapt augmentation magnitudes. The policy network optimizes the variability within the augmented data, while the target network utilizes the adaptively augmented samples for training. Extensive experiments across benchmark datasets and deep architectures demonstrate that AdaAugment consistently outperforms other state-of-the-art DA methods in effectiveness while maintaining remarkable efficiency.

AdaAugment: Enhancing Data Augmentation with Adaptive and Tuning-Free Methods

Introduction

Data Augmentation (DA) is a technique used in the training of deep neural networks to increase the diversity of the training data by creating modified versions of existing data samples. However, most existing DA methods use random augmentation magnitudes, which can introduce uncontrolled variability and may not align with the evolving training status of the model. This misalignment can lead to underfitting during the initial stages of training and overfitting in later stages. To address these limitations, this paper presents AdaAugment, a tuning-free and adaptive DA method that dynamically adjusts augmentation magnitudes based on real-time feedback from the target network using reinforcement learning.

How AdaAugment Works

Dual-Model Architecture

AdaAugment features a dual-model architecture consisting of a policy network and a target network. The policy network determines the magnitudes of augmentation operations, while the target network utilizes these adaptively augmented samples for training. Both networks are optimized jointly, making the adaptive adjustment process more integrated and efficient.

Key Components:

  • Policy Network: Learns the policy determining augmentation magnitudes based on real-time feedback during training.
  • Target Network: Uses the adaptively augmented samples for training and provides feedback to the policy network (a training-step sketch follows this list).
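
The following is a minimal sketch, assuming a PyTorch-style setup, of how the two networks could be coupled within a single training step. The `PolicyNet` architecture, the noise-based `apply_augmentation` placeholder, the state features, and the policy-gradient-style update are illustrative assumptions, not the authors' implementation (the paper's actual RL formulation is described in the next section):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Maps a per-sample state vector to an augmentation magnitude in [0, 1]."""

    def __init__(self, state_dim: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, state):
        return torch.sigmoid(self.net(state)).squeeze(-1)  # shape: (batch,)


def apply_augmentation(x, m):
    """Placeholder operation: add noise scaled by each sample's magnitude."""
    return x + m.view(-1, *([1] * (x.dim() - 1))) * torch.randn_like(x)


def joint_step(target_net, policy_net, opt_target, opt_policy, x, y, state, lam=0.5):
    # 1. Policy network proposes per-sample augmentation magnitudes.
    m = policy_net(state)

    # 2. Target network trains on the adaptively augmented batch.
    x_ada = apply_augmentation(x, m.detach())
    loss_ada = F.cross_entropy(target_net(x_ada), y)
    opt_target.zero_grad()
    loss_ada.backward()
    opt_target.step()

    # 3. Per-sample losses from the target network yield a reward (see the
    #    reward formula in the next section), which here drives a simple
    #    policy-gradient-style update of the policy network.
    with torch.no_grad():
        l_full = F.cross_entropy(
            target_net(apply_augmentation(x, torch.ones_like(m))), y, reduction="none")
        l_none = F.cross_entropy(target_net(x), y, reduction="none")
        l_ada = F.cross_entropy(target_net(x_ada), y, reduction="none")
        r = lam * (l_full - l_ada) + (1 - lam) * (l_ada - l_none)
    policy_loss = -(r * torch.log(m + 1e-8)).mean()
    opt_policy.zero_grad()
    policy_loss.backward()
    opt_policy.step()
```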

Reinforcement Learning Approach

The reinforcement learning (RL) component formulates the augmentation magnitude adjustment as a Markov Decision Process (MDP). Here's a simplified breakdown:

  1. State Space (S): Considers the inherent difficulty of each sample, the current training status, and the intensity of augmentation.
  2. Action Space (A): Contains actions representing different augmentation magnitudes, ranging from 0 (no augmentation) to 1 (maximum augmentation); a sketch of how such a magnitude can map onto concrete operations follows this list.
  3. Reward Function (R): Designed to balance underfitting and overfitting risks by leveraging losses from fully augmented, non-augmented, and adaptively augmented data.
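
To make the action space concrete, a normalized magnitude in [0, 1] can be rescaled into each operation's own strength range before the operation is applied. The operations and ranges below are illustrative assumptions, not the paper's exact operation set:

```python
# Illustrative mapping from a normalized magnitude m in [0, 1] to
# operation-specific strengths (ranges here are assumptions, not the paper's).
OP_RANGES = {
    "rotate_degrees":   (0.0, 30.0),  # m = 0 -> 0 deg, m = 1 -> 30 deg
    "brightness_delta": (0.0, 0.9),   # relative brightness change
    "erase_fraction":   (0.0, 0.4),   # fraction of the image to erase
}


def scaled_strength(op: str, m: float) -> float:
    lo, hi = OP_RANGES[op]
    return lo + m * (hi - lo)


print(scaled_strength("rotate_degrees", 0.5))  # 15.0: mid-strength rotation
```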

Reward Function Formula:

r = \lambda (L_{\text{full}} - L_{\text{ada}}) + (1 - \lambda)(L_{\text{ada}} - L_{\text{none}})

where L_{\text{full}} is the loss on fully augmented data, L_{\text{none}} is the loss on non-augmented data, and L_{\text{ada}} is the loss on adaptively augmented data.
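
As a quick, purely illustrative numeric check of the formula (the loss values below are made up):

```python
def reward(loss_full, loss_ada, loss_none, lam=0.5):
    # r = lambda * (L_full - L_ada) + (1 - lambda) * (L_ada - L_none)
    return lam * (loss_full - loss_ada) + (1 - lam) * (loss_ada - loss_none)


print(reward(loss_full=1.2, loss_ada=0.8, loss_none=0.5))  # 0.5*0.4 + 0.5*0.3 = 0.35
```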

Experimental Results

CIFAR-10 and CIFAR-100

Table 1: Test accuracy (%) on CIFAR-10/100

| Dataset  | Method     | ResNet-18     | ResNet-50     | WRN-28-10     | ShakeShake    |
|----------|------------|---------------|---------------|---------------|---------------|
| CIFAR-10 | Baseline   | 95.28 ± 0.14* | 95.66 ± 0.08* | 95.52 ± 0.11* | 94.90 ± 0.07* |
|          | CutMix     | 96.64 ± 0.62* | 96.81 ± 0.10* | 96.93 ± 0.10* | 96.47 ± 0.07  |
|          | ...        | ...           | ...           | ...           | ...           |
|          | AdaAugment | 96.75 ± 0.06  | 97.34 ± 0.13  | 97.66 ± 0.07  | 97.41 ± 0.06  |

AdaAugment consistently outperforms existing state-of-the-art DA methods across different network architectures. Noteworthy improvements over the baseline include a 1.47% accuracy gain for ResNet-18 and a 2.14% gain for WRN-28-10 on CIFAR-10.

Tiny-ImageNet

Table 2: Test accuracy (%) on Tiny-ImageNet

| Method     | ResNet-18    | ResNet-50    | WRN-50-2     | ResNeXt-50   |
|------------|--------------|--------------|--------------|--------------|
| Baseline   | 61.38 ± 0.99 | 73.61 ± 0.43 | 81.55 ± 1.24 | 79.76 ± 1.89 |
| CutMix     | 64.09 ± 0.30 | 76.41 ± 0.27 | 82.32 ± 0.46 | 81.31 ± 1.00 |
| ...        | ...          | ...          | ...          | ...          |
| AdaAugment | 71.25 ± 0.64 | 79.11 ± 1.51 | 83.07 ± 0.78 | 81.92 ± 0.29 |

On Tiny-ImageNet, AdaAugment shows significant performance improvements, such as a 9.87% increase for ResNet-18 compared to the baseline.

Practical and Theoretical Implications

Theoretical Implications

AdaAugment introduces a paradigm shift by using adaptive magnitudes in DA, which aligns with the training status of models and mitigates risks of underfitting and overfitting. This approach can be extended to various tasks beyond image classification, such as NLP and time-series analysis.

Practical Implications

Practically, AdaAugment offers a more efficient way to implement DA without manual tuning. This can streamline the workflow for data scientists and reduce the need for extensive hyperparameter tuning. The minimal additional computational overhead (around 0.5 GPU hours) makes it feasible for real-world applications.

Future Developments

Future research could explore extending AdaAugment to other domains and tasks, further optimizing the policy network, and integrating additional types of data transformations.

Conclusion

AdaAugment offers a robust, adaptive, and tuning-free solution to enhance DA, demonstrating superior efficacy in improving model performance across various datasets and architectures. Its ability to dynamically adjust augmentation magnitudes makes it a valuable tool for achieving better generalization in deep learning models.

Authors (5)
  1. Suorong Yang
  2. Peijia Li
  3. Xin Xiong
  4. Furao Shen
  5. Jian Zhao