AdaAugment: Enhancing Data Augmentation with Adaptive and Tuning-Free Methods
Introduction
Data Augmentation (DA) is a technique used in the training of deep neural networks to increase the diversity of the training data by creating modified versions of existing data samples. However, most existing DA methods use random augmentation magnitudes, which can introduce uncontrolled variability and may not align with the evolving training status of the model. This misalignment can lead to underfitting during the initial stages of training and overfitting in later stages. To address these limitations, this paper presents AdaAugment, a tuning-free and adaptive DA method that dynamically adjusts augmentation magnitudes based on real-time feedback from the target network using reinforcement learning.
How AdaAugment Works
Dual-Model Architecture
AdaAugment features a dual-model architecture consisting of a policy network and a target network. The policy network determines the magnitudes of augmentation operations, while the target network trains on the samples augmented at those magnitudes. The two networks are optimized jointly, so magnitudes adapt online during training rather than requiring a separate search or tuning phase (a minimal sketch of this loop follows the component list below).
Key Components:
- Policy Network: Learns the policy determining augmentation magnitudes based on real-time feedback during training.
- Target Network: Uses the adaptively augmented samples for training, providing feedback to the policy network.
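To make this interaction concrete, here is a minimal PyTorch-style sketch of one joint training step. Every name in it (`PolicyNet`, the brightness-only `apply_augment`, the two-feature state) is an illustrative assumption for this post, not the paper's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Maps a per-sample state vector to an augmentation magnitude in [0, 1]."""
    def __init__(self, state_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(state)).squeeze(-1)  # shape: (batch,)

def apply_augment(x: torch.Tensor, magnitude: torch.Tensor) -> torch.Tensor:
    # Stand-in augmentation: darken each image in proportion to `magnitude`.
    return x * (1.0 - 0.5 * magnitude.view(-1, 1, 1, 1))

def joint_step(policy, target, opt_target, x, y):
    # State: per-sample loss (difficulty) + batch-mean loss (training status).
    with torch.no_grad():
        per_sample = F.cross_entropy(target(x), y, reduction="none")
    state = torch.stack([per_sample, per_sample.mean().expand_as(per_sample)], -1)

    # Policy proposes magnitudes; target trains on the adaptively augmented batch.
    magnitude = policy(state)
    loss_ada = F.cross_entropy(target(apply_augment(x, magnitude.detach())), y)
    opt_target.zero_grad()
    loss_ada.backward()
    opt_target.step()
    return magnitude, loss_ada  # the policy update uses the reward defined below
```

The `detach()` on the magnitudes keeps the target update from backpropagating into the policy; the policy itself is trained from the RL reward described in the next section.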
Reinforcement Learning Approach
The reinforcement learning (RL) component formulates the augmentation magnitude adjustment as a Markov Decision Process (MDP). Here's a simplified breakdown (a short sketch of these components follows the list):
- State Space (S): Considers the inherent difficulty of each sample, the current training status, and the intensity of augmentation.
- Action Space (A): Contains actions representing different magnitudes of augmentation, ranging from 0 (no augmentation) to 1 (maximum augmentation).
- Reward Function (R): Designed to balance underfitting and overfitting risks by leveraging losses from fully augmented, non-augmented, and adaptively augmented data.
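As an illustration, the sketch below encodes the first two ingredients in a few lines. The specific state features and the number of discrete magnitude levels `K` are assumptions of this sketch, not values from the paper:

```python
import numpy as np

K = 11                              # assumed number of discrete magnitude levels
ACTIONS = np.linspace(0.0, 1.0, K)  # 0.0 = no augmentation ... 1.0 = maximum

def make_state(sample_loss: float, epoch: int, total_epochs: int,
               prev_magnitude: float) -> np.ndarray:
    """State = (sample difficulty, training progress, current augmentation intensity)."""
    return np.array([sample_loss, epoch / total_epochs, prev_magnitude],
                    dtype=np.float32)

# Example: an easy sample (low loss) late in training yields a state that should
# push the policy toward stronger augmentation to guard against overfitting.
s = make_state(sample_loss=0.05, epoch=180, total_epochs=200, prev_magnitude=0.3)
```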
Reward Function Formula:

$$\mathcal{R} = \frac{\mathcal{L}_{\text{ada}} - \mathcal{L}_{\text{non}}}{\mathcal{L}_{\text{full}} - \mathcal{L}_{\text{non}}}$$

where $\mathcal{L}_{\text{full}}$ is the loss of fully augmented data, $\mathcal{L}_{\text{non}}$ is the loss of non-augmented data, and $\mathcal{L}_{\text{ada}}$ is the loss of adaptively augmented data. In this normalized form, the reward tracks where the adaptive loss sits between the two reference losses: values near 0 indicate augmentation too weak to prevent overfitting, while values near 1 indicate augmentation strong enough to risk underfitting.
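Assuming the normalized form given above, the reward reduces to a few lines; treat this as a sketch consistent with this write-up's formula rather than the paper's verbatim definition:

```python
import torch

def compute_reward(loss_full: torch.Tensor, loss_non: torch.Tensor,
                   loss_ada: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalized position of the adaptive loss between the non-augmented and
    fully augmented losses; eps avoids division by zero when the two reference
    losses coincide."""
    return (loss_ada - loss_non) / (loss_full - loss_non + eps)

# Plausible REINFORCE-style use (illustrative, not the paper's exact optimizer):
# policy_loss = -(compute_reward(l_full, l_non, l_ada) * log_probs).mean()
```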
Experimental Results
CIFAR-10 and CIFAR-100
Table 1: Test accuracy (%) on CIFAR-10/100
| Dataset  | Method     | ResNet-18    | ResNet-50    | WRN-28-10    | ShakeShake   |
|----------|------------|--------------|--------------|--------------|--------------|
| CIFAR-10 | Baseline   | 95.28 ±0.14* | 95.66 ±0.08* | 95.52 ±0.11* | 94.90 ±0.07* |
|          | CutMix     | 96.64 ±0.62* | 96.81 ±0.10* | 96.93 ±0.10* | 96.47 ±0.07  |
|          | ...        | ...          | ...          | ...          | ...          |
|          | AdaAugment | 96.75 ±0.06  | 97.34 ±0.13  | 97.66 ±0.07  | 97.41 ±0.06  |
AdaAugment consistently outperforms existing state-of-the-art DA methods across different network architectures. Notable improvements over the baseline include a 1.47% gain for ResNet-18 and a 2.14% gain for WRN-28-10 on CIFAR-10.
Tiny-ImageNet
Table 2: Test accuracy (%) on Tiny-ImageNet
| Method     | ResNet-18   | ResNet-50   | WRN-50-2    | ResNext-50  |
|------------|-------------|-------------|-------------|-------------|
| Baseline   | 61.38 ±0.99 | 73.61 ±0.43 | 81.55 ±1.24 | 79.76 ±1.89 |
| CutMix     | 64.09 ±0.30 | 76.41 ±0.27 | 82.32 ±0.46 | 81.31 ±1.00 |
| ...        | ...         | ...         | ...         | ...         |
| AdaAugment | 71.25 ±0.64 | 79.11 ±1.51 | 83.07 ±0.78 | 81.92 ±0.29 |
On Tiny-ImageNet, AdaAugment shows significant performance improvements, such as a 9.87% increase for ResNet-18 compared to the baseline.
Practical and Theoretical Implications
Theoretical Implications
AdaAugment shifts DA from fixed or random magnitudes to magnitudes that adapt to the model's training status, mitigating the risk of underfitting early in training and of overfitting later on. This approach can be extended to tasks beyond image classification, such as NLP and time-series analysis.
Practical Implications
Practically, AdaAugment offers a more efficient way to implement DA without manual tuning. This can streamline the workflow for data scientists and reduce the need for extensive hyperparameter tuning. The minimal additional computational overhead (around 0.5 GPU hours) makes it feasible for real-world applications.
Future Developments
Future research could explore extending AdaAugment to other domains and tasks, further optimizing the policy network, and integrating additional types of data transformations.
Conclusion
AdaAugment offers a robust, adaptive, and tuning-free solution to enhance DA, demonstrating superior efficacy in improving model performance across various datasets and architectures. Its ability to dynamically adjust augmentation magnitudes makes it a valuable tool for achieving better generalization in deep learning models.