- The paper introduces Robust Critical Fine-Tuning (RiFT) to improve the generalization of adversarially trained models while preserving their adversarial robustness, yielding a gain of roughly 1.5% in generalization.
- It presents Module Robustness Criticality (MRC) to identify non-robust-critical modules, whose weights can then be safely fine-tuned and merged back via linear interpolation.
- Experimental validation on ResNet and WideResNet architectures across CIFAR10, CIFAR100, and Tiny-ImageNet demonstrates RiFT’s practical efficacy.
Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning
The paper presents a novel approach, Robust Critical Fine-Tuning (RiFT), to improve the generalization of adversarially trained deep neural networks while maintaining adversarial robustness. Adversarial Training (AT) is a well-regarded method for increasing model robustness against adversarial examples, yet it often degrades generalization on in-distribution data. The authors propose exploiting the redundant capacity of adversarially trained models to mitigate this trade-off.
Core Contributions
- Module Robustness Criticality (MRC): The authors introduce MRC as a measure to determine each module's importance to model robustness. MRC evaluates the robustness loss increment under the worst-case weight perturbations, providing an insightful metric to identify non-critical modules that can be fine-tuned without significant robustness loss.
- Robust Critical Fine-Tuning (RiFT): Building on MRC, RiFT fine-tunes the non-robust-critical module identified by the metric. The fine-tuned weights are then linearly interpolated with the original adversarially trained weights, enhancing generalization while preserving adversarial robustness.
- Experimental Validation: RiFT's efficacy was demonstrated using ResNet18, ResNet34, and WideResNet34-10 architectures across CIFAR10, CIFAR100, and Tiny-ImageNet datasets. The results showcased an improvement in both generalization and out-of-distribution robustness by approximately 1.5%, with adversarial robustness maintained or slightly improved.
Detailed Analysis
The authors provide a thorough analysis of MRC's importance by showing that certain modules in a neural network contribute minimally to adversarial robustness. This observation opens avenues to fine-tune these modules to recover generalization capabilities lost during adversarial training. Moreover, the paper underscores that the redundant capacity for robustness is prominent in adversarially trained models, hinting at untapped potential for further performance enhancement.
This work also bridges the gap between theoretical insight and practical implementation by providing a concrete algorithm to compute MRC and apply RiFT. The strategy highlights that non-robust-critical modules can serve as flexibility points to improve generalization without sacrificing robustness.
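The final step of the procedure, combining the adversarially trained and fine-tuned weights, is a simple convex combination. The sketch below assumes weights are stored as name-to-array dictionaries; `interpolate_weights` is a hypothetical helper, not the authors' implementation:

```python
import numpy as np

def interpolate_weights(w_at, w_ft, alpha):
    """Linearly interpolate between adversarially trained weights (w_at)
    and fine-tuned weights (w_ft); alpha = 0 recovers the AT model,
    alpha = 1 the fully fine-tuned one."""
    return {name: (1 - alpha) * w_at[name] + alpha * w_ft[name]
            for name in w_at}
```

In practice one would sweep alpha and keep the largest value for which the measured adversarial robustness stays within an acceptable tolerance of the original model's, banking the generalization gain from fine-tuning.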
Implications and Future Directions
Beyond its empirical success, RiFT shifts the conventional perspective on adversarial training. The existence of non-robust-critical modules suggests that current AT methods underutilize the full capacity of DNNs. Future work should therefore examine more efficient AT algorithms that exploit this redundancy to better balance generalization and robustness.
Moreover, as the results challenge preconceived notions regarding the dichotomy between robust and generalizable features, the theoretical foundation of these findings warrants deeper exploration. This could lead to innovative architectures or training regimes that inherently balance these often opposing objectives.
Conclusion
This paper presents a compelling case for enhancing generalization in adversarially trained models by capitalizing on their redundant capacities. The introduction of MRC and the application of RiFT mark a substantive step forward in the field, encouraging more nuanced approaches to neural network fine-tuning that preserve robustness while recovering generalization deficits. The findings promise to influence both theoretical perspectives and practical approaches in the discipline, making RiFT a valuable contribution to ongoing research in adversarial machine learning.