Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning (2308.02533v1)

Published 1 Aug 2023 in cs.LG and cs.CV

Abstract: Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications. Adversarial Training (AT) is a well-established technique to enhance adversarial robustness, but it often comes at the cost of decreased generalization ability. This paper proposes Robustness Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness. The core idea of RiFT is to exploit the redundant capacity for robustness by fine-tuning the adversarially trained model on its non-robust-critical module. To do so, we introduce module robust criticality (MRC), a measure that evaluates the significance of a given module to model robustness under worst-case weight perturbations. Using this measure, we identify the module with the lowest MRC value as the non-robust-critical module and fine-tune its weights to obtain fine-tuned weights. Subsequently, we linearly interpolate between the adversarially trained weights and fine-tuned weights to derive the optimal fine-tuned model weights. We demonstrate the efficacy of RiFT on ResNet18, ResNet34, and WideResNet34-10 models trained on CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Our experiments show that RiFT can significantly improve both generalization and out-of-distribution robustness by around 1.5% while maintaining or even slightly enhancing adversarial robustness. Code is available at https://github.com/microsoft/robustlearn.

Citations (14)

Summary

  • The paper introduces Robust Critical Fine-Tuning (RiFT), which improves the generalization of adversarially trained models while maintaining adversarial robustness, achieving a ~1.5% gain.
  • It presents Module Robustness Criticality (MRC) to identify non-critical modules, enabling safe weight fine-tuning via linear interpolation.
  • Experimental validation on ResNet and WideResNet architectures across CIFAR10, CIFAR100, and Tiny-ImageNet demonstrates RiFT’s practical efficacy.

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

The paper presents a novel approach, Robust Critical Fine-Tuning (RiFT), to improve the generalization of adversarially trained deep neural networks while maintaining adversarial robustness. Adversarial Training (AT) is a well-established method for increasing model robustness against adversarial examples, yet it often degrades generalization on in-distribution data. The authors propose leveraging the redundant capacity of these models to mitigate this trade-off.

Core Contributions

  1. Module Robustness Criticality (MRC): The authors introduce MRC as a measure to determine each module's importance to model robustness. MRC evaluates the robustness loss increment under the worst-case weight perturbations, providing an insightful metric to identify non-critical modules that can be fine-tuned without significant robustness loss.
  2. Robust Critical Fine-Tuning (RiFT): Building on the MRC metric, RiFT involves fine-tuning the non-robust-critical module identified via MRC. The approach seeks to refine model weights to enhance generalization while protecting adversarial robustness, achieved through linear interpolation between the original adversarially trained weights and newly fine-tuned weights.
  3. Experimental Validation: RiFT's efficacy was demonstrated using ResNet18, ResNet34, and WideResNet34-10 architectures across CIFAR10, CIFAR100, and Tiny-ImageNet datasets. The results showcased an improvement in both generalization and out-of-distribution robustness by approximately 1.5%, with adversarial robustness maintained or slightly improved.
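The interpolation step described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation: `interpolate_weights` and the tolerance-based sweep in `pick_interpolation` are hypothetical helpers, weights are modeled as flat dictionaries of floats, and a real model would operate on framework state dicts of tensors.

```python
def interpolate_weights(w_at, w_ft, alpha):
    """Linearly interpolate, parameter by parameter, between
    adversarially trained weights (w_at) and fine-tuned weights (w_ft):
        w = (1 - alpha) * w_at + alpha * w_ft
    alpha = 0 recovers the AT model; alpha = 1 the fully fine-tuned one."""
    return {name: (1 - alpha) * w_at[name] + alpha * w_ft[name]
            for name in w_at}


def pick_interpolation(w_at, w_ft, robust_acc, tol=0.001, grid=11):
    """Hypothetical selection rule: sweep alpha from 1 down to 0 and keep
    the largest value whose robust accuracy stays within `tol` of the
    original AT model's. `robust_acc` is an assumed callback that
    evaluates adversarial robustness for a given weight dict."""
    baseline = robust_acc(w_at)
    for i in range(grid - 1, -1, -1):
        alpha = i / (grid - 1)
        w = interpolate_weights(w_at, w_ft, alpha)
        if robust_acc(w) >= baseline - tol:
            return alpha, w
    return 0.0, dict(w_at)  # fall back to the AT weights
```

The sweep reflects the trade-off the paper targets: larger alpha means more of the generalization-oriented fine-tuned weights, constrained by the requirement that adversarial robustness not degrade.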

Detailed Analysis

The authors provide a thorough analysis of MRC's importance by showing that certain modules in a neural network contribute minimally to adversarial robustness. This observation opens avenues to fine-tune these modules to recover generalization capabilities lost during adversarial training. Moreover, the paper underscores that the redundant capacity for robustness is prominent in adversarially trained models, hinting at untapped potential for further performance enhancement.

This work also bridges the gap between theoretical insights and practical implementations by providing a comprehensive algorithm to calculate MRC and effectively apply RiFT. The strategy highlights that non-robust-critical modules can serve as flexibility points for further learning without sacrificing robustness.
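A scalar toy version can illustrate the MRC idea. The sketch below makes strong simplifying assumptions: a single scalar "module" weight, a numeric central-difference gradient, and projected gradient ascent to approximate the worst-case perturbation. The paper's actual MRC perturbs an entire module's weight tensor under a relative norm constraint and measures the increase in the adversarial (robust) loss.

```python
def module_robust_criticality(loss, w, eps=0.1, steps=10, lr=0.01):
    """Approximate MRC for a single scalar 'module' weight w: the
    worst-case increase in loss under a perturbation delta bounded by
    eps * |w|, found by projected gradient ascent with a numeric
    (central-difference) gradient. A low value marks the module as
    non-robust-critical in the paper's sense."""
    bound = eps * abs(w)
    delta, base, h = 0.0, loss(w), 1e-6
    for _ in range(steps):
        grad = (loss(w + delta + h) - loss(w + delta - h)) / (2 * h)
        delta += lr * grad                       # ascend to maximize loss
        delta = max(-bound, min(bound, delta))   # project onto the bound
    return loss(w + delta) - base
```

On a convex toy loss the worst-case perturbation saturates the bound, and MRC is the resulting loss increase; applied per module, the module with the lowest MRC would be the one selected for fine-tuning.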

Implications and Future Directions

Beyond its empirical success, RiFT shifts the conventional perspective on adversarial training. The existence of non-robust-critical modules suggests current AT methods may underutilize the full capacity of DNNs. Therefore, future work should examine more efficient AT algorithms that leverage this redundancy for an optimal balance between generalization and robustness.

Moreover, as the results challenge preconceived notions regarding the dichotomy between robust and generalizable features, the theoretical foundation of these findings warrants deeper exploration. This could lead to innovative architectures or training regimes that inherently balance these often opposing objectives.

Conclusion

This paper presents a compelling case for enhancing generalization in adversarially trained models by capitalizing on their redundant capacities. The introduction of MRC and the application of RiFT mark a substantive step forward in the field, encouraging more nuanced approaches to neural network fine-tuning that preserve robustness while recovering generalization deficits. The findings promise to influence both theoretical perspectives and practical approaches in the discipline, making RiFT a valuable contribution to ongoing research in adversarial machine learning.
