- The paper shows that modest changes in hyperparameters like weight decay can reduce robust accuracy by over 7%.
- The paper establishes a baseline PGD-AT protocol on CIFAR-10 that outperforms more complex published defenses in both clean and adversarial accuracy.
- The paper demonstrates that its observations about standardized hyperparameters carry over to other adversarial training frameworks, prompting a reevaluation of reported advances in adversarial training.
Insights into Adversarial Training: A Detailed Examination of Practical Nuances
The paper "Bag of Tricks for Adversarial Training" presents a comprehensive analysis of adversarial training (AT) methods and of the often-overlooked implementation details that significantly affect the robustness of deep learning models against adversarial attacks. The authors, Pang et al., set out to explain a persistent puzzle in recent benchmarks: reported improvements from new adversarial training methods often fail to materialize when compared against a simple PGD-AT baseline with early stopping. This leads them to scrutinize a multitude of AT techniques, focusing on basic settings that are often disregarded or inconsistently applied, such as hyperparameter choices and training schedules.
Key Observations and Numerical Results
Conducted on the CIFAR-10 dataset, Pang et al.'s evaluations yield several critical insights. Notably, small adjustments to weight decay can have a disproportionate impact, diminishing robust accuracy by over 7% in some scenarios and thereby overshadowing the gains proposed by new methodologies. Furthermore, the authors observe that adopting a standardized set of hyperparameters can achieve state-of-the-art results, outperforming bespoke defenses previously deemed more advanced.
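To make the scale of these differences concrete, here is a minimal sketch, not the authors' code, of the kind of baseline setup whose small variations the paper measures. The specific values (SGD with momentum 0.9, initial learning rate 0.1, a piecewise-constant schedule, weight decay 5e-4) follow the commonly used CIFAR-10 PGD-AT configuration and should be read as illustrative assumptions; the model choice is likewise a stand-in.

```python
import torch
import torchvision

# Illustrative model; the paper's experiments use CIFAR-style ResNets/WRNs.
model = torchvision.models.resnet18(num_classes=10)

# Weight decay is the setting flagged as most sensitive: varying it within
# the range seen across prior work (roughly 1e-4 to 5e-4) accounts for
# multi-point gaps in robust accuracy.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # typical initial learning rate
    momentum=0.9,
    weight_decay=5e-4,  # vs. 1e-4 in other studies; the induced gap is the point
)

# Piecewise-constant schedule: drop the learning rate by 10x late in training.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 105], gamma=0.1
)
```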
Valuable contributions from this paper include:
- Hyperparameter Sensitivity: Basic settings of adversarially trained models, such as weight decay, learning rate schedules, and batch normalization modes, profoundly influence robustness. The authors attribute differentials of around 5% in robust accuracy solely to inconsistencies in these settings across studies.
- Baseline Training Protocol: For PGD-AT on CIFAR-10, the authors specify a baseline configuration, including moderate label smoothing and the use of batch normalization's evaluation mode while crafting adversarial examples, that yields strong results: better clean and adversarial accuracy than many previously published defenses (a minimal training-step sketch follows this list).
- Extended Impact on Adversarial Frameworks: The paper's observations are not limited to PGD-AT; they hold consistently across other frameworks such as TRADES, FastAT, and FreeAT, so the conclusions draw on a broad spectrum of adversarial training methods.
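The training-step sketch referenced above is given below. The attack settings (L-infinity budget 8/255, 10 steps of size 2/255) are the standard CIFAR-10 choices, and all function and variable names are ours; treat this as a sketch of the two tricks in question (eval-mode batch normalization while crafting examples, moderate label smoothing), not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity PGD adversarial examples (standard CIFAR-10 budget)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()  # gradient ascent step
            # Project back into the epsilon-ball and the valid pixel range.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_step(model, optimizer, x, y):
    model.eval()   # trick 1: batch norm in eval mode while crafting examples
    x_adv = pgd_attack(model, x, y)
    model.train()  # batch norm back to train mode for the parameter update
    optimizer.zero_grad()
    # trick 2: moderate label smoothing on the training loss
    loss = F.cross_entropy(model(x_adv), y, label_smoothing=0.1)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Switching to `model.eval()` only for example generation keeps the batch statistics used by the attack consistent with those seen at test time, which is the rationale the summary above points to.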
Theoretical and Practical Implications
The implications of this paper span both theoretical and practical domains. Theoretically, it prompts a reevaluation of previously reported advances in AT, urging standardization of baseline settings before improvements are attributed to novel techniques. Practically, the results endorse the adoption of a baseline training setup that removes confounding variables from benchmarks, allowing fair and reliable evaluation of newly proposed methods.
The paper also calls attention to weight decay as a parameter that matters not only in standard training but as a setting whose value interacts strongly with the dynamics of adversarial training. This has broad implications for the development of more sophisticated, yet stable, training regimes.
Future Directions in AI
Pang et al.'s research encourages further examination of how well the identified tricks transfer across datasets and domains. There is significant potential in adaptive or automated hyperparameter tuning that optimally balances clean and robust performance across diverse conditions. The community is urged to reevaluate defensive strategies against rigorously defined, transparent benchmarks, potentially leading to more general and transferable adversarial training solutions across machine learning models.
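One way to picture such automated tuning is a simple sweep that scores each configuration by a weighted combination of clean and robust accuracy. Everything in this sketch is hypothetical: the grid values, the weighting, and the `train_and_evaluate` helper are placeholders, not part of the paper.

```python
import itertools

# Hypothetical search space; values are placeholders, not from the paper.
GRID = {
    "weight_decay": [1e-4, 2e-4, 5e-4],
    "label_smoothing": [0.0, 0.1, 0.2],
}

def select_best(train_and_evaluate, clean_weight=0.5):
    """Pick the config with the best weighted clean/robust trade-off.

    train_and_evaluate is an assumed user-supplied helper that trains a
    model with the given settings and returns (clean_acc, robust_acc).
    """
    best_cfg, best_score = None, float("-inf")
    for wd, ls in itertools.product(GRID["weight_decay"], GRID["label_smoothing"]):
        clean_acc, robust_acc = train_and_evaluate(weight_decay=wd, label_smoothing=ls)
        score = clean_weight * clean_acc + (1 - clean_weight) * robust_acc
        if score > best_score:
            best_cfg, best_score = {"weight_decay": wd, "label_smoothing": ls}, score
    return best_cfg, best_score
```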
Overall, this paper underscores the critical nature of methodological rigor and reproducibility in research on adversarial robustness, setting a standard for future studies striving to enhance the security and reliability of machine learning systems.