Attacks Which Do Not Kill Training Make Adversarial Learning Stronger (2002.11242v2)

Published 26 Feb 2020 in cs.LG and stat.ML

Abstract: Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question---do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel approach of friendly adversarial training (FAT): rather than employing most adversarial data maximizing the loss, we search for least adversarial (i.e., friendly adversarial) data minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most adversarial data searching algorithms such as PGD (projected gradient descent) early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively---adversarial robustness can indeed be achieved without compromising the natural generalization.

Citations (378)

Summary

  • The paper introduces Friendly Adversarial Training (FAT), which replaces standard PGD with early-stopped PGD that halts once adversarial examples are confidently misclassified, improving natural generalization.
  • It provides a tightened theoretical upper bound on adversarial risk, guiding models to better balance robustness and natural accuracy.
  • Experimental results on CIFAR-10 and SVHN show that FAT enhances computational efficiency while maintaining or improving defense against strong adversarial attacks.

An Expert Overview of "Attacks Which Do Not Kill Training Make Adversarial Learning Stronger"

The paper "Attacks Which Do Not Kill Training Make Adversarial Learning Stronger" by Jingfeng Zhang et al. investigates a critical issue in adversarial learning: the interplay between adversarial robustness and natural generalization. The authors challenge the traditional minimax formulation used in adversarial training, which can be overly conservative and may hinder natural generalization. They introduce a novel approach called Friendly Adversarial Training (FAT), aiming to maintain or even enhance natural generalization without sacrificing adversarial robustness.

Core Contributions

  1. Introduction of Friendly Adversarial Training (FAT): In contrast to the loss-maximizing search of traditional adversarial training, FAT searches for adversarial data that minimize the loss among examples that are already confidently misclassified. In practice this amounts to stopping the PGD (projected gradient descent) iterations early, shortly after a generated example becomes misclassified (a schematic formulation and a code sketch follow this list).
  2. Theoretical Justification of FAT: The authors derive an upper bound on the adversarial risk that is tighter than the one implicitly minimized by traditional methods. The bound combines, for correctly classified data, adversarial variants that maximize the loss and, for misclassified data, variants that minimize the loss subject to a confidence margin. This analysis guides the network to adjust its decision boundary only where needed, reducing the adversarial effect while preserving or enhancing generalization on natural examples.
  3. Computational Efficiency: FAT is computationally advantageous as it typically requires fewer backward propagations in generating adversarial examples due to the early stopping mechanism. This improved efficiency doesn't compromise robustness, an essential consideration for safety-critical applications.
  4. Implications for Curriculum Learning: Progressively increasing the strength of the adversarial examples (i.e., allowing incrementally more PGD steps as training progresses) aligns FAT with curriculum learning, producing networks that become robust to increasingly sophisticated adversarial attacks.
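
Written out, the search described in contributions 1 and 2 replaces the usual loss maximization over the perturbation ball with a constrained minimization. A schematic form (notation ours, not the paper's exact statement) is

$$
\tilde{x}_i \;=\; \arg\min_{\tilde{x} \in \mathcal{B}_\epsilon[x_i]} \ell\big(f(\tilde{x}), y_i\big)
\quad \text{s.t.} \quad
\ell\big(f(\tilde{x}), y_i\big) \;-\; \min_{y \in \mathcal{Y}} \ell\big(f(\tilde{x}), y\big) \;\ge\; \rho,
$$

where B_eps[x_i] is the perturbation ball of radius eps around x_i, l is the classification loss, and rho is a confidence margin. The constraint keeps only adversarial data that are misclassified by at least that margin, and the outer problem still minimizes the average loss of f on the resulting friendly adversarial examples.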

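The early-stopped PGD procedure itself is simple to sketch. Below is a minimal PyTorch-style illustration, assuming image inputs scaled to [0, 1] and an L_inf budget; the function and parameter names (e.g., friendly_pgd, tau) are our own, and this is an illustration of the idea rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F


def friendly_pgd(model, x, y, eps=8 / 255, step_size=2 / 255, max_steps=10, tau=1):
    """Early-stopped PGD: stop an example's attack `tau` steps after it is misclassified."""
    x_adv = torch.clamp(x.detach() + 0.001 * torch.randn_like(x), 0.0, 1.0)  # small random start
    # Remaining "extra" steps each example may still take after becoming misclassified.
    budget = torch.full((x.size(0),), tau, dtype=torch.long, device=x.device)

    for _ in range(max_steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        loss = F.cross_entropy(logits, y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            # Already-misclassified examples spend one unit of budget; once the
            # budget is exhausted they are frozen (the early stop). A real
            # implementation would skip frozen examples to save backward passes.
            misclassified = logits.argmax(dim=1) != y
            budget = budget - misclassified.long()
            active = budget >= 0

            x_next = x_adv + step_size * grad.sign()
            x_next = torch.min(torch.max(x_next, x - eps), x + eps)  # project onto the L_inf ball
            x_next = torch.clamp(x_next, 0.0, 1.0)
            # Assumes 4D image batches (N, C, H, W) for the broadcasting below.
            x_adv = torch.where(active.view(-1, 1, 1, 1), x_next, x_adv).detach()

    return x_adv
```

The efficiency gain noted in the third contribution comes precisely from not running the remaining PGD steps for examples that have already stopped; the sketch keeps computing them for simplicity. The curriculum view in the fourth contribution would correspond here to gradually increasing tau (and, optionally, max_steps) over training epochs so that later epochs face stronger adversarial data.
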
Experimental Evaluation

The authors conduct a comprehensive set of experiments demonstrating that FAT can indeed improve natural accuracy while maintaining competitive adversarial robustness across various models and datasets. Specifically, results on CIFAR-10 and SVHN reveal that networks trained with FAT not only surpass traditional adversarial training in natural accuracy but also withstand stronger attacks without degradation in performance.

Implications and Future Directions

  1. Theoretical Advancements: The refinement of adversarial risk bounds and the adoption of alternative formulation strategies in FAT offer paths for further theoretical exploration of adversarial robustness and generalization.
  2. Enhanced Defensive Capabilities: FAT's ability to incorporate larger perturbation bounds during training suggests it can sustain defensive strength as threats evolve, making it well suited to autonomous systems and other security-sensitive deployments.
  3. Adoption in Multi-Objective Scenarios: The integration of friendly adversarial strategies into existing frameworks (e.g., TRADES, MART) emphasizes FAT’s versatility, making it a potential candidate for enhancing models designed for simultaneous optimization of multiple objectives.
  4. Exploration of Early Stopping Techniques in Other Contexts: The benefits of early-stopped PGD demonstrated here suggest exploring similar stopping strategies in other machine learning settings that must balance compute efficiency with model performance.

In conclusion, the paper convincingly argues that through adjustments in adversarial training strategies, one can indeed achieve adversarial robustness alongside improved natural generalization. This insight propels the field closer to resolving a quintessential conundrum in adversarial machine learning. The theoretical and practical advancements posited by the authors set the stage for further exploration and application across a variety of AI contexts.
