- The paper presents GAIRAT, which assigns instance-specific weights based on geometric measures to improve adversarial training.
- It applies projected gradient descent to assess data points' attack difficulty, prioritizing adversarial examples near decision boundaries.
- GAIRAT enhances robustness on standard benchmarks without sacrificing natural data accuracy, effectively mitigating robust overfitting.
An Evaluation of Geometry-aware Instance-reweighted Adversarial Training
In the field of adversarial machine learning, a prevalent belief holds that achieving robustness comes at the expense of accuracy, a dynamic commonly referred to as the robustness-accuracy trade-off. The paper "Geometry-aware Instance-reweighted Adversarial Training" addresses this core dilemma, presenting insights and a novel method suggesting that the trade-off can be circumvented, allowing robustness and accuracy to be enhanced concurrently.
Key Contributions
The authors propose a method termed Geometry-aware Instance-reweighted Adversarial Training (GAIRAT). The approach is built on the premise that, because model capacity is effectively limited in adversarial training, not all adversarial data should be treated equally. This stance is backed by several findings:
- Limited Model Capacity in Adversarial Training: Contrary to what might be assumed with over-parameterized deep networks, adversarial training requires significant capacity due to its pronounced smoothing effect. This effect demands model resources to fit the neighborhoods around natural data points, which extend significantly in high-dimensional input spaces.
- Instance-specific Weighting Strategies: Adversarial instances closer to the decision boundary should be weighted more heavily in training, since they are more easily attacked and hence carry more value in refining model robustness. GAIRAT implements this by computing a geometric measure, the least number of iterative attack steps required to misclassify a point, and assigning weights that decrease with this measure (see the sketch after this list).
- Enhanced Robustness Without Compromising Accuracy: By integrating a weighted adversarial loss, GAIRAT markedly improves robustness against adversarial attacks without detracting from standard accuracy on natural data, a significant departure from prior methods, where improvements in robustness would typically reduce accuracy.
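To make the geometric measure concrete, here is a minimal PyTorch sketch of a PGD attack that also records kappa, the earliest iteration at which each example is misclassified. The function name `pgd_with_kappa` and the epsilon/step-size defaults are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def pgd_with_kappa(model, x, y, eps=8/255, step_size=2/255, num_steps=10):
    """Run an L-infinity PGD attack, recording per example the earliest
    iteration at which the model misclassifies (kappa). Points never
    misclassified within the budget get kappa = num_steps."""
    kappa = torch.full((x.size(0),), num_steps, device=x.device)
    x_adv = x.clone().detach()
    for t in range(num_steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Keep the earliest step at which each point was misclassified.
        wrong = logits.argmax(dim=1) != y
        kappa = torch.minimum(kappa, torch.where(wrong, torch.full_like(kappa, t), kappa))
        grad = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()
            # Project back into the epsilon-ball and the valid pixel range.
            x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)
    return x_adv.detach(), kappa
```

As described in the paper, kappa can be collected during the same PGD run that generates the training-time adversarial examples, so measuring it adds essentially no extra cost.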
Methodological Innovations
The technical core of the paper is a modification of the traditional adversarial training regime: projected gradient descent (PGD) is used to assess how difficult each data point is to attack. This geometric measure lets the model adaptively adjust each data point's influence on the learning process.
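Given kappa, a decreasing function converts the measure into instance weights. The sketch below uses a tanh-shaped function following the form reported in the paper, with `lam` a hyperparameter controlling how sharply weights decay; the specific constants are assumptions and may differ from the authors' released implementation.

```python
import torch

def gairat_weights(kappa, num_steps, lam=-1.0):
    """Map the geometric measure to instance weights: small kappa (near the
    decision boundary) yields a large weight, large kappa a small one."""
    kappa = kappa.float()
    w = (1.0 + torch.tanh(lam + 5.0 * (1.0 - 2.0 * kappa / num_steps))) / 2.0
    return w / w.sum()  # normalize so weights sum to 1 over the minibatch
```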
GAIRAT's geometry-aware weights entail a substantial reordering of adversarial training priorities. By emphasizing instances that are easily misclassified, GAIRAT dynamically manages how model capacity is spent, in contrast to static approaches that treat all adversarial examples identically.
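Combining the two pieces, a single hedged training step might look as follows, reusing the illustrative helpers `pgd_with_kappa` and `gairat_weights` sketched above. Because the weights are normalized over the minibatch, the weighted sum plays the role of the usual mean adversarial loss.

```python
import torch.nn.functional as F

def gairat_step(model, optimizer, x, y, eps=8/255, step_size=2/255, num_steps=10):
    # One PGD run yields both the adversarial batch and the geometric measure.
    model.eval()
    x_adv, kappa = pgd_with_kappa(model, x, y, eps, step_size, num_steps)
    model.train()
    optimizer.zero_grad()
    # Per-example loss, reweighted so boundary-adjacent points dominate.
    per_example = F.cross_entropy(model(x_adv), y, reduction="none")
    loss = (gairat_weights(kappa, num_steps) * per_example).sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```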
Empirical Evaluation and Results
The authors underpin their framework with extensive experiments. Using standard datasets such as CIFAR-10 and SVHN, and network architectures such as Wide ResNets and VGG variants, they show that GAIRAT outperforms traditional methods like standard adversarial training (AT) and friendly adversarial training (FAT), as well as recent methods such as MART and TRADES.
Particularly notable is GAIRAT's ability to alleviate robust overfitting, a pervasive issue in which prolonged training causes the model to overfit the training-time adversarial examples, degrading test-time robustness. The paper demonstrates robustness gains across multiple architectures and hyperparameter settings, making a strong case for the approach as a viable path forward.
Implications and Future Directions
The concept of instance-reweighting in adversarial training opens a promising avenue for refining the robustness of machine learning models without degrading their standard performance. Beyond the theoretical contributions, this method holds practical importance in security-critical applications like autonomous driving and financial fraud detection, where robustness cannot afford to lag behind.
Future work could explore adaptive frameworks in which weight assignments are adjusted dynamically over the course of training, and geometric measures derived from adversarial threats beyond PGD. Furthermore, integrating this approach with federated learning or with unlabeled external datasets could further enhance generalization.
In conclusion, GAIRAT stands as a method that effectively mitigates the robustness-accuracy trade-off that has historically constrained model development, demonstrating that high accuracy and strong robust performance can be achieved simultaneously.