Stochastic Activation Pruning for Robust Adversarial Defense

- The paper shows that SAP introduces stochasticity by selectively pruning activations, enhancing robustness without retraining the model.
- It frames adversarial defense as a minimax zero-sum game, sampling activations from a multinomial distribution and rescaling the survivors to counter perturbations.
- Experiments show SAP improves robustness on CIFAR-10 classification and in reinforcement learning relative to dropout and noise-based baselines.
The paper explores the vulnerability of neural networks to adversarial examples and presents Stochastic Activation Pruning (SAP), a method to enhance robustness against such attacks. It approaches adversarial defense through a game-theoretic lens, casting the interaction between an adversary and the model as a minimax zero-sum game.
Core Methodology
SAP introduces stochasticity into a trained network by selectively pruning activations: at each layer, a subset of activations is retained by sampling in proportion to their magnitudes, and the survivors are scaled up so the layer's output is preserved in expectation. This stochastic defense is applied at inference time and does not necessitate any retraining or fine-tuning of the original model, an advantage over existing adversarial training methods.
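A minimal numpy sketch of this sampling-and-rescaling step (the function name and the `sample_frac` parameter are illustrative, not from the paper; activations are assumed not all zero):

```python
import numpy as np

def sap_prune(activations, sample_frac=1.0, rng=None):
    """Stochastic Activation Pruning (sketch).

    Draws samples with replacement from a multinomial over the activation
    magnitudes, keeps only the sampled units, and rescales each survivor
    by the inverse of its keep probability so that the layer's output is
    unchanged in expectation.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = activations.ravel()
    n = a.size
    r = max(1, int(sample_frac * n))     # number of multinomial draws
    p = np.abs(a) / np.abs(a).sum()      # sampling probability per unit
    draws = rng.choice(n, size=r, replace=True, p=p)
    keep = np.zeros(n, dtype=bool)
    keep[draws] = True
    keep_prob = 1.0 - (1.0 - p) ** r     # P(unit i drawn at least once)
    pruned = np.zeros(n)
    pruned[keep] = a[keep] / keep_prob[keep]
    return pruned.reshape(activations.shape)
```

Averaged over many draws, the pruned output recovers the original activations, which is what lets SAP wrap a pretrained model without retraining.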
Theoretical Considerations
The authors frame the defense against adversarial attacks as a strategic game in which optimal strategies generally require mixed (stochastic) policies. Concretely, the method treats each layer's activation map as a multinomial distribution, with probabilities proportional to the activations' absolute values, from which the retained units are sampled; the resulting randomized policy is harder for an adversary to anticipate, giving greater resilience against perturbations.
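In symbols (notation here is paraphrased from the paper), the $i$-th activation $h_i$ in a layer of $n$ units is sampled with probability proportional to its magnitude, and after $r$ draws with replacement each surviving unit is rescaled so the layer is unbiased:

```latex
p_i = \frac{|h_i|}{\sum_{j=1}^{n} |h_j|}, \qquad
\tilde{h}_i =
\begin{cases}
\dfrac{h_i}{1 - (1 - p_i)^{r}} & \text{if unit } i \text{ is drawn at least once in } r \text{ samples},\\[4pt]
0 & \text{otherwise.}
\end{cases}
```

Since unit $i$ survives with probability $1 - (1 - p_i)^{r}$, it follows that $\mathbb{E}[\tilde{h}_i] = h_i$.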
Experimental Evaluation
- Image Classification: On CIFAR-10, SAP is more robust than the dense and dropout models under FGSM attacks. With the sampling rate at 100% of the activations, SAP achieves significant accuracy gains at modest perturbation levels (e.g., a 12.2% accuracy increase at λ=1).
- Reinforcement Learning: SAP's advantages carry over to reinforcement learning, yielding markedly higher rewards under adversarial perturbations across several Atari games.
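The FGSM attack used in these experiments perturbs each input coordinate by a fixed step λ along the sign of the input gradient of the loss. A minimal numpy sketch, using a toy logistic-regression loss purely to supply a gradient (the weights and helper names are illustrative assumptions):

```python
import numpy as np

def fgsm_perturb(x, grad, lam):
    """Fast Gradient Sign Method: step of size lam along the sign of
    the loss gradient with respect to the input."""
    return x + lam * np.sign(grad)

def logistic_grad(x, w, b, y):
    """Gradient of the binary cross-entropy loss of a logistic model
    with respect to the input x (illustrative toy model)."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))
    return (p - y) * w

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, -0.5])
g = logistic_grad(x, w, b=0.1, y=1.0)
x_adv = fgsm_perturb(x, g, lam=0.1)   # every coordinate moves by ±0.1
```

Because the attack depends only on the sign of the gradient, a stochastic defense like SAP, which randomizes which gradients the adversary sees, directly degrades the attack's effectiveness.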
Comparative Analysis
The paper contrasts SAP with several stochastic alternatives, including adding Gaussian noise to activations and stochastic weight pruning. These alternatives at best match and generally underperform SAP, highlighting the value of magnitude-weighted activation pruning. Notably, SAP composes with adversarially trained models, yielding additive gains and showing promise for integration with standard defense mechanisms.
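For contrast with SAP, the Gaussian-noise baseline perturbs every activation additively rather than sparsifying the layer; a minimal sketch (function name and the σ hyperparameter are illustrative assumptions):

```python
import numpy as np

def gaussian_noise_activations(activations, sigma=0.1, rng=None):
    """Baseline defense: add i.i.d. Gaussian noise to each activation.
    Unlike SAP, every unit stays active and no rescaling is applied,
    so the layer output is perturbed rather than pruned."""
    rng = np.random.default_rng() if rng is None else rng
    return activations + rng.normal(0.0, sigma, size=activations.shape)
```

Both defenses are unbiased in expectation, but SAP concentrates its randomness on which units survive, whereas this baseline spreads small noise over all of them.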
Implications and Future Prospects
SAP’s ability to be applied to any pretrained model without further modification makes it a versatile tool for improving model robustness. Its stochastic approach adds a novel dimension to defensive strategies, opening avenues for exploring how stochasticity can be leveraged in more complex network architectures or under different adversarial models.
In practice, SAP’s operational simplicity and effectiveness could prove valuable in security-sensitive applications where real-time or post-deployment adjustments are necessary. Future work may refine the stochastic mechanism, explore different activation scaling strategies, and integrate SAP with other adversarial training techniques to bolster defenses.