An Empirical Study on the Relation between Network Interpretability and Adversarial Robustness (1912.03430v6)

Published 7 Dec 2019 in cs.LG and stat.ML

Abstract: Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical evidence suggests these two seemingly disparate issues are actually connected. In particular, robust models tend to provide more interpretable gradients than non-robust models. However, whether this relationship works in the opposite direction remains obscure. With this paper, we seek empirical answers to the following question: can models acquire adversarial robustness when they are trained to have interpretable gradients? We introduce a theoretically inspired technique called Interpretation Regularization (IR), which encourages a model's gradients to (1) match the direction of interpretable target salience maps and (2) have small magnitude. To assess model performance and tease apart factors that contribute to adversarial robustness, we conduct extensive experiments on MNIST and CIFAR-10 with both $\ell_2$ and $\ell_\infty$ attacks. We demonstrate that training the networks to have interpretable gradients improves their robustness to adversarial perturbations. Applying the network interpretation technique SmoothGrad yields additional performance gains, especially in cross-norm attacks and under heavy perturbations. The results indicate that the interpretability of the model gradients is a crucial factor for adversarial robustness. Code for the experiments can be found at https://github.com/a1noack/interp_regularization.
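
The abstract describes Interpretation Regularization (IR) and SmoothGrad only at a high level. Below is a minimal NumPy sketch of the two IR ingredients it names: a direction term that pushes the input gradient toward a target salience map, and a magnitude term that keeps the gradient small. It also includes a SmoothGrad average that could serve as a target map. This is not the authors' code (that lives in the linked repository). Several choices here are assumptions made only for illustration: the tiny tanh network, the helper names, the weights `lam_dir` and `lam_mag`, the use of 1 minus cosine similarity for the direction term, the squared L2 norm for the magnitude term, and taking the gradient of a scalar score rather than of the loss. The paper's exact loss and its use of SmoothGrad may differ.

```python
import numpy as np


def tiny_net_input_grad(x, W, v):
    """Gradient w.r.t. x of the scalar score f(x) = v . tanh(W x).

    Hypothetical one-hidden-layer model, used only so the gradient is analytic.
    """
    h = np.tanh(W @ x)
    return W.T @ (v * (1.0 - h ** 2))


def smoothgrad(x, grad_fn, sigma=0.1, n_samples=50, rng=None):
    """SmoothGrad: average input gradients over Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = x + sigma * rng.standard_normal((n_samples,) + x.shape)
    return np.mean([grad_fn(xi) for xi in noisy], axis=0)


def ir_penalty(grad, target, lam_dir=1.0, lam_mag=0.01, eps=1e-12):
    """Hypothetical IR-style penalty on an input gradient.

    (1) direction term: 1 - cosine similarity between grad and the target map
    (2) magnitude term: squared L2 norm of grad
    """
    g, t = np.ravel(grad), np.ravel(target)
    cos = float(g @ t) / (np.linalg.norm(g) * np.linalg.norm(t) + eps)
    return lam_dir * (1.0 - cos) + lam_mag * float(g @ g)


# Usage with random (hypothetical) weights and input.
rng = np.random.default_rng(42)
W = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
x = rng.standard_normal(4)


def grad_fn(z):
    return tiny_net_input_grad(z, W, v)


g = grad_fn(x)
target = smoothgrad(x, grad_fn, sigma=0.2, n_samples=100, rng=rng)
print(f"IR penalty: {ir_penalty(g, target):.4f}")
```

The sketch only evaluates the penalty. Actually training with it would require differentiating through the input gradient with respect to the model parameters (double backpropagation), which in practice calls for an autodiff framework rather than hand-written gradients.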

Authors (4)
  1. Adam Noack (4 papers)
  2. Isaac Ahern (2 papers)
  3. Dejing Dou (112 papers)
  4. Boyang Li (106 papers)
Citations (10)