Towards Out-of-Distribution Adversarial Robustness (2210.03150v4)
Abstract: Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different $L_p$ norms, we show that there is room for improvement against many commonly used attacks by adopting a domain generalisation approach. Concretely, we treat each type of attack as a domain and apply the Risk Extrapolation method (REx), which promotes similar levels of robustness across all training attacks. Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training. Moreover, we achieve superior performance on families or tunings of attacks encountered only at test time. On ensembles of attacks, our approach improves accuracy from 3.4% with the best existing baseline to 25.9% on MNIST, and from 16.9% to 23.5% on CIFAR10.
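The abstract's recipe reduces to a simple training objective: compute one adversarial risk per attack "domain" and regularise their spread. Below is a minimal sketch, assuming the V-REx variant of REx (mean of per-domain risks plus a variance penalty); `model`, `attacks`, and `beta` are illustrative placeholders rather than the authors' released code.

```python
# Minimal V-REx-style adversarial training loss, treating each attack as a domain.
# `attacks` is a list of callables that craft adversarial examples, e.g. PGD
# under different L_p norms (names and signatures here are assumptions).
import torch
import torch.nn.functional as F

def rex_adversarial_loss(model, x, y, attacks, beta=10.0):
    """Average the per-attack adversarial risks and penalise their variance,
    pushing robustness to similar levels across all training attacks."""
    risks = []
    for attack in attacks:
        x_adv = attack(model, x, y)                     # craft adversarial examples
        risks.append(F.cross_entropy(model(x_adv), y))  # per-domain risk
    risks = torch.stack(risks)
    return risks.mean() + beta * risks.var()            # V-REx objective
```

In practice, each element of `attacks` could wrap a generator such as PGD from a toolbox like AdverTorch, and `beta` trades off average robustness against evenness of robustness across attacks.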
- Square attack: A query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision, pp. 484–501. Springer, 2020. URL https://arxiv.org/abs/1912.00049.
- Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019. URL https://arxiv.org/abs/1907.02893.
- Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pp. 274–283. PMLR, 2018. URL https://arxiv.org/abs/1802.00420.
- Adversarial feature desensitization. Advances in Neural Information Processing Systems, 34, 2021. URL https://arxiv.org/abs/2006.04621.
- Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402. Springer, 2013. URL https://arxiv.org/abs/1708.06131.
- Accurate, reliable and fast robustness evaluation. Advances in Neural Information Processing Systems, 32, 2019. URL https://arxiv.org/abs/1907.01003.
- Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017. URL https://arxiv.org/abs/1608.04644.
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pp. 2206–2216. PMLR, 2020. URL https://arxiv.org/abs/2003.01690.
- An analysis of adversarial attacks and defenses on autonomous driving models. In 2020 IEEE International Conference on Pervasive Computing and Communications (PerCom), pp. 1–10. IEEE, 2020. URL https://arxiv.org/abs/2002.02175.
- AdverTorch v0.1: An adversarial robustness toolbox based on PyTorch. arXiv preprint arXiv:1902.07623, 2019. URL https://arxiv.org/abs/1902.07623.
- An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https://arxiv.org/abs/2010.11929.
- Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1625–1634, 2018.
- Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. URL https://arxiv.org/abs/1412.6572.
- In search of lost domain generalization. In International Conference on Learning Representations, 2020. URL https://arxiv.org/abs/2007.01434.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016. URL https://arxiv.org/abs/1512.03385.
- Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, 2019. URL https://arxiv.org/abs/1903.12261.
- The impact of non-stationarity on generalisation in deep reinforcement learning. arXiv preprint arXiv:2006.05826, 2020. URL https://arxiv.org/abs/2006.05826.
- On the geometry of adversarial examples. arXiv preprint arXiv:1811.00525, 2018. URL https://arxiv.org/abs/1811.00525.
- Detecting change in data streams. In VLDB, volume 4, pp. 180–191. Toronto, Canada, 2004.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. URL https://arxiv.org/abs/1412.6980.
- Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- Out-of-distribution generalization via risk extrapolation (REx). In International Conference on Machine Learning, pp. 5815–5826. PMLR, 2021. URL https://arxiv.org/abs/2003.00688.
- Adversarial examples in the physical world. ICLR Workshop, 2017. URL https://arxiv.org/abs/1607.02533.
- Functional adversarial attacks. Advances in Neural Information Processing Systems, 32, 2019. URL https://arxiv.org/abs/1906.00001.
- Perceptual adversarial robustness: Defense against unseen threat models. In International Conference on Learning Representations, 2021. URL https://arxiv.org/abs/2006.12655.
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Practical adversarial attacks against speaker recognition systems. In Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications, pp. 9–14, 2020. URL https://par.nsf.gov/servlets/purl/10193609.
- Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJzIBfZAb.
- Adversarial robustness against the union of multiple perturbation models. In International Conference on Machine Learning, pp. 6640–6650. PMLR, 2020. URL https://arxiv.org/abs/1909.04068.
- DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582, 2016. URL https://arxiv.org/abs/1511.04599.
- Reading digits in natural images with unsupervised feature learning. 2011. URL http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf.
- Bag of tricks for adversarial training. In International Conference on Learning Representations, 2020. URL https://arxiv.org/abs/2010.00467.
- Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE, 2016. URL https://arxiv.org/abs/1511.04508.
- Secure and robust machine learning for healthcare: A survey. IEEE Reviews in Biomedical Engineering, 14:156–180, 2020. URL https://arxiv.org/abs/2001.08103.
- Overfitting in adversarially robust deep learning. In International Conference on Machine Learning, pp. 8093–8104. PMLR, 2020. URL https://arxiv.org/abs/2002.11569.
- Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731, 2019. URL https://arxiv.org/abs/1911.08731.
- Do adversarially robust imagenet models transfer better? Advances in Neural Information Processing Systems, 33:3533–3545, 2020. URL https://arxiv.org/abs/2007.08489.
- Revisiting weakly supervised pre-training of visual perception models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 804–814, 2022. URL https://arxiv.org/abs/2201.08371.
- Improving the generalization of adversarial training with domain adaptation. arXiv preprint arXiv:1810.00740, 2018. URL https://arxiv.org/abs/1810.00740.
- Intriguing properties of neural networks. In International Conference on Learning Representations, 2014. URL https://arxiv.org/abs/1312.6199.
- Adversarial training and robustness for multiple perturbations. arXiv preprint arXiv:1904.13000, 2019. URL https://arxiv.org/abs/1904.13000.
- Adversarially-trained deep nets transfer better: Illustration on image classification. In International Conference on Learning Representations, 2021. URL https://arxiv.org/abs/2007.05869.
- Generalizing to unseen domains: A survey on domain generalization. arXiv preprint arXiv:2103.03097, 2021. URL https://arxiv.org/abs/2103.03097.
- Spatially transformed adversarial examples. In International Conference on Learning Representations, 2018. URL https://arxiv.org/abs/1801.02612.
Authors: Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan