Adversarial Robustness through Local Linearization: A Formal Overview
The paper "Adversarial Robustness through Local Linearization" presents an innovative approach to addressing adversarial robustness in neural network training, leveraging the concept of local linearization to counteract issues related to gradient obfuscation. The authors of this paper propose a novel regularizer termed the Local Linearity Regularizer (LLR), which seeks to linearize the loss landscape near training data points, thereby mitigating the effects of gradient obfuscation and enhancing model robustness against adversarial perturbations.
The primary motivation for this research arises from the computational cost and limitations of adversarial training, particularly for large models and high-dimensional inputs. Standard adversarial training frameworks often resort to weak adversaries for computational efficiency; the resulting networks develop highly non-linear loss surfaces near the data, a symptom of gradient obfuscation, and remain susceptible to strong adversarial attacks.
Key Contributions
The paper presents several key contributions:
- Introduction of the Local Linearity Regularizer (LLR): The LLR enforces local linearity of the loss within a specified epsilon neighborhood of each input, penalizing perturbations where the loss deviates from its first-order approximation and thereby encouraging robustness against adversarial attacks (see the formulation after this list).
- Empirical Results on CIFAR-10 and ImageNet: The authors conduct extensive experiments on widely used benchmarks, demonstrating that LLR-trained models significantly reduce computation time compared to adversarial training while achieving state-of-the-art performance. On ImageNet, the LLR method achieves 47% adversarial accuracy under a strong untargeted white-box attack with l-infinity perturbations of size epsilon = 4/255.
- Theoretical Underpinnings: A formal proposition shows that the adversarial loss is upper-bounded by the standard loss plus terms measuring local linearity, so enforcing local linearity via LLR controls robustness without the many inner adversarial steps required by traditional adversarial training (the bound is stated after this list).
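To make these points concrete, the local linearity measure and the bound they refer to can be written as follows; here l denotes the training loss, x the input, delta the perturbation, and an l-infinity ball of radius epsilon is assumed as the threat model:

$$
g(\delta; x) = \ell(x+\delta) - \ell(x) - \delta^{\top}\nabla_x \ell(x),
\qquad
\gamma(\epsilon, x) = \max_{\|\delta\|_{\infty} \le \epsilon} \bigl|\, g(\delta; x) \,\bigr|,
$$

$$
\max_{\|\delta\|_{\infty} \le \epsilon} \ell(x+\delta) \;\le\; \ell(x) + \max_{\|\delta\|_{\infty} \le \epsilon} \bigl|\,\delta^{\top}\nabla_x \ell(x)\,\bigr| + \gamma(\epsilon, x).
$$

Driving gamma(epsilon, x) toward zero, together with a penalty on the gradient term, therefore bounds the worst-case loss inside the epsilon-ball, which is exactly what the LLR training objective encourages.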
Experimental Insights
The reported results show up to a fivefold reduction in training time relative to adversarial training, without compromising adversarial robustness. This suggests practical benefits for deploying robust models with reduced resource expenditure. In addition, LLR is shown to preserve accuracy under both weak and strong adversarial attacks, indicating that model performance holds up across attack strengths.
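To illustrate where the savings come from, the following is a minimal PyTorch sketch of an LLR-style penalty in which the inner maximization over the perturbation is approximated with just a couple of gradient-ascent steps. It is an illustrative sketch rather than the authors' implementation: the `llr_penalty` function, the use of cross-entropy, and the hyperparameters `inner_steps`, `lambda_llr`, and `mu_llr` are assumptions made here for clarity.

```python
# Minimal sketch of a Local Linearity Regularizer (LLR) style penalty in PyTorch.
# `model` is any differentiable classifier; lambda_llr, mu_llr, and inner_steps
# are illustrative values, not the paper's tuned hyperparameters.
import torch
import torch.nn.functional as F


def llr_penalty(model, x, y, epsilon, inner_steps=2, step_size=None,
                lambda_llr=4.0, mu_llr=3.0):
    """Return the clean loss plus an approximate local-linearity penalty."""
    step_size = step_size if step_size is not None else epsilon / 2
    x = x.detach().requires_grad_(True)

    # Clean loss and its input gradient; create_graph=True keeps the gradient
    # differentiable so the penalty can backpropagate into the model weights.
    clean_loss = F.cross_entropy(model(x), y)
    grad_x = torch.autograd.grad(clean_loss, x, create_graph=True)[0]

    # Inner loop: a few ascent steps to find a perturbation delta inside the
    # eps-ball where the loss deviates most from its first-order approximation.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    clean_d, grad_d = clean_loss.detach(), grad_x.detach()
    for _ in range(inner_steps):
        delta.requires_grad_(True)
        lin_err = (F.cross_entropy(model(x + delta), y)
                   - clean_d - (delta * grad_d).sum()).abs()
        g = torch.autograd.grad(lin_err, delta)[0]
        delta = (delta + step_size * g.sign()).clamp(-epsilon, epsilon).detach()

    # Final penalty terms (batch-mean version of the per-example quantities):
    # gamma measures non-linearity at delta, grad_term penalizes the linear part.
    gamma = (F.cross_entropy(model(x + delta), y)
             - clean_loss - (delta * grad_x).sum()).abs()
    grad_term = (delta * grad_x).sum().abs()
    return clean_loss + lambda_llr * gamma + mu_llr * grad_term
```

For readability the absolute values are taken on batch-mean quantities; a more faithful version would compute and average per-example penalties. The key point stands either way: only a handful of inner gradient steps are needed to estimate the linearity penalty, compared with the many attack steps of multi-step adversarial training.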
Implications and Future Perspectives
The implications of this research are multi-faceted. Practically, LLR provides a scalable solution to adversarial training, thereby facilitating its application in real-world scenarios where computational resources are a limiting factor. Theoretically, the introduction of a linearization-based regularizer opens avenues for further exploration into designing loss functions and architectures inherently resistant to adversarial perturbations.
Future developments could focus on integrating LLR with other existing techniques such as TRADES or adaptive methods. Furthermore, expanding the application of LLR beyond image classification to domains such as natural language processing and reinforcement learning could provide insights into its efficacy across diverse AI challenges.
Conclusion
This paper contributes significantly to the ongoing discourse on adversarial robustness in deep learning. By introducing a novel regularizer aimed at promoting local linearity, the authors offer a methodological advancement that promises both efficiency and effectiveness, potentially setting a new standard in adversarial training protocols. The balance achieved between theoretical robustness guarantees and empirical performance makes LLR a valuable addition to the toolbox of researchers and practitioners aiming to fortify neural networks against adversarial threats.