Scaling provable adversarial defenses (1805.12514v2)

Published 31 May 2018 in cs.LG, cs.AI, math.OC, and stat.ML

Abstract: Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directions. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically (analogous to automatic differentiation). Second, in the specific case of $\ell_\infty$ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales linearly in the number of hidden units (previous approaches scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\ell_\infty$ perturbations of $\epsilon=0.1$), and from 80% to 36.4% on CIFAR (with $\ell_\infty$ perturbations of $\epsilon=2/255$). Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/.

Authors (4)
  1. Eric Wong (47 papers)
  2. Frank R. Schmidt (10 papers)
  3. Jan Hendrik Metzen (31 papers)
  4. J. Zico Kolter (151 papers)
Citations (440)

Summary

  • The paper introduces a novel modular dual function approach that extends adversarial defenses to complex architectures like ResNets.
  • It proposes a nonlinear random projection method that reduces computational complexity from quadratic to linear for ℓ∞ perturbations.
  • Experimental results on MNIST and CIFAR-10 demonstrate significant robust error reductions, highlighting the practical scalability of the approach.

Scaling Provable Adversarial Defenses

The paper "Scaling Provable Adversarial Defenses" explores advancements in creating deep neural network classifiers that are provably robust against norm-bounded adversarial perturbations. The authors focus on expanding these methods to accommodate larger and more complex models, addressing a significant limitation observed in prior research.

Core Contributions

The paper makes substantial progress in scaling adversarial defenses by introducing three key innovations:

  1. Generalized Network Support: The authors extend adversarial training procedures to networks with skip connections, such as ResNets, and with general nonlinearities. The approach relies on modular dual functions, which simplify the derivation of robust bounds for diverse architectures in a manner analogous to automatic differentiation (see the first sketch after this list).
  2. Computational Efficiency: For networks with ReLU nonlinearities under $\ell_\infty$ adversarial perturbations, the authors propose a nonlinear random projection technique. This reduces the computational complexity from quadratic to linear in the number of hidden units, enabling more efficient training without compromising performance (see the estimator sketch after this list).
  3. Improved Robust Error: Using cascades of classifiers, the research shows that verified robust error can be reduced further. Although this improvement comes at the cost of higher standard (non-robust) error, it represents a meaningful step toward lower certified robust error (the cascade's decision rule is sketched after this list).
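
To make the "analogous to automatic differentiation" point in item 1 concrete, below is a minimal, purely illustrative sketch (not the authors' API; all names are ours). Each linear layer type registers a dual rule, and the dual network is obtained by composing those rules in reverse, the same bookkeeping that reverse-mode autodiff performs; nonlinear layers such as ReLU would contribute their own relaxed dual rules, which is where the paper's machinery does the real work.

```python
import numpy as np

def dual_linear(W):
    # Dual of x -> W @ x is nu -> W.T @ nu (the transpose map)
    return lambda nu: W.T @ nu

def dual_residual(dual_inner):
    # A skip connection y = x + f(x) duplicates its input, so its dual
    # sums the identity path with the dual of the inner branch.
    return lambda nu: nu + dual_inner(nu)

def compose_dual(dual_layers):
    # Build the dual network by applying per-layer dual rules in reverse,
    # exactly as reverse-mode autodiff composes per-op adjoints.
    def run(nu):
        for dual in reversed(dual_layers):
            nu = dual(nu)
        return nu
    return run

# Example: dual of a two-layer map with a residual block around W2
W1, W2 = np.random.randn(8, 4), np.random.randn(8, 8)
dual_net = compose_dual([dual_linear(W1), dual_residual(dual_linear(W2))])
nu_in = dual_net(np.ones(8))   # pass a dual variable backward through the network
```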
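
The complexity reduction in item 2 rests on estimating, rather than computing exactly, the $\ell_1$ norms that enter the $\ell_\infty$ dual bound. The sketch below (NumPy, with function names of our choosing) shows the underlying estimator: projecting a linear map onto standard Cauchy random vectors yields Cauchy variables whose scale equals the row-wise $\ell_1$ norm, so a median over a modest number of projections replaces one exact pass per hidden unit.

```python
import numpy as np

def estimate_l1_norms(apply_map, input_dim, k=1000, seed=0):
    """Estimate row-wise l1 norms of a linear map available only as a
    black-box apply function, using k random Cauchy projections.

    If r has i.i.d. standard Cauchy entries, coordinate j of the projected
    vector is Cauchy with scale ||row_j||_1, and the median of its absolute
    value equals that scale -- so the cost grows with k, not with input_dim.
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_cauchy(size=(input_dim, k))   # k Cauchy probe vectors
    P = apply_map(R)                               # one batched pass: (out_dim, k)
    return np.median(np.abs(P), axis=1)

# Toy check against the exact l1 norms of an explicit matrix
W = np.random.randn(20, 500)
estimated = estimate_l1_norms(lambda R: W @ R, input_dim=500)
exact = np.abs(W).sum(axis=1)
print(np.mean(np.abs(estimated - exact) / exact))  # mean relative error, a few percent
```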

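The cascade in item 3 reduces to a simple decision rule at test time: each stage answers only when it can certify its own prediction, and otherwise hands the example to the next stage. A minimal sketch, assuming hypothetical predict and certifies interfaces on each model:

```python
def cascade_predict(models, x, epsilon):
    """Evaluate a cascade of provably robust classifiers.

    A stage keeps an example only if it can certify its prediction within an
    l_inf ball of radius epsilon; otherwise the example falls through to the
    next stage. `predict` and `certifies` are assumed interfaces here, not the
    paper's actual API.
    """
    for model in models[:-1]:
        label = model.predict(x)
        if model.certifies(x, label, epsilon):   # verified robust at this stage
            return label
    # The final stage answers unconditionally (possibly without a certificate)
    return models[-1].predict(x)
```

Roughly, each later stage focuses on the examples earlier stages could not certify, which lowers verified robust error while standard error can rise, matching the trade-off noted above.
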
Experimental Results

The paper presents experiments conducted on both MNIST and CIFAR-10 datasets. Noteworthy improvements in provable robust adversarial error bounds are reported:

  • MNIST: Robust error decreased from 5.8% to 3.1% for $\ell_\infty$ perturbations with $\epsilon=0.1$.
  • CIFAR-10: A striking drop from 80% to 36.4% for $\ell_\infty$ perturbations with $\epsilon=2/255$, showcasing the approach's scalability to a more complex dataset.

Implications and Future Directions

The implications are twofold:

  1. Practical Scalability: By achieving linear scaling in computational complexity and robust training of larger networks, the paper sets a foundation for deploying provably robust models in real-world applications, reducing the vulnerability of machine learning systems to adversarial attacks.
  2. Theoretical Advancement: The modular analysis using dual networks introduces a flexible framework that can be adapted to various architectures, indicating a broader applicability of these methodologies across different model types and attack scenarios.

The research also opens up new avenues for further exploration:

  • Architectural Innovations: There is room for developing architectures with inherent robustness, potentially leading to better bounds and improved efficiency in training.
  • Beyond Norm-Bounded Attacks: Future work could explore robustness against more complex adversarial threat models, enhancing the applicability of provably robust defenses in diverse contexts.

In conclusion, this paper substantially advances the ability to train and scale certifiably robust networks, marking an important step toward robust and dependable AI systems. The methodology offers a promising direction for future research on, and deployment of, defenses against adversarial vulnerabilities.