- The paper introduces a scalable training method based on Interval Bound Propagation that requires only two additional forward passes per step to bound a network's outputs under norm-bounded input perturbations.
- It employs a training curriculum that gradually increases the perturbation radius ϵ and anneals κ, the weight balancing nominal accuracy against robustness, yielding competitive empirical and verified accuracy.
- Empirical evaluations on MNIST, CIFAR-10, SVHN, and downscaled ImageNet demonstrate improved verified error rates and practical scalability for robust image classification.
Scalable Verified Training for Provably Robust Image Classification
The paper by Sven Gowal et al. presents a methodology for training deep neural networks that are provably robust to norm-bounded adversarial perturbations. It addresses the challenge of scaling verified training to large networks while maintaining state-of-the-art verified accuracy, using a technique known as Interval Bound Propagation (IBP).
Summary of Contributions
- Interval Bound Propagation (IBP): The paper trains verifiably robust classifiers with IBP, a fast bounding method derived from interval arithmetic. Unlike more complex relaxation-based techniques, IBP requires only two additional forward passes through the network (propagating elementwise lower and upper bounds), which lets it scale effectively; a minimal sketch appears after this list.
- Training Strategy: The authors employ a training curriculum that gradually increases ϵ, the perturbation radius, and anneals κ, the weight balancing the nominal loss against the worst-case (verified) loss. This schedule keeps the propagated bounds tight enough to learn from, yielding both nominal accuracy and verified robustness; the combined loss is sketched after this list.
- Empirical and Verified Evaluation: The method achieves empirical adversarial accuracy competitive with prevalent approaches such as those of Madry et al. and Wong et al., while delivering superior verified accuracy. This is demonstrated through extensive experiments on MNIST, CIFAR-10, SVHN, and a downscaled version of ImageNet, for which the verified bounds are non-vacuous.
- Theoretical and Practical Implications: The results challenge the assumption that tighter relaxations are always necessary for stronger verifiable guarantees. IBP's simplicity and computational efficiency make it a viable choice for broader applications.
- Code Availability: The paper ensures reproducibility by providing the code for training robust models with IBP.
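To make the bound computation concrete, the following is a minimal NumPy sketch of IBP through a fully connected ReLU network. The function names and the `layers` representation are illustrative assumptions, not taken from the paper's released code:

```python
import numpy as np

def interval_affine(l, u, W, b):
    # Standard interval arithmetic through an affine map W x + b:
    # the centre is propagated through W, the radius through |W|.
    centre = (u + l) / 2.0
    radius = (u - l) / 2.0
    new_centre = W @ centre + b
    new_radius = np.abs(W) @ radius
    return new_centre - new_radius, new_centre + new_radius

def interval_relu(l, u):
    # ReLU is monotonic, so it maps interval endpoints directly.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

def ibp_bounds(x, eps, layers):
    # Elementwise bounds on the logits of a ReLU network, valid for every
    # input within an L-infinity ball of radius eps around x.
    # `layers` is a list of (W, b) pairs; ReLU sits between consecutive layers.
    l, u = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        l, u = interval_affine(l, u, W, b)
        if i < len(layers) - 1:  # no activation after the final (logit) layer
            l, u = interval_relu(l, u)
    return l, u
```

The two bound passes (lower and upper) are what keep the method's cost close to standard training.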
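And here is a sketch, under the same illustrative assumptions, of the κ-weighted objective the curriculum anneals: the nominal cross-entropy is mixed with the cross-entropy of the worst-case logits implied by the IBP bounds, while ϵ and κ follow simple linear ramps. The schedule endpoints (e.g., annealing κ from 1 toward 0.5) are hyperparameters described in the paper; this code reuses `ibp_bounds` from the previous sketch:

```python
def worst_case_logits(l, u, y):
    # Most adversarial logit vector consistent with the bounds: the true
    # class y takes its lower bound, every other class its upper bound.
    z_hat = u.copy()
    z_hat[y] = l[y]
    return z_hat

def cross_entropy(logits, y):
    logits = logits - logits.max()  # shift for numerical stability
    return np.log(np.exp(logits).sum()) - logits[y]

def ibp_loss(x, y, layers, eps, kappa):
    # kappa weights the nominal loss against the verified worst-case loss.
    z = x
    for i, (W, b) in enumerate(layers):
        z = W @ z + b
        if i < len(layers) - 1:
            z = np.maximum(z, 0.0)
    l, u = ibp_bounds(x, eps, layers)
    return (kappa * cross_entropy(z, y)
            + (1.0 - kappa) * cross_entropy(worst_case_logits(l, u, y), y))

def linear_ramp(step, end_step, start_val, end_val):
    # Linear schedule used for both eps (0 -> eps_train) and kappa
    # (1 -> its final value); the exact endpoints are hyperparameters.
    t = min(max(step / max(end_step, 1), 0.0), 1.0)
    return start_val + t * (end_val - start_val)
```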
Results and Comparisons
The research establishes benchmarks in verified robustness across several datasets. Throughout, verified error counts a test point as wrong unless the bounds prove that no perturbation within the ϵ-ball can change its prediction (a minimal certification check is sketched after this list):
- MNIST: The verified error rate improves to 2.23% at ϵ=0.1, outperforming other methods in both empirical adversarial robustness and verified bounds.
- CIFAR-10 and SVHN: The paper reports competitive adversarial robustness and sets new state-of-the-art verified accuracies.
- ImageNet: The paper reports training a provably robust model on a downscaled version of ImageNet, achieving a non-vacuous verified error rate of 93.87% at ϵ=1/255, an important milestone for the scalability of verifiable models.
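For reference, the certification check behind these verified error numbers is straightforward given the bounds above: a test point is certified when the true class provably dominates all others (same illustrative NumPy conventions as before):

```python
def is_verified(l, u, y):
    # Certified robust when the true class's lower bound exceeds the upper
    # bound of every other class; verified error is the fraction of test
    # points for which this check fails.
    others = np.delete(u, y)
    return l[y] > others.max()
```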
Implications for Future Work
The paper paves the way for scalable verification tools in practical applications where robustness is critical. Moreover, the approach invites exploration of hybrid methods that combine simple, efficient bounding techniques like IBP with more sophisticated relaxations to handle deeper and wider models.
Conclusion
By leveraging IBP within a well-crafted training paradigm, the researchers demonstrate that large-scale neural networks can be trained efficiently with rigorous robustness guarantees. This advancement offers a practical path toward integrating verifiable robustness into machine learning systems, and is likely to influence future work in adversarial machine learning and verification research.