- The paper demonstrates that combining adversarial training with randomized smoothing significantly boosts certifiable ℓ2 robustness in neural networks.
- It introduces a tailored adversarial attack strategy that effectively strengthens smoothed classifiers during the training process.
- Empirical results on ImageNet and CIFAR-10 establish a new state of the art in certified top-1 accuracy, with further gains from pre-training and semi-supervised learning.
Overview of "Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers"
The paper "Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers" addresses the challenge of adversarial robustness in neural networks, which are notorious for being vulnerable to carefully crafted perturbations. The authors focus on enhancing randomized smoothing, a technique that transforms any classifier into a smoothed version with certifiable robustness against ℓ2 adversarial perturbations.
Key Contributions
This work stands out due to the integration of adversarial training with the randomized smoothing framework. The primary contributions are:
- Adapted Attack for Smoothed Classifiers: The authors design an attack aimed directly at the smoothed classifier rather than the base classifier, using Monte Carlo estimates of the smoothed classifier's gradients. This tailored attack supplies stronger adversarial examples for training, enabling more robust models (see the sketch after this list).
- Adversarial Training of Smoothed Classifiers: Training the base classifier on adversarial examples found by this attack yields smoothed models that, in extensive experiments, outperform the state of the art in ℓ2-robust classification on challenging datasets such as ImageNet and CIFAR-10.
- Enhanced Robustness with Pre-training and Semi-supervision: They further improve classifier robustness by incorporating pre-training and semi-supervised learning, achieving significant performance gains.
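The attack on the smoothed classifier is the core of the method. The sketch below shows its general shape, assuming a PyTorch image classifier: projected gradient ascent on an ℓ2 ball, where each gradient is a Monte Carlo estimate through the Gaussian-smoothed softmax. Names and hyperparameters (`model`, `eps`, `m`, step sizes) are illustrative assumptions; consult the authors' released code for the exact procedure.

```python
import torch
import torch.nn.functional as F

def smooth_adv_attack(model, x, y, sigma=0.25, eps=1.0,
                      alpha=0.25, steps=10, m=4):
    """Projected gradient ascent on the *smoothed* soft classifier:
    maximize the cross-entropy of (1/m) * sum_i softmax(model(x' + d_i)),
    d_i ~ N(0, sigma^2 I), over the l2 ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        avg_probs = 0.0
        for _ in range(m):                        # Monte Carlo smoothing
            noise = torch.randn_like(x_adv) * sigma
            avg_probs = avg_probs + F.softmax(model(x_adv + noise), dim=-1)
        avg_probs = avg_probs / m
        loss = F.nll_loss(torch.log(avg_probs + 1e-12), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # l2-normalized ascent step ...
            g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12)
            x_adv = x_adv + alpha * grad / g_norm.view(-1, 1, 1, 1)
            # ... followed by projection back onto the eps-ball around x.
            delta = x_adv - x
            d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12)
            scale = (eps / d_norm).clamp(max=1.0).view(-1, 1, 1, 1)
            x_adv = (x + delta * scale).detach()
    return x_adv
```

During training, the adversarial input replaces the clean one, and the base classifier is trained on Gaussian-noisy copies of it, which is what ties the attack back into the randomized-smoothing pipeline. A sketch of one such step, under the same assumptions:

```python
def smooth_adv_training_step(model, optimizer, x, y, sigma=0.25, eps=1.0):
    """One training step: attack the smoothed classifier, then train the
    base classifier on a Gaussian-noisy copy of the adversarial input."""
    model.eval()
    x_adv = smooth_adv_attack(model, x, y, sigma=sigma, eps=eps)
    model.train()
    noisy = x_adv + torch.randn_like(x_adv) * sigma
    loss = F.cross_entropy(model(noisy), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```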
Experimental Results
The experimental results are compelling:
- On ImageNet, the proposed method achieves a certified top-1 accuracy of 56% under perturbations of ℓ2 norm less than $127/255$ (a radius of roughly 0.5 for pixel values scaled to $[0, 1]$), surpassing the previous state of the art.
- On CIFAR-10, the smoothed classifier's certified accuracy improves by up to 22% over prior models when the approach is combined with pre-training and semi-supervised learning.
The robustness improvements are quantified in detailed tables reporting certified top-1 accuracy across a range of ℓ2 radii, showing that the gains hold across operating points rather than at a single radius.
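For context on how such certified accuracies are obtained: each table entry counts test points whose certified radius, computed via the randomized-smoothing certificate of Cohen et al. on which this paper builds, exceeds the given threshold. A minimal sketch of that radius computation, assuming the noisy-sample counts have already been collected, might look like:

```python
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

def certified_radius(n_top, n_total, sigma, alpha=0.001):
    """Certified l2 radius in the style of Cohen et al.'s CERTIFY.

    n_top:   noisy samples on which the base classifier picked the top class
    n_total: total number of noisy samples drawn
    Returns None (abstain) if the top class cannot be certified.
    """
    # One-sided Clopper-Pearson lower confidence bound on p_A.
    p_a_lower = proportion_confint(n_top, n_total,
                                   alpha=2 * alpha, method="beta")[0]
    if p_a_lower <= 0.5:
        return None                      # abstain: certificate does not hold
    return sigma * norm.ppf(p_a_lower)   # R = sigma * Phi^{-1}(p_A_lower)
```

For instance, with `sigma = 0.25` and 990 of 1,000 noisy samples agreeing on the top class, the certificate yields a radius of roughly 0.5.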
Implications and Future Directions
The integration of adversarial training with randomized smoothing offers a scalable and effective way to harden large architectures. The approach improves not only empirical robustness but also certifiable guarantees, which matter for safety-critical applications.
Future research could explore optimizing the balance between empirical and certifiable robustness for different perturbation types, improving computation efficiency, and extending the framework to other forms of adversarial attacks beyond ℓ2-norm perturbations.
Conclusion
This paper contributes significantly to the field of robust deep learning by effectively combining adversarial training with randomized smoothing, demonstrating substantial improvements in certified robustness. The results highlight the potential for further enhancing robustness through pre-training and leveraging additional data, paving the way for safer deployment of neural networks in adversarial environments.