Learnable Boundary Guided Adversarial Training
Introduction
The paper "Learnable Boundary Guided Adversarial Training" by Jiequan Cui et al. proposes an innovative approach to adversarial training, aiming to enhance model robustness while minimizing accuracy degradation on natural data. The researchers address a significant challenge in adversarial training: the trade-off between robustness to adversarial attacks and maintaining high accuracy on clean data. They utilize a concept of boundary guidance, informed by logits of a well-trained clean model, to guide the learning of a robust model.
Methodology
The proposed method uses logits from a naturally trained model to guide a robust model during adversarial training. The authors introduce two training strategies:
- Boundary Guided Adversarial Training (BGAT): A static approach in which the robust model aligns its logits on adversarial examples with the logits that a fixed, pre-trained clean model produces on the corresponding natural data.
- Learnable Boundary Guided Adversarial Training (LBGAT): A dynamic, co-training approach in which the robust model and the clean model are trained simultaneously, allowing the clean model to adapt its decision boundary toward one that is friendlier to robustness while maintaining high accuracy on clean data.
Both strategies aim to transfer the classifier boundary from the natural model to the robust model by aligning logits, so that the robust model's outputs on adversarial inputs stay close to the clean model's outputs on the corresponding natural inputs. A minimal sketch of this co-training loop is given below.
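The following is a simplified sketch of what such a co-training step could look like in PyTorch. The model names, the PGD hyperparameters (eps, alpha, steps), and the exact loss weighting are placeholder assumptions for illustration; the authors' released implementation may differ in detail.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-inf PGD inner maximization (hyperparameters are assumptions)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def lbgat_step(robust_model, clean_model, x, y, optimizer):
    """One co-training step in the spirit of LBGAT (simplified sketch).

    The robust model's logits on adversarial examples are pulled toward the
    clean model's logits on the corresponding natural inputs (MSE alignment),
    while the clean model keeps fitting the natural data with cross-entropy.
    `optimizer` is assumed to cover the parameters of both models.
    """
    x_adv = pgd_attack(robust_model, x, y)

    optimizer.zero_grad()
    clean_logits = clean_model(x)        # boundary "teacher" on natural data
    robust_logits = robust_model(x_adv)  # robust model on adversarial data

    align_loss = F.mse_loss(robust_logits, clean_logits)  # boundary guidance
    natural_loss = F.cross_entropy(clean_logits, y)       # keep the clean model accurate
    loss = align_loss + natural_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

Since the paper also reports combining its guidance with methods such as TRADES, an additional regularization term could be added to the loss above; the plain alignment plus cross-entropy form shown here is only the core idea.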
Experimental Results
Extensive experiments were conducted on CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. The results indicate that:
- Natural Accuracy Preservation: LBGAT significantly improves natural accuracy while maintaining comparable robustness. For instance, on CIFAR-100, LBGAT improved natural accuracy by 13.53% over the TRADES method.
- Robustness Against Auto-Attack: LBGAT achieved state-of-the-art robustness under the Auto-Attack evaluation, outperforming previously reported results (a minimal evaluation sketch follows this list).
- Flexibility: The method can be combined with existing adversarial training techniques such as ALP and TRADES, further improving their performance.
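As a complement, the sketch below shows how robustness under Auto-Attack is commonly measured with the publicly available autoattack package. The epsilon and batch size are assumptions reflecting the usual CIFAR L-inf setting, not values taken from the paper.

```python
import torch
from autoattack import AutoAttack  # https://github.com/fra31/auto-attack

def evaluate_autoattack(model, x_test, y_test, eps=8/255, batch_size=128):
    """Robust accuracy under the standard AutoAttack ensemble (L-inf threat model)."""
    model.eval()
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    # Runs APGD-CE, APGD-T, FAB-T and Square sequentially and returns the
    # adversarial examples found for the supplied test batch.
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)
    with torch.no_grad():
        robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
    return robust_acc
```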
Implications and Future Directions
The insights from this paper suggest a few critical implications and future research directions:
- Theoretical Understanding: The approach provides a new perspective on leveraging natural boundaries to improve adversarial robustness, potentially paving the way for theoretical advancements in understanding model generalization in adversarial settings.
- Practical Applications: Robustness gains that do not compromise natural accuracy are crucial for deploying ML models in security-sensitive applications, which underscores the practical value of this approach.
- Further Investigations: Future work may explore different model architectures or datasets and investigate the generalizability of the boundary guidance method across more diverse adversarial settings.
Conclusion
The paper contributes a novel methodology to the adversarial training landscape, enhancing the robustness of neural networks while preserving natural accuracy. LBGAT offers an effective strategy by integrating learnable boundary guidance, providing new insights and practical benefits to the adversarial machine learning community.