Adversarially Robust Distillation (1905.09747v2)

Published 23 May 2019 in cs.LG, cs.CV, and stat.ML

Abstract: Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In addition to producing small models with high test accuracy like conventional distillation, ARD also passes the superior robustness of large networks onto the student. In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation.

Authors (4)
  1. Micah Goldblum (96 papers)
  2. Liam Fowl (25 papers)
  3. Soheil Feizi (127 papers)
  4. Tom Goldstein (226 papers)
Citations (182)

Summary

An Evaluation of Adversarially Robust Distillation (ARD) Techniques

This paper provides a comprehensive evaluation of Adversarially Robust Distillation (ARD), focusing on the efficiency of teacher-student models and on how various hyperparameters shape the trade-off between natural and robust accuracy. The authors scrutinize the ARD process using different neural network architectures as teacher-student pairs, such as WideResNet and ResNet18 teachers distilling onto a MobileNetV2 student.

In detail, the research critically assesses the space and time efficiency of these deep learning models. The ResNet18 and WideResNet teachers contain approximately $11.2$ million and $46.2$ million parameters, respectively, while the MobileNetV2 student is significantly more compact at $2.3$ million parameters. Computational cost was measured in multiply-add (MAdd) operations, with the notable finding that a forward pass through the MobileNetV2 student requires only about $1.4\%$ of the operations of a forward pass through the WideResNet teacher.
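
As a rough sanity check on the model sizes quoted above, parameter counts can be reproduced directly in PyTorch. The snippet below uses the stock torchvision ResNet-18 and MobileNetV2 configured for 10 classes, so the counts land close to the CIFAR-10 figures above even though the architectures are ImageNet variants; the WideResNet teacher is omitted because torchvision does not ship that architecture.

```python
# Parameter-count sanity check for the student/teacher architectures
# discussed above (torchvision ImageNet-style variants, not the exact
# CIFAR-10 models from the paper).
import torchvision.models as models

def count_params(model) -> float:
    """Number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

for name, ctor in [("ResNet-18", models.resnet18),
                   ("MobileNetV2", models.mobilenet_v2)]:
    print(f"{name}: {count_params(ctor(num_classes=10)):.1f}M parameters")
```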

The paper investigates in depth the effects of the temperature and $\alpha$ parameters on knowledge distillation outcomes with respect to robustness trade-offs. It reveals that while the temperature parameter has minimal effect on robustness, reducing $\alpha$ leads to a rapid decline in robustness, indicating an accuracy-robustness trade-off that is particularly pronounced at low $\alpha$ values. Different data augmentation approaches were also considered, with the findings suggesting that basic strategies such as horizontal flips and random cropping are effective for robustness without incurring the drawbacks associated with training against the teacher's behavior at adversarial points.
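
To make the roles of the temperature $t$ and the interpolation weight $\alpha$ explicit, the ARD objective can be written schematically as below; this form follows standard distillation conventions (including the usual $t^2$ scaling of the soft-label term) and is a sketch rather than a verbatim transcription of the paper's loss.

$$\min_\theta \; \mathbb{E}_{(\mathbf{x}_i, y_i)} \Big[ \alpha\, t^2\, \mathrm{KL}\big( S^t_\theta(\mathbf{x}_i'),\, T^t(\mathbf{x}_i) \big) + (1-\alpha)\, \ell_{\mathrm{CE}}\big( S_\theta(\mathbf{x}_i), y_i \big) \Big]$$

Here $S^t_\theta$ and $T^t$ denote the temperature-softened student and teacher outputs, $\ell_{\mathrm{CE}}$ is a standard cross-entropy loss on clean images, and $\mathbf{x}_i'$ is an adversarial perturbation of $\mathbf{x}_i$ crafted against the student within an $\ell_\infty$ ball of radius $\epsilon$ (whether the inner attack targets the teacher's soft labels or the hard labels $y_i$ is a detail to confirm against the paper itself).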

Multiple experimental tables demonstrate the nuanced impact of hyperparameter adjustments, such as temperature and $\alpha$, on the performance of knowledge distillation. Additional analysis evaluates other ARD configurations. For instance, including a knowledge distillation KL divergence term degraded robust accuracy, and substituting $T^t(\mathbf{x}_i')$ for $T^t(\mathbf{x}_i)$ in the loss function was found to be detrimental to overall accuracy.
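
In terms of the schematic objective above, that substitution amounts to replacing the teacher's clean-image output with its output at the adversarial point inside the distillation term, i.e. comparing

$$\mathrm{KL}\big( S^t_\theta(\mathbf{x}_i'),\, T^t(\mathbf{x}_i) \big) \quad \text{versus} \quad \mathrm{KL}\big( S^t_\theta(\mathbf{x}_i'),\, T^t(\mathbf{x}_i') \big),$$

with the second form reported to harm overall accuracy.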

Moreover, naturally trained teacher models used within the ARD framework still produced students with some robustness, albeit lower than that of students distilled from adversarially trained teachers. Further, accelerating ARD training was explored by reducing the number of attack steps, which improved natural accuracy but slightly reduced robustness, supporting the consideration of strategic trade-offs in ARD deployments.
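
A minimal PyTorch sketch of one ARD-style training step is given below, mainly to show where the number of attack steps enters. The attack here maximizes cross-entropy against the hard labels (one common choice), the function and hyperparameter names (`ard_step`, `alpha`, `temperature`, `epsilon`, `step_size`, `attack_steps`) are illustrative, and the exact loss form should be taken from the paper and its reference implementation rather than from this sketch.

```python
# Illustrative ARD-style training step (not the authors' reference code).
# Reducing `attack_steps` is the acceleration discussed above: fewer inner
# PGD iterations per batch, at some cost in robustness.
import torch
import torch.nn.functional as F

def ard_step(student, teacher, x, y, optimizer,
             alpha=0.95, temperature=30.0,
             epsilon=8 / 255, step_size=2 / 255, attack_steps=10):
    student.train()
    teacher.eval()

    # Inner loop: craft an l_inf-bounded perturbation against the *student*
    # by maximizing cross-entropy on the hard labels.
    delta = torch.zeros_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(attack_steps):
        loss = F.cross_entropy(student(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    x_adv = (x + delta).detach()

    # Outer step: distill the teacher's *clean-image* soft labels onto the
    # student's predictions at the adversarial point, plus a clean CE term
    # weighted by (1 - alpha).
    with torch.no_grad():
        teacher_soft = F.softmax(teacher(x) / temperature, dim=1)
    student_log_soft = F.log_softmax(student(x_adv) / temperature, dim=1)
    distill = F.kl_div(student_log_soft, teacher_soft, reduction="batchmean")
    clean_ce = F.cross_entropy(student(x), y)
    total = alpha * temperature ** 2 * distill + (1 - alpha) * clean_ce

    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```

With `attack_steps=10` this resembles a standard PGD-style inner loop, while dropping to one or two steps roughly corresponds to the fast adversarial training adaptations mentioned in the abstract.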

Practically and theoretically, these findings contribute to the ongoing discourse in adversarial training by revealing the intricacies of ARD's hyperparameter sensitivity and operational efficiency. The implications of this work suggest potential refinements in robust model training protocols and highlight the balance between computational cost and accuracy in practice. Future directions might involve adaptive data augmentation tailored for robustness, enhancing the applicability of ARD strategies in real-world scenarios.