Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation (2303.11611v2)

Published 21 Mar 2023 in cs.CV

Abstract: Adversarial Robustness Distillation (ARD) is a promising approach to the limited adversarial robustness of small-capacity models, while also reducing the expensive computational cost of Adversarial Training (AT). Despite their good robust performance, existing ARD methods remain impractical to deploy in real-world high-security scenarios because they rely entirely on the original training data, or on publicly available data with a similar distribution. In practice, the data in scenarios that demand high robustness are almost always private, domain-specific, and distinctive. To tackle these issues, we propose a challenging but significant task called Data-Free Adversarial Robustness Distillation (DFARD), which aims to train small, easily deployable, robust models without relying on data. We show that the core challenge is a lower upper bound on the information available for knowledge transfer, making it crucial to mine and transfer knowledge more efficiently. Inspired by human education, we design a plug-and-play Interactive Temperature Adjustment (ITA) strategy to improve the efficiency of knowledge transfer and propose an Adaptive Generator Balance (AGB) module to retain more data information. Our method uses adaptive hyperparameters to avoid extensive manual tuning, significantly outperforms combinations of existing techniques, and achieves stable and reliable performance on multiple benchmarks.
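
Because the abstract describes the DFARD pipeline only at a high level, the sketch below illustrates what a generator-based data-free adversarial robustness distillation loop could look like in PyTorch. It is a minimal illustration under stated assumptions: the generator objective, the PGD attack settings, and the fixed temperature T are placeholders, and the paper's ITA strategy (which adapts the temperature during training) and AGB module are not implemented here.

# Hypothetical sketch of data-free adversarial robustness distillation.
# The generator objective, PGD settings, and fixed temperature T are
# illustrative assumptions; ITA and AGB are not implemented.
import torch
import torch.nn.functional as F

def pgd_attack(student, x, teacher_soft, eps=8 / 255, alpha=2 / 255, steps=10):
    # Craft adversarial examples against the student, measured by KL divergence
    # to the teacher's soft labels (no ground-truth labels exist in the data-free setting).
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.kl_div(F.log_softmax(student(x_adv), dim=1), teacher_soft,
                        reduction="batchmean")
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def dfard_step(generator, teacher, student, opt_g, opt_s, z_dim=100, batch=128, T=4.0):
    device = next(student.parameters()).device
    z = torch.randn(batch, z_dim, device=device)

    # 1) Generator step: synthesize images on which student and teacher disagree
    #    (maximize the distillation loss, hence the negative sign).
    x_syn = generator(z)
    with torch.no_grad():
        t_soft = F.softmax(teacher(x_syn) / T, dim=1)
    g_loss = -F.kl_div(F.log_softmax(student(x_syn) / T, dim=1), t_soft,
                       reduction="batchmean")
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # 2) Student step: distill the teacher's soft labels on adversarial versions
    #    of the freshly synthesized images.
    x_syn = generator(z).detach()
    with torch.no_grad():
        t_soft = F.softmax(teacher(x_syn) / T, dim=1)
    x_adv = pgd_attack(student, x_syn, t_soft)
    s_loss = F.kl_div(F.log_softmax(student(x_adv) / T, dim=1), t_soft,
                      reduction="batchmean") * (T * T)
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
    return g_loss.item(), s_loss.item()

In a full training run, dfard_step would be called repeatedly with a teacher obtained via adversarial training, and the fixed temperature T would be replaced by an adaptive schedule in the spirit of ITA.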
