
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts (2403.19539v1)

Published 28 Mar 2024 in cs.CV

Abstract: Data-Free Knowledge Distillation (DFKD) is a promising task that trains high-performance small models for practical deployment without relying on the original training data. Existing methods commonly avoid using private data by substituting synthetic or sampled data. However, a long-overlooked issue is the severe distribution shift between this substitute data and the original data, which manifests as large differences in image quality and class proportions. This harmful shift is essentially a confounder that causes significant performance bottlenecks. To tackle the issue, this paper proposes a novel causal-inference perspective to disentangle student models from the impact of such shifts. By designing a customized causal graph, we first reveal the causalities among the variables in the DFKD task. We then propose a Knowledge Distillation Causal Intervention (KDCI) framework based on the backdoor adjustment to de-confound the confounder. KDCI can be flexibly combined with most existing state-of-the-art baselines. Experiments combining KDCI with six representative DFKD methods demonstrate its effectiveness, clearly improving existing methods under almost all settings, e.g., raising baseline accuracy by up to 15.54% on the CIFAR-100 dataset.
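To make the backdoor-adjustment idea concrete, below is a minimal PyTorch sketch of a backdoor-adjusted distillation loss. It is not the paper's released KDCI implementation; the function and variable names (kdci_loss, strata, prior) are illustrative, and the stratification of synthetic data by a confounder variable z is an assumption about how the adjustment could be instantiated.

import torch
import torch.nn.functional as F

def kdci_loss(student, teacher, strata, prior, T=4.0):
    """Backdoor-adjusted knowledge distillation loss (illustrative sketch).

    strata: list of tensors, one batch of synthetic images per confounder
            stratum z (e.g., grouped by estimated image quality or class
            proportion).
    prior:  1-D tensor of P(z) weights summing to 1.

    Implements the intuition of the backdoor adjustment
    P(Y | do(X)) = sum_z P(Y | X, z) P(z):
    the distillation objective is averaged over strata, so no single
    dominant stratum of the substitute data confounds the student.
    """
    loss = 0.0
    for x_z, p_z in zip(strata, prior):
        with torch.no_grad():
            t_logits = teacher(x_z)          # teacher is frozen
        s_logits = student(x_z)
        kl = F.kl_div(
            F.log_softmax(s_logits / T, dim=1),
            F.softmax(t_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)                          # standard temperature scaling
        loss = loss + p_z * kl               # weight stratum by P(z)
    return loss

In practice one would call this inside the usual distillation loop, backpropagating only through the student; the key design choice is that each confounder stratum contributes according to its prior rather than its (possibly skewed) frequency in the generated data.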

Authors (8)
  1. Yuzheng Wang (15 papers)
  2. Dingkang Yang (57 papers)
  3. Zhaoyu Chen (52 papers)
  4. Yang Liu (2253 papers)
  5. Siao Liu (8 papers)
  6. Wenqiang Zhang (87 papers)
  7. Lihua Zhang (68 papers)
  8. Lizhe Qi (10 papers)
Citations (3)
