Direct Distillation between Different Domains (2401.06826v1)

Published 12 Jan 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. In practical applications, however, the student network may be required to perform in a new scenario (i.e., the target domain) that differs significantly from the known scenario of the teacher network (i.e., the source domain). Traditional domain adaptation techniques can be combined with KD in a two-stage process to bridge the domain gap, but the reliability of such two-stage approaches tends to be limited by their high computational cost and by the errors accumulated across both stages. To solve this problem, we propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds). We first design a learnable adapter based on the Fourier transform to separate domain-invariant knowledge from domain-specific knowledge. We then build a fusion-activation mechanism that transfers the valuable domain-invariant knowledge to the student network while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data. As a result, the teacher network can effectively transfer categorical knowledge that aligns with the target domain of the student network. Extensive experiments on various benchmark datasets demonstrate that the proposed 4Ds method produces reliable student networks and outperforms state-of-the-art approaches.
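The abstract does not detail the adapter's internals, but Fourier-based separation schemes commonly treat the phase spectrum of a feature map as content (domain-invariant) and the amplitude spectrum as style (domain-specific). The sketch below is a minimal, hypothetical PyTorch illustration of such an adapter; the class name `FourierAdapter`, the per-channel amplitude scaling, and the amplitude/phase split are assumptions made for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class FourierAdapter(nn.Module):
    """Illustrative Fourier-based adapter (hypothetical design, not the paper's).

    Splits a (B, C, H, W) feature map into phase (treated here as
    domain-invariant) and amplitude (treated here as domain-specific),
    and learns a lightweight re-weighting of the amplitude only.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Learnable per-channel scaling of the amplitude spectrum (assumption).
        self.amp_scale = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # 2D FFT over the spatial dimensions of the feature map.
        spec = torch.fft.fft2(feat, norm="ortho")
        amplitude, phase = torch.abs(spec), torch.angle(spec)
        # Adapt only the amplitude; leave the phase (content) untouched.
        adapted = torch.polar(amplitude * self.amp_scale, phase)
        return torch.fft.ifft2(adapted, norm="ortho").real


# Usage sketch: adapt a teacher feature map before a feature-distillation loss.
if __name__ == "__main__":
    adapter = FourierAdapter(channels=64)
    teacher_feat = torch.randn(8, 64, 32, 32)
    adapted_feat = adapter(teacher_feat)
    print(adapted_feat.shape)  # torch.Size([8, 64, 32, 32])
```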

Authors (7)
  1. Jialiang Tang (10 papers)
  2. Shuo Chen (127 papers)
  3. Gang Niu (125 papers)
  4. Hongyuan Zhu (36 papers)
  5. Joey Tianyi Zhou (116 papers)
  6. Chen Gong (152 papers)
  7. Masashi Sugiyama (286 papers)
Citations (1)
