AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation
Abstract: Due to privacy or patent concerns, a growing number of large models are released without access to their training data, making knowledge transfer inefficient and problematic. In response, Data-Free Knowledge Distillation (DFKD) methods have emerged as direct solutions. However, models derived from DFKD suffer significant performance degradation when deployed in real-world applications, owing to the discrepancy between the teacher's training data and real-world scenarios (the student domain). This degradation stems from portions of the teacher's knowledge that are specific to the teacher domain and thus inapplicable to the student domain; transferring them undermines the student's performance. Selectively transferring the teacher's appropriate knowledge therefore becomes the primary challenge in DFKD. In this work, we propose a simple but effective method, AuG-KD. It uses an uncertainty-guided, sample-specific anchor to align student-domain data with the teacher domain, and leverages a generative method to progressively trade off OOD knowledge distillation against domain-specific information learning via mixup learning. Extensive experiments on 3 datasets and 8 settings demonstrate the stability and superiority of our approach. Code available at https://github.com/IshiKura-a/AuG-KD .
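The abstract's core mechanics — mixing a student-domain sample with a teacher-domain anchor, and a schedule that progressively shifts training from OOD distillation toward domain-specific learning — can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`mixup`, `anchor_weight`, `kd_kl`), the linear schedule, and the temperature-softened KL loss are illustrative assumptions; AuG-KD's actual anchors are produced by a learned, uncertainty-guided generative module.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the standard distillation objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def mixup(x_student, x_anchor, lam):
    """Convex combination of a student-domain sample with its
    teacher-domain anchor (lam = anchor weight)."""
    return lam * np.asarray(x_anchor) + (1.0 - lam) * np.asarray(x_student)

def anchor_weight(epoch, total_epochs):
    """Illustrative linear schedule: early epochs lean on anchors
    (OOD distillation from the teacher), later epochs on raw
    student-domain data (domain-specific learning)."""
    return max(0.0, 1.0 - epoch / total_epochs)
```

A training step would then distill on `mixup(x, anchor, anchor_weight(epoch, T))` while fitting student-domain labels on the raw `x`, so the two objectives are traded off smoothly over training rather than switched abruptly.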