
CrossKD: Cross-Head Knowledge Distillation for Object Detection (2306.11369v2)

Published 20 Jun 2023 in cs.CV

Abstract: Knowledge Distillation (KD) has been validated as an effective model compression technique for learning compact object detectors. Existing state-of-the-art KD methods for object detection are mostly based on feature imitation. In this paper, we present a general and effective prediction mimicking distillation scheme, called CrossKD, which delivers the intermediate features of the student's detection head to the teacher's detection head. The resulting cross-head predictions are then forced to mimic the teacher's predictions. This manner relieves the student's head from receiving contradictory supervision signals from the annotations and the teacher's predictions, greatly improving the student's detection performance. Moreover, as mimicking the teacher's predictions is the target of KD, CrossKD offers more task-oriented information in contrast with feature imitation. On MS COCO, with only prediction mimicking losses applied, our CrossKD boosts the average precision of GFL ResNet-50 with 1x training schedule from 40.2 to 43.7, outperforming all existing KD methods. In addition, our method also works well when distilling detectors with heterogeneous backbones. Code is available at https://github.com/jbwang1997/CrossKD.

Authors (6)
  1. Jiabao Wang
  2. Yuming Chen
  3. Zhaohui Zheng
  4. Xiang Li
  5. Ming-Ming Cheng
  6. Qibin Hou

Summary

An Analysis of "CrossKD: Cross-Head Knowledge Distillation for Object Detection"

The paper "CrossKD: Cross-Head Knowledge Distillation for Object Detection" introduces a technique aimed at improving knowledge distillation (KD) for object detectors through a framework termed Cross-Head Knowledge Distillation (CrossKD). KD, widely used for model compression in deep learning, transfers knowledge from a large "teacher" model to a smaller "student" model, improving the student's accuracy while preserving its computational efficiency. This paper targets the specific challenges KD poses in object detection, particularly the target conflict that arises during training when the student's head must simultaneously fit the ground-truth annotations and the teacher's predictions.
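As background, the sketch below shows the classic soft-target distillation loss of Hinton et al. in PyTorch, the kind of loss that prediction-mimicking schemes build on; the function name and default temperature are illustrative and not taken from the paper.

```python
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic soft-target KD: KL divergence between temperature-softened
    teacher and student class distributions (Hinton et al., 2015)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```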

Framework and Methodology

The CrossKD framework diverges from traditional prediction-mimicking paradigms, which often suffer from target conflict due to discrepancies between the ground-truth annotations and the teacher's predictions. Instead, CrossKD alleviates this conflict by delivering the intermediate features of the student's detection head to the teacher's detection head, thereby generating what the authors call "cross-head predictions." The distillation loss is then computed between these cross-head predictions and the teacher's original predictions, as sketched below.
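A minimal, hypothetical PyTorch sketch of this cross-head routing follows. It assumes the detection heads can be expressed as nn.Sequential conv stacks operating on a single feature map and uses a plain MSE term as a placeholder for the paper's detection-specific distillation losses; it is a conceptual illustration, not the authors' implementation, which is available in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossHeadDistiller(nn.Module):
    """Conceptual sketch of cross-head prediction mimicking (hypothetical code,
    not the official CrossKD implementation). Both heads are assumed to be
    nn.Sequential conv stacks; `split` marks the layer after which the
    student's intermediate head features are routed into the frozen teacher head."""

    def __init__(self, student_head: nn.Sequential, teacher_head: nn.Sequential, split: int = 2):
        super().__init__()
        self.student_head = student_head
        self.teacher_head = teacher_head
        self.split = split
        for p in self.teacher_head.parameters():  # teacher head stays frozen
            p.requires_grad_(False)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor):
        # 1) Student's own prediction path, supervised by ground truth elsewhere.
        student_pred = self.student_head(student_feat)

        # 2) Teacher prediction, used only as the distillation target.
        with torch.no_grad():
            teacher_pred = self.teacher_head(teacher_feat)

        # 3) Cross-head prediction: the first `split` student layers, then the
        #    remaining teacher layers. Gradients from the KD loss flow back into
        #    the student's early head layers but never into its own prediction
        #    layers, avoiding conflicting supervision there.
        x = student_feat
        for layer in list(self.student_head)[: self.split]:
            x = layer(x)
        for layer in list(self.teacher_head)[self.split:]:
            x = layer(x)
        cross_head_pred = x

        # MSE is a stand-in; the paper computes detection-specific
        # prediction-mimicking losses between cross-head and teacher predictions.
        kd_loss = F.mse_loss(cross_head_pred, teacher_pred)
        return student_pred, kd_loss
```

In training, the returned kd_loss would simply be added, with a weight, to the standard detection losses computed from student_pred and the annotations.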

This cross-head route ensures that the supervision signals received by the student's own head are less contradictory, since the distillation gradients flow to the student through the frozen teacher head rather than directly into the student's prediction layers, leading to a more stable learning process. The paper demonstrates the empirical efficacy of this method on MS COCO, where, with only prediction-mimicking losses applied, CrossKD raises the average precision of a GFL ResNet-50 student from 40.2 to 43.7 under a 1× training schedule, outperforming existing KD techniques.

Results and Contributions

The experiments cover multiple configurations in which CrossKD consistently improves over existing KD techniques. Applied to GFL models with various backbones, it delivers significant gains and remains effective even when the teacher and student use heterogeneous backbone architectures. Because the distillation target is the teacher's prediction itself, CrossKD conveys more task-oriented information than feature-imitation methods, a design choice well suited to the particular demands of dense object detectors.

Implications and Future Directions

The findings represent a notable advancement in object detection efficiency through structured KD pathways like CrossKD and suggest several avenues for further research. The reduced target conflict points to applications where model robustness and reliability are critical, such as autonomous driving or real-time surveillance. Given the promising results, future work could explore extending CrossKD to more complex architectures or integrating it with other feature extraction techniques to further minimize discrepancies and enhance performance.

Moreover, potential adaptations of CrossKD in broader machine learning tasks beyond object detection could be a compelling direction, exploring the limits of knowledge transfer in more dynamic settings. As deep learning systems strive for computational efficiency without sacrificing accuracy, CrossKD provides a pivotal step towards reconciling these often competing objectives.

In summary, the paper provides a compelling case for re-evaluating how knowledge distillation is approached within object detection frameworks. The empirical evidence and fresh insight into mitigating target conflict present a meaningful contribution to both theoretical understanding and practical implementations in AI-driven object detection systems.
