- The paper introduces Focal Distillation, which separates foreground from background and uses attention masks so the student focuses on the pixels and channels the teacher emphasizes.
- It pairs this with Global Distillation, which rebuilds pixel-to-pixel relations to recover the global context that focal distillation discards, yielding consistent mAP gains across multiple detection frameworks.
- Together, the two losses address the foreground-background imbalance that hampers feature distillation in detection, laying a foundation for further multi-scale and cross-domain feature distillation research.
Focal and Global Knowledge Distillation for Detectors
The paper "Focal and Global Knowledge Distillation for Detectors" introduces a novel approach to enhance the performance of object detection models through an advanced method of knowledge distillation. While knowledge distillation has shown efficacy in image classification, its application to object detection has faced challenges due to the complexity of the task. This paper presents Focal and Global Distillation (FGD) as a solution, providing insight into how teacher and student models process features differently in foreground and background contexts.
Key Contributions
The research introduces a bifurcated approach to knowledge distillation in object detection:
- Focal Distillation: By separating foreground from background, focal distillation lets the student prioritize the regions the teacher model highlights. Spatial and channel attention masks derived from the teacher weight the important pixels and channels, reducing the impact of the feature disparity between teacher and student (a minimal sketch follows this list).
- Global Distillation: This component compensates for the global context that focal distillation cuts away. It captures and transfers the relations between different pixels, so the student also receives the teacher's comprehensive spatial information.
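To make the focal idea concrete, here is a minimal sketch of a masked, attention-weighted feature-imitation loss. It assumes teacher and student feature maps of identical shape and a binary foreground mask built from the ground-truth boxes; the attention construction, the weights `alpha` and `beta`, and the function names are simplifications, and it omits details such as the paper's scale mask and attention-matching terms.

```python
import torch

def attention_maps(feat: torch.Tensor, temperature: float = 0.5):
    """Spatial and channel attention derived from mean absolute activations."""
    n, c, h, w = feat.shape
    spatial = feat.abs().mean(dim=1)                                    # (N, H, W)
    spatial = torch.softmax(spatial.view(n, -1) / temperature, dim=1).view(n, h, w) * h * w
    channel = feat.abs().mean(dim=(2, 3))                               # (N, C)
    channel = torch.softmax(channel / temperature, dim=1) * c
    return spatial, channel

def focal_distill_loss(t_feat, s_feat, fg_mask, alpha=1.0, beta=0.5):
    """Masked feature imitation with foreground and background weighted separately.

    t_feat, s_feat: (N, C, H, W) teacher/student features (teacher assumed detached).
    fg_mask: (N, H, W) binary mask, 1 inside ground-truth boxes.
    """
    t_spatial, t_channel = attention_maps(t_feat)                       # teacher decides where/what matters
    weight = t_spatial.unsqueeze(1) * t_channel.unsqueeze(-1).unsqueeze(-1)
    diff = (t_feat - s_feat) ** 2 * weight
    fg = fg_mask.unsqueeze(1).float()
    loss_fg = (diff * fg).sum() / fg.sum().clamp(min=1.0)
    loss_bg = (diff * (1.0 - fg)).sum() / (1.0 - fg).sum().clamp(min=1.0)
    return alpha * loss_fg + beta * loss_bg
```

Giving the foreground and background terms their own weights (`alpha` versus `beta`) is what lets the distillation respect the imbalance between the two regions rather than averaging over them.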
Experimental Results
The method was evaluated on the COCO benchmark across several detectors with a ResNet-50 backbone: RetinaNet, Faster R-CNN, RepPoints, and Mask R-CNN. With FGD, these students gained 3.3, 3.6, 3.4, and 2.9 mAP points over their baselines, respectively. These results highlight FGD's ability to improve one-stage, two-stage, and anchor-free architectures alike.
Analysis and Implications
The findings underscore the need to handle the foreground-background imbalance when applying knowledge distillation to object detection: separating the two regions and weighting them differently is what keeps the distillation signal from being diluted. Furthermore, by capturing and distilling relational dependencies among pixels, FGD goes beyond pointwise feature matching and supplies the student with richer structural information, as sketched below.
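As a rough illustration of that relational transfer, the sketch below applies a simplified GCNet-style global-context block to teacher and student features and matches the results. Whether the block is shared or separate, and the exact normalization and reduction ratio, are assumptions for illustration rather than the paper's precise design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContext(nn.Module):
    """Simplified GCNet-style block: pools a global context vector with a learned
    spatial attention, transforms it, and adds it back to every position."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.context_conv = nn.Conv2d(channels, 1, kernel_size=1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        n, c, h, w = x.shape
        attn = self.context_conv(x).view(n, 1, h * w).softmax(dim=-1)    # (N, 1, HW)
        context = torch.bmm(x.view(n, c, h * w), attn.transpose(1, 2))   # (N, C, 1)
        return x + self.transform(context.view(n, c, 1, 1))              # broadcast over H, W

def global_distill_loss(gc_block, t_feat, s_feat, gamma=1.0):
    """Match relation-enhanced teacher and student features."""
    return gamma * F.mse_loss(gc_block(s_feat), gc_block(t_feat).detach())
```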
Practical and Theoretical Implications
Practically, FGD plugs into a wide range of detection frameworks because its losses are computed purely on feature maps rather than on model-specific outputs such as anchors or proposals (see the sketch below). Theoretically, the paper offers a fresh perspective on the role of attention within the distillation process, potentially influencing future research on hierarchical or multi-modal models.
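The following sketch illustrates that plug-in property. The `extract_fpn_features` and `detection_loss` helpers are hypothetical stand-ins for whatever interface a given detector exposes; in practice the plain MSE term would be replaced by the focal and global losses sketched above, while the detector's own task loss is left untouched.

```python
import torch
import torch.nn.functional as F

def training_step(student, teacher, images, targets, distill_weight=1.0):
    # Teacher runs without gradients; only its feature maps are needed.
    with torch.no_grad():
        t_feats = teacher.extract_fpn_features(images)    # hypothetical helper
    s_feats = student.extract_fpn_features(images)        # hypothetical helper

    # The detector's own loss (classification, regression, masks, ...) is unchanged.
    det_loss = student.detection_loss(s_feats, targets)   # hypothetical helper

    # The distillation term only touches feature maps, so it is agnostic to the
    # head design (anchors, proposals, point sets, ...).
    kd_loss = sum(F.mse_loss(s, t) for s, t in zip(s_feats, t_feats))
    return det_loss + distill_weight * kd_loss
```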
Future Directions
The work sets the stage for further study of how relational knowledge is best distilled and how it affects model performance in different settings. Natural extensions include domains where multi-scale feature interplay and relational understanding are critical, such as video analysis or cross-domain transfer learning.
In summary, the "Focal and Global Knowledge Distillation for Detectors" paper presents a well-founded advancement in the field of knowledge distillation for object detection, providing robust empirical results and introducing meaningful directions for future exploration.