Distilling Object Detectors with Fine-grained Feature Imitation
The paper "Distilling Object Detectors with Fine-grained Feature Imitation" presents an insightful approach to improving the efficiency of CNN-based object detection models, which are typically computationally intensive, through a fine-grained feature imitation method for knowledge distillation. Recognizing the challenge of deploying these models on low-end devices, the authors propose a solution tailored specifically to object detection, in contrast to the more common focus on image classification in existing knowledge distillation techniques.
The primary contribution of this research lies in its innovative approach to distilling knowledge in object detection models. Traditional techniques primarily target classification tasks and do not extend well to the more complex task of object detection, where reliable localization is crucial, and the imbalance between foreground and background instances presents additional challenges. The paper demonstrates that applying conventional knowledge distillation to detection models yields only marginal improvements. Hence, the authors introduce a novel mechanism that leverages the cross-location discrepancy of feature responses to refine the imitation process. By identifying and focusing on near-object anchor locations, the student model is guided to mimic the teacher model's behavior more effectively.
The core principle of the method is to compute a mask identifying these critical locations, using ground-truth bounding boxes and anchor priors to produce a fine-grained imitation region on the feature map. The experiments underscore the efficacy of this technique: student models gain up to a 15% boost in mAP over non-imitated counterparts on the KITTI dataset, and the performance gap between student and teacher models shrinks significantly on the Pascal VOC and COCO benchmarks. Notably, applying fine-grained feature imitation before the classification and localization heads improves both sub-tasks, as validated through qualitative and quantitative analyses.
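The mask computation described above can be illustrated with a minimal NumPy sketch. Per the paper's description, each ground-truth box's anchor IoUs are thresholded by a fraction of their maximum (the factor `psi`), and the per-box masks are unioned; the function names and the exact interface here are illustrative assumptions, not the authors' released code:

```python
import numpy as np

def iou(boxes, gt):
    """IoU between each box in boxes (N, 4) and a single gt box (4,),
    with boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_b + area_g - inter + 1e-9)

def imitation_mask(anchors, gt_boxes, psi=0.5):
    """Fine-grained imitation mask over the feature map.
    anchors: (H, W, K, 4) anchor priors at each feature-map location;
    gt_boxes: iterable of (4,) ground-truth boxes; returns an (H, W) mask."""
    H, W, K, _ = anchors.shape
    flat = anchors.reshape(-1, 4)
    keep = np.zeros(H * W * K, dtype=bool)
    for gt in gt_boxes:
        ious = iou(flat, gt)
        thresh = ious.max() * psi      # per-box threshold: fraction of the max IoU
        keep |= ious > thresh
    # a location is imitated if any of its K anchors passes the threshold
    return keep.reshape(H, W, K).any(axis=2).astype(np.float32)
```

The per-box (rather than global) threshold lets small objects, whose best anchor IoU is low, still contribute imitation regions.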
Moreover, the research acknowledges the limitations of existing methods such as full feature imitation or vanilla distillation, which either introduce performance-degrading noise from irrelevant areas or fail to capture necessary localization knowledge across different model configurations. The authors propose a feature adaptation layer to align student and teacher model responses, facilitating the distillation process and improving the generalization capabilities of the student models without introducing substantial computational overhead.
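The adaptation layer and the masked imitation objective can be sketched as follows, again in NumPy. The 1x1-convolution adaptation is expressed as a channel-wise linear projection, and the loss is normalized by twice the number of masked locations, in line with the paper's formulation; the function names, shapes, and parameterization are assumptions made for illustration:

```python
import numpy as np

def adapt(student_feat, W, b):
    """1x1-conv-style adaptation: project student channels Cs to teacher
    channels Ct. student_feat: (Cs, H, W_); W: (Ct, Cs); b: (Ct,)."""
    return np.einsum('ts,shw->thw', W, student_feat) + b[:, None, None]

def imitation_loss(student_feat, teacher_feat, mask, W, b):
    """Masked L2 distance between adapted student features and teacher
    features. mask: (H, W_) binary imitation mask; background locations
    are zeroed out, so irrelevant areas contribute no gradient."""
    adapted = adapt(student_feat, W, b)
    sq_diff = (adapted - teacher_feat) ** 2        # (Ct, H, W_)
    masked = sq_diff * mask[None, :, :]
    n_p = mask.sum()                               # number of imitated locations
    return masked.sum() / (2.0 * n_p + 1e-9)
```

Because the mask confines the L2 term to near-object locations, the student is not penalized for deviating from the teacher on background regions, which is precisely the noise source the authors attribute to full feature imitation.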
Theoretical implications of this work highlight the nuanced understanding required to efficiently distill knowledge in detection models, emphasizing the importance of selective feature imitation. Practically, this method offers a scalable approach to optimize object detection models for devices with limited computational resources, enabling broader deployment across various hardware configurations.
In terms of future directions, this fine-grained feature imitation approach could be extended to a wider range of network architectures and detection scenarios, including multi-stage and end-to-end detection pipelines. Additionally, combining this method with complementary acceleration techniques such as network pruning and quantization opens avenues for comprehensive model optimization frameworks applicable to diverse AI applications. By focusing on the interplay between local and global feature understanding, this research may contribute to refining the efficiency and application scope of object detection algorithms in AI and machine learning domains.