- The paper introduces General Instance Distillation (GID) to enhance object detection by transferring detailed feature information from a teacher to a student model.
- It employs standard ResNet architectures on COCO and Pascal VOC datasets, achieving performance gains such as fewer false positives and better localization.
- The study’s implementation of instance-level distillation offers promising prospects for creating more efficient detection models suitable for resource-constrained environments.
General Instance Distillation for Object Detection
The paper under discussion presents a thorough investigation into General Instance Distillation (GID) in the context of object detection. Using a private PyTorch codebase optimized for detection tasks, the researchers ran their experiments with data-parallel acceleration on eight NVIDIA GeForce RTX 2080Ti GPUs, a setup that reflects the computational demands of modern object detection research.
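Since the codebase itself is private, the following is only a minimal sketch of how such multi-GPU detection training is commonly launched in PyTorch; the stand-in detector, launch command, and training details are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of 8-GPU data-parallel detection training in PyTorch.
# Launch with: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torchvision.models.detection import retinanet_resnet50_fpn

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = retinanet_resnet50_fpn().cuda()        # stand-in detector, not the paper's model
    model = DDP(model, device_ids=[local_rank])    # synchronize gradients across the 8 GPUs

    optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                                momentum=0.9, weight_decay=1e-4)
    # ... per-epoch training loop over a DistributedSampler-backed COCO loader ...

if __name__ == "__main__":
    main()
```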
Experimental Setup and Training Configuration
The authors leveraged standard neural network architectures, specifically ResNet-50 and ResNet-101 pre-trained on ImageNet, as the backbones for their detection models, and followed established training procedures for object detection on both the COCO and Pascal VOC datasets. The COCO models were trained for 24 epochs with the learning rate reduced at fixed milestones, while the VOC models were trained for 17.4 epochs with corresponding learning-rate adjustments. These regimens follow current best practice for the two datasets, supporting reliable and comparable results.
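As an illustration of this kind of stepped schedule, the sketch below uses PyTorch's MultiStepLR for the 24-epoch COCO regimen; the stand-in torchvision RetinaNet, the milestone epochs (16 and 22), and the base learning rate are assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of a stepped learning-rate schedule for a 24-epoch COCO run.
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# Stand-in detector with an ImageNet-pretrained ResNet-50 backbone,
# standing in for the paper's student model (details assumed).
model = retinanet_resnet50_fpn()

optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                            momentum=0.9, weight_decay=1e-4)

# Drop the learning rate 10x at assumed milestone epochs; the paper's exact
# milestones may differ from these commonly used values.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[16, 22], gamma=0.1)

for epoch in range(24):
    # ... one epoch of training on COCO (data loading and loss steps omitted) ...
    scheduler.step()
```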
Distillation Process and Configuration
Central to the paper is the distillation configuration, built on several hyperparameters, including 3x3 convolutional adaptation layers and ROIAlign-based feature extraction for the selected instances. The research applied the GID method within what the literature commonly calls the teacher-student paradigm, in which a larger teacher network guides a more compact student, and demonstrated improved detection capabilities in the student models. The empirical evidence indicated tangible performance gains: fewer false positives, fewer missed detections, and more precise localization of objects. The distilled RetinaNet-Res101-50 model (a ResNet-50 student guided by a ResNet-101 teacher) achieved a significant mAP improvement over the baseline, supporting the efficacy of the distillation approach.
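The following is a minimal, hypothetical sketch of instance-level feature imitation in this teacher-student setup: ROIAlign crops per-instance features from the teacher and student feature maps, a 3x3 adaptation convolution maps the student features into the teacher's channel space, and an L2 loss penalizes the difference. The module name, feature shapes, and loss choice are assumptions, and the paper's instance-selection step is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import roi_align

class FeatureImitation(nn.Module):
    """Hypothetical instance-level feature imitation module (names assumed)."""

    def __init__(self, student_channels, teacher_channels, output_size=7):
        super().__init__()
        # 3x3 adaptation conv maps student features into the teacher's channel space.
        self.adapt = nn.Conv2d(student_channels, teacher_channels,
                               kernel_size=3, padding=1)
        self.output_size = output_size

    def forward(self, student_feat, teacher_feat, boxes, stride):
        # boxes: list with one [N_i, 4] tensor of selected instance boxes per
        # image, in image coordinates; stride maps them to the feature scale.
        scale = 1.0 / stride
        s_roi = roi_align(self.adapt(student_feat), boxes,
                          self.output_size, spatial_scale=scale, aligned=True)
        t_roi = roi_align(teacher_feat, boxes,
                          self.output_size, spatial_scale=scale, aligned=True)
        # L2 imitation loss between per-instance features; the teacher branch is
        # detached so only the student receives gradients.
        return F.mse_loss(s_roi, t_roi.detach())

# Usage sketch with assumed shapes: one P3-level feature map at stride 8.
student_feat = torch.randn(1, 256, 100, 152)
teacher_feat = torch.randn(1, 256, 100, 152)
boxes = [torch.tensor([[32.0, 48.0, 256.0, 320.0]])]
loss = FeatureImitation(256, 256)(student_feat, teacher_feat, boxes, stride=8)
```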
Implications and Future Prospects
The insights presented have profound implications for both theoretical advancements and practical applications in AI-driven object detection. The adaptation and successful implementation of instance-level distillation techniques showcase potential avenues for optimizing detectors through feature imitation and relation-based knowledge transfer. The qualitative visualizations further highlight the enhanced efficiency and accuracy achievable when employing distillation methods.
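To make the relation-based component concrete, the sketch below shows one common formulation of relational knowledge transfer: the student is trained to match the teacher's normalized pairwise-distance structure among instance embeddings rather than the embeddings themselves. It illustrates the general idea and is not necessarily the paper's exact relation loss; the embedding dimensions are assumed.

```python
import torch
import torch.nn.functional as F

def pairwise_distances(embeddings):
    # embeddings: [N, D] per-instance feature vectors (e.g. pooled ROI features).
    d = torch.cdist(embeddings, embeddings, p=2)
    # Normalize by the mean distance so scale differences between teacher and
    # student embeddings do not dominate the loss.
    return d / d.mean().clamp(min=1e-6)

def relation_distillation_loss(student_emb, teacher_emb):
    # Match the student's pairwise-distance structure to the teacher's;
    # the teacher side is detached so it acts purely as a fixed target.
    return F.smooth_l1_loss(pairwise_distances(student_emb),
                            pairwise_distances(teacher_emb).detach())

# Usage sketch with assumed embedding dimensions.
student_emb, teacher_emb = torch.randn(6, 256), torch.randn(6, 256)
loss = relation_distillation_loss(student_emb, teacher_emb)
```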
The research could catalyze future developments in neural model compression, contributing to the creation of more efficient algorithms capable of performing on resource-constrained devices or environments, potentially reshaping how AI applications are deployed in real-world scenarios. Given the nature of the advancements demonstrated, future exploration might focus on integrating GID techniques into other domains of computer vision, broadening the applicability and impact of distillation methodologies.
Overall, this paper provides a careful execution and evaluation of GID in object detection, revealing promising avenues for refining model performance while maintaining efficiency, which may meaningfully benefit ongoing developments in AI research.