Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes (2307.12101v2)

Published 22 Jul 2023 in cs.CV

Abstract: Object detection with inaccurate bounding-box supervision has attracted broad interest because high-quality annotation is expensive and low annotation quality is sometimes unavoidable (e.g., for tiny objects). Previous works usually rely on multiple instance learning (MIL), which depends heavily on category information, to select and refine a low-quality box. Because they do not explore spatial information, those methods suffer from object drift, group prediction, and part domination. In this paper, we heuristically propose a Spatial Self-Distillation based Object Detector (SSD-Det) that mines spatial information to refine inaccurate boxes in a self-distillation fashion. SSD-Det uses a Spatial Position Self-Distillation (SPSD) module to exploit spatial information and an interactive structure to combine spatial and category information, thus constructing a high-quality proposal bag. To further improve the selection procedure, a Spatial Identity Self-Distillation (SISD) module is introduced in SSD-Det to obtain a spatial confidence that helps select the best proposals. Experiments on the MS-COCO and VOC datasets with noisy box annotations verify the method's effectiveness, and it achieves state-of-the-art performance. The code is available at https://github.com/ucas-vg/PointTinyBenchmark/tree/SSD-Det.

Authors

Di Wu
Pengfei Chen
Xuehui Yu
Guorong Li
Zhenjun Han
Jianbin Jiao

Summary

The paper presents the Spatial Self-Distillation based Object Detector (SSD-Det), which addresses object detection trained on inaccurate bounding box annotations. The problem matters because precise box annotation is costly and difficult at the scale of datasets such as MS-COCO and VOC.

Key Contributions

Integration of Spatial Information: The paper introduces the Spatial Position Self-Distillation (SPSD) and Spatial Identity Self-Distillation (SISD) modules. These modules exploit spatial information, which previous methods, relying mainly on multiple instance learning (MIL), largely overlooked.
Interactive Structure: SSD-Det combines spatial and category information in a unified framework. The SPSD module and the MIL branch interact, which improves the quality of the constructed proposal bag.
Improved Proposal Selection: The SISD module adds a spatial confidence to proposal selection, targeting object drift, group prediction, and part domination, three problems that are prevalent in MIL-based methods.
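
The idea of a proposal bag and of selection with an added spatial confidence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names build_proposal_bag and select_best are hypothetical, the product used to fuse the two scores is an assumed rule, and the scores are hand-written here, whereas in SSD-Det they would come from learned modules.

```python
import random


def build_proposal_bag(box, num_proposals=8, jitter=0.2, seed=0):
    """Build a bag of candidate boxes by jittering a noisy (x1, y1, x2, y2) box.

    Hypothetical helper: uniform jitter is an assumption for illustration.
    """
    rng = random.Random(seed)
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    bag = [box]  # keep the original annotation as one candidate
    for _ in range(num_proposals - 1):
        nx1 = x1 + rng.uniform(-jitter, jitter) * w
        nx2 = x2 + rng.uniform(-jitter, jitter) * w
        ny1 = y1 + rng.uniform(-jitter, jitter) * h
        ny2 = y2 + rng.uniform(-jitter, jitter) * h
        bag.append((min(nx1, nx2), min(ny1, ny2), max(nx1, nx2), max(ny1, ny2)))
    return bag


def select_best(bag, class_scores, spatial_confs):
    """Pick the proposal with the highest fused score.

    Assumed fusion rule: category score multiplied by spatial confidence.
    """
    fused = [c * s for c, s in zip(class_scores, spatial_confs)]
    best = max(range(len(bag)), key=fused.__getitem__)
    return bag[best], fused[best]


noisy_box = (10.0, 10.0, 50.0, 50.0)
bag = build_proposal_bag(noisy_box, num_proposals=4)

# Hand-written scores (assumptions). Proposal 0 has the highest category
# score but a low spatial confidence, as a box covering only a
# discriminative part of the object might.
class_scores = [0.9, 0.5, 0.8, 0.4]
spatial_confs = [0.2, 0.5, 0.9, 0.6]

best_box, best_score = select_best(bag, class_scores, spatial_confs)
print(best_box, round(best_score, 2))
```

With category scores alone, proposal 0 would be selected. Once the spatial confidence is taken into account, the choice moves to proposal 2, which is the kind of correction the summary attributes to the SISD module.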

Experimental Results

The paper reports state-of-the-art performance on both MS-COCO and VOC. Under heavy annotation noise, SSD-Det improves clearly over established techniques such as OA-MIL. For instance, in the high-noise setting (40% box noise) on MS-COCO, it raises mean average precision (mAP) from 18.6% for the prior best method to 27.6%, a gain of 9.0 points.
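
To make a setting such as "40% box noise" concrete, the sketch below perturbs a clean box and measures how much it still overlaps the original. It assumes that a noise level n shifts each box coordinate by a uniform random offset of at most n times the box width (for x) or height (for y). This protocol is an assumption for illustration and the summary does not spell it out. The function names perturb_box and iou are hypothetical.

```python
import random


def perturb_box(box, noise=0.4, rng=None):
    """Shift each coordinate of (x1, y1, x2, y2) by up to noise * box size."""
    rng = rng or random.Random()
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    nx1 = x1 + rng.uniform(-noise, noise) * w
    nx2 = x2 + rng.uniform(-noise, noise) * w
    ny1 = y1 + rng.uniform(-noise, noise) * h
    ny2 = y2 + rng.uniform(-noise, noise) * h
    # Re-order the sides in case a large shift flips them (noise >= 0.5 only).
    return (min(nx1, nx2), min(ny1, ny2), max(nx1, nx2), max(ny1, ny2))


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


clean = (100.0, 100.0, 200.0, 180.0)
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = [perturb_box(clean, 0.4, rng) for _ in range(1000)]
mean_iou = sum(iou(clean, b) for b in noisy) / len(noisy)
print(f"mean IoU with the clean box at 40% noise: {mean_iou:.2f}")
```

Under this assumed protocol, many perturbed boxes overlap the true object only loosely, which is why a detector trained on them directly tends to lose localization accuracy.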

Theoretical and Practical Implications

The approach rests on distilling spatial information from the noisy annotations themselves and combining it with category information, which makes the detector more robust and more adaptable across noise levels. Practically, it reduces the dependence on high-quality annotations and therefore lowers annotation cost and time. This matters for applications that rely heavily on detection, such as autonomous vehicles, agricultural monitoring, and medical diagnostics.

Speculation on Future Developments

The success of SSD-Det opens up avenues for further exploration into:

Broader Scope of Annotations: Extending the methodology to different types of annotations such as partial or occluded labels.
Cross-domain Applications: Adapting the framework for use in varied environmental conditions or across different datasets.
Real-time Detection: Investigating the potential of SSD-Det in real-time systems by enhancing computational efficiency.

This paper's contribution lies not only in the methodology but also in its emphasis on the utility of inaccurate data, aligning with a broader trend towards achieving more with less in the field of AI and machine learning.
