- The paper presents SAHI, a novel slicing-aided inference and fine-tuning framework that significantly boosts small object detection performance.
- It employs a method of slicing high-resolution images into overlapping patches, enlarging small object details for better feature extraction.
- Experimental results on the VisDrone and xView datasets demonstrate AP improvements of up to 14.5% across detectors including FCOS, VFNet, and TOOD.
Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection
The detection of small objects in high-resolution images poses a significant challenge, particularly in surveillance applications. Small objects, represented by only a handful of pixels, often lack sufficient detail, complicating their detection by conventional techniques. This paper presents a novel framework, Slicing Aided Hyper Inference (SAHI), which effectively addresses small object detection. The proposed method provides a generic slicing-aided inference and fine-tuning pipeline applicable on top of any existing object detector, without requiring any pretraining.
Core Methodology
The SAHI framework functions by slicing high-resolution images into overlapping patches during both the fine-tuning and inference stages. This approach ensures that small objects occupy a relatively larger pixel area, enhancing their detectability. Fine-tuning involves augmenting the training dataset with patches extracted from original images, resized appropriately to improve the feature extraction process for smaller objects.
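The patch-extraction step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation; the function name, the default patch size of 512, and the 20% overlap ratio are assumptions chosen for the example:

```python
def slice_coords(img_w, img_h, patch=512, overlap=0.2):
    """Compute (x1, y1, x2, y2) coordinates of overlapping square patches
    that tile an image, as used in slicing-aided fine-tuning and inference.

    NOTE: illustrative sketch only; parameter names and defaults are
    assumptions, not taken from the SAHI paper or library.
    """
    step = max(1, int(patch * (1 - overlap)))
    xs = list(range(0, max(img_w - patch, 0) + 1, step))
    ys = list(range(0, max(img_h - patch, 0) + 1, step))
    # Add a final patch flush with the right/bottom border so the
    # whole image is covered even when the stride does not divide evenly.
    if xs[-1] + patch < img_w:
        xs.append(img_w - patch)
    if ys[-1] + patch < img_h:
        ys.append(img_h - patch)
    # Clamp to the image in case the image is smaller than one patch.
    return [(x, y, min(x + patch, img_w), min(y + patch, img_h))
            for y in ys for x in xs]
```

Each patch is then resized before detection, so a small object that spanned only a few pixels of the full frame occupies a proportionally larger area of the network input.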
During inference, the original image is divided into patches, and detection is conducted on these resized patches. Optional full inference on the entire image can be merged with patch-based predictions using Non-Maximum Suppression (NMS) to improve outcomes further. This dual-stage process allows the SAHI framework to seamlessly integrate with existing object detectors, boosting their performance on small objects.
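The merging step can be illustrated with a standard greedy NMS over all detections, once each patch-level box has been shifted by its patch offset into full-image coordinates. This NumPy sketch assumes boxes in `[x1, y1, x2, y2]` format and is not the paper's exact post-processing code:

```python
import numpy as np

def nms_merge(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression over boxes (N, 4) with scores (N,).

    Assumes patch-level detections have already been translated into
    full-image coordinates; illustrative sketch, not the SAHI source.
    """
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest-scoring boxes first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too strongly,
        # e.g. duplicates of one object detected in adjacent patches.
        order = rest[iou <= iou_thr]
    return keep
```

Because the same object can appear in two overlapping patches, this duplicate-suppression step is what makes the patch-level and optional full-image predictions combine into a single coherent result.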
Experimental Outcomes
The efficacy of the SAHI framework is demonstrated through experiments on the VisDrone and xView datasets, employing the FCOS, VFNet, and TOOD object detectors. Incorporating the SAHI framework without any fine-tuning yielded AP improvements of 6.8%, 5.1%, and 5.3% for FCOS, VFNet, and TOOD, respectively. Further cumulative AP increases of 12.7%, 13.4%, and 14.5% were observed with the application of slicing-aided fine-tuning.
The experiments underline the framework's ability to improve detection performance for various object sizes without incurring additional memory requirements. Hence, SAHI offers a computationally efficient solution particularly suitable for high-resolution, memory-constrained environments.
Theoretical and Practical Implications
SAHI's contribution lies in its ability to adapt any existing object detection framework for superior performance with minimal adjustments. This addresses a critical gap in the detection of small objects in diverse applications, including aerial surveillance and satellite imaging. The slicing approach effectively balances memory use with computational demands, adjustable via hyper-parameters such as patch size.
Future Directions
The paper suggests potential expansions of the framework to instance segmentation models. Further exploration into different post-processing techniques could enhance performance and make this methodology more robust across various contexts.
In conclusion, the proposed SAHI framework offers a promising advancement in small object detection, providing substantial improvements to detection accuracy through an elegantly simple and adaptable approach. It holds significant potential for enhancing object detection frameworks currently in use, paving the way for future developments in the field.