- The paper presents SAHI, a novel slicing-aided inference and fine-tuning framework that significantly boosts small object detection performance.
- It employs a method of slicing high-resolution images into overlapping patches, enlarging small object details for better feature extraction.
- Experimental results on the VisDrone and xView datasets demonstrate AP improvements of up to 14.5% across detectors including FCOS, VFNet, and TOOD.
Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection
The detection of small objects in high-resolution images poses a significant challenge, particularly in surveillance applications. Small objects, represented by only a handful of pixels, often lack sufficient detail, complicating their detection by conventional techniques. This paper presents a novel framework, Slicing Aided Hyper Inference (SAHI), which effectively addresses small object detection. The proposed method provides a generic slicing-aided inference and fine-tuning pipeline applicable on top of any existing object detector, without requiring any pretraining.
Core Methodology
The SAHI framework functions by slicing high-resolution images into overlapping patches during both the fine-tuning and inference stages. This approach ensures that small objects occupy a relatively larger pixel area, enhancing their detectability. Fine-tuning involves augmenting the training dataset with patches extracted from original images, resized appropriately to improve the feature extraction process for smaller objects.
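The patch-extraction step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation; the function name, the default patch size of 512, and the 20% overlap ratio are assumptions chosen for the example:

```python
def slice_coords(img_w, img_h, patch=512, overlap=0.2):
    """Compute (x1, y1, x2, y2) coordinates of overlapping square patches
    that tile an image, as used in slicing-aided fine-tuning and inference.

    NOTE: illustrative sketch only; parameter names and defaults are
    assumptions, not taken from the SAHI paper or library.
    """
    step = max(1, int(patch * (1 - overlap)))
    xs = list(range(0, max(img_w - patch, 0) + 1, step))
    ys = list(range(0, max(img_h - patch, 0) + 1, step))
    # Add a final patch flush with the right/bottom border so the
    # whole image is covered even when the stride does not divide evenly.
    if xs[-1] + patch < img_w:
        xs.append(img_w - patch)
    if ys[-1] + patch < img_h:
        ys.append(img_h - patch)
    # Clamp to the image in case the image is smaller than one patch.
    return [(x, y, min(x + patch, img_w), min(y + patch, img_h))
            for y in ys for x in xs]
```

Each patch is then resized before detection, so a small object that spanned only a few pixels of the full frame occupies a proportionally larger area of the network input.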
During inference, the original image is divided into patches, and detection is conducted on these resized patches. Optional full inference on the entire image can be merged with patch-based predictions using Non-Maximum Suppression (NMS) to improve outcomes further. This dual-stage process allows the SAHI framework to seamlessly integrate with existing object detectors, boosting their performance on small objects.
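The merging step can be illustrated with a standard greedy NMS over all detections, once each patch-level box has been shifted by its patch offset into full-image coordinates. This NumPy sketch assumes boxes in `[x1, y1, x2, y2]` format and is not the paper's exact post-processing code:

```python
import numpy as np

def nms_merge(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression over boxes (N, 4) with scores (N,).

    Assumes patch-level detections have already been translated into
    full-image coordinates; illustrative sketch, not the SAHI source.
    """
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest-scoring boxes first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too strongly,
        # e.g. duplicates of one object detected in adjacent patches.
        order = rest[iou <= iou_thr]
    return keep
```

Because the same object can appear in two overlapping patches, this duplicate-suppression step is what makes the patch-level and optional full-image predictions combine into a single coherent result.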
Experimental Outcomes
The efficacy of the SAHI framework is demonstrated through experiments on the VisDrone and xView datasets, employing the FCOS, VFNet, and TOOD object detectors. Incorporating the SAHI framework without any fine-tuning yielded AP improvements of 6.8%, 5.1%, and 5.3% for FCOS, VFNet, and TOOD, respectively. Further cumulative AP increases of 12.7%, 13.4%, and 14.5% were observed with the application of slicing-aided fine-tuning.
The experiments underline the framework's ability to improve detection performance for various object sizes without incurring additional memory requirements. Hence, SAHI offers a computationally efficient solution particularly suitable for high-resolution, memory-constrained environments.
Theoretical and Practical Implications
SAHI's contribution lies in its ability to adapt any existing object detection framework for superior performance with minimal adjustments. This addresses a critical gap in the detection of small objects in diverse applications, including aerial surveillance and satellite imaging. The slicing approach effectively balances memory use with computational demands, adjustable via hyper-parameters such as patch size.
Future Directions
The paper suggests potential expansions of the framework to instance segmentation models. Further exploration into different post-processing techniques could enhance performance and make this methodology more robust across various contexts.
In conclusion, the proposed SAHI framework offers a promising advancement in small object detection, providing substantial improvements to detection accuracy through an elegantly simple and adaptable approach. It holds significant potential for enhancing object detection frameworks currently in use, paving the way for future developments in the field.