
IoU-aware Single-stage Object Detector for Accurate Localization (1912.05992v4)

Published 12 Dec 2019 in cs.CV

Abstract: Due to their simplicity and high efficiency, single-stage object detectors have been widely applied in many computer vision applications. However, the low correlation between the classification score and localization accuracy of the predicted detections has severely hurt the localization accuracy of models. In this paper, an IoU-aware single-stage object detector is proposed to solve this problem. Specifically, the IoU-aware single-stage object detector predicts the IoU for each detected box. Then the classification score and predicted IoU are multiplied to compute the final detection confidence, which is more correlated with the localization accuracy. The detection confidence is then used as the input of the subsequent NMS and COCO AP computation, which substantially improves the localization accuracy of models. Extensive experiments on the COCO and PASCAL VOC datasets demonstrate the effectiveness of the IoU-aware single-stage object detector in improving models' localization accuracy. Without bells and whistles, the proposed method can substantially improve AP by 1.7% to 1.9% and AP75 by 2.2% to 2.5% on COCO test-dev. On PASCAL VOC, the proposed method can substantially improve AP by 2.9% to 4.4% and AP80 and AP90 by 4.6% to 10.2%. Code is available at https://github.com/ShengkaiWu/IoU-aware-single-stage-object-detector.

Insights into IoU-aware Single-stage Object Detection for Enhanced Localization

In computer vision, single-stage object detectors have received significant attention due to their simplicity and efficiency. However, a critical issue persists: the low correlation between classification scores and localization accuracy, which hurts localization quality and lowers the average precision (AP) of such models, especially at strict IoU thresholds. The paper by Wu et al. addresses this challenge by introducing an IoU-aware single-stage object detector that improves localization while adding only a lightweight prediction head.

The approach predicts the Intersection over Union (IoU) between each detected bounding box and its ground-truth box. Multiplying the predicted IoU by the classification score yields a detection confidence that is more correlated with localization accuracy than the classification score alone. This confidence is then used both to rank boxes during non-maximum suppression (NMS) and to compute AP, so well-localized boxes are more likely to survive NMS and to be ranked ahead of poorly localized ones.
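
To make the ranking effect concrete, here is a minimal Python sketch, not the authors' code, of greedy NMS ranked either by classification score alone or by classification score times predicted IoU. The boxes, scores, and predicted IoUs are invented numbers standing in for real network outputs, the 0.5 suppression threshold is an assumption, and the helper names `box_iou` and `nms` are hypothetical.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, confidences, iou_threshold=0.5):
    """Greedy NMS: keep the most confident box, drop boxes overlapping it by
    more than iou_threshold, repeat. Returns kept indices, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if box_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


gt = (0, 0, 10, 10)                   # ground-truth box (invented)
boxes = [(0, 0, 10, 12),              # well localized: IoU with gt ~0.83
         (1, 1, 12, 12)]              # poorly localized: IoU with gt ~0.58
cls_scores = [0.80, 0.90]             # the worse box has the higher class score
pred_ious = [0.85, 0.55]              # hypothetical IoU-head outputs

# Baseline: NMS ranked by classification score alone.
kept_baseline = nms(boxes, cls_scores)

# IoU-aware: NMS ranked by classification score x predicted IoU.
confidence = [p * q for p, q in zip(cls_scores, pred_ious)]
kept_iou_aware = nms(boxes, confidence)

print([round(box_iou(b, gt), 3) for b in boxes])  # [0.833, 0.579]
print(kept_baseline)                               # [1]
print(kept_iou_aware)                              # [0]
```

With the baseline ranking, the box with the higher class score but an IoU of about 0.58 to the ground truth suppresses the box with an IoU of about 0.83; ranking by the product reverses this, so the better-localized box is the one kept.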

Key Contributions

The primary contribution of this work is an IoU-aware single-stage object detector that effectively mitigates the mismatch between classification confidence and localization accuracy. The method requires only minimal changes to the standard RetinaNet architecture: a lightweight IoU prediction head is added parallel to the regression head. The design and training of this IoU-aware detector rest on two components:

  • IoU Prediction Head: The IoU prediction head, a single 3x3 convolution followed by a sigmoid activation, predicts the IoU of each regressed box with its ground-truth box. It is trained with a binary cross-entropy (BCE) loss whose target is the IoU between each positive sample's regressed box and its matched ground-truth box; because the head is attached to the regression branch, this loss also shapes that branch's shared features during training.
  • Detection Confidence Calculation: The final detection confidence combines the classification score and the predicted IoU, with an adjustable parameter, α, that balances their contributions.
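
The training target and the α-weighted confidence can be sketched as follows. This is an illustrative sketch, not the released implementation: it assumes the confidence takes the form p^α · IoU^(1−α) (so α = 1 recovers the plain classification score), assumes the BCE loss is averaged over positive samples, and uses plain floats in place of the sigmoid outputs of the IoU head. The names `iou_prediction_loss` and `detection_confidence` are hypothetical.

```python
import math


def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy with a soft target in [0, 1]; pred is clamped for log safety."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def iou_prediction_loss(pred_ious, regressed_boxes, gt_boxes, is_positive):
    """Hypothetical helper: BCE between each predicted IoU and the IoU of the
    regressed box with its matched ground truth, averaged over positives only
    (the averaging is an assumption of this sketch)."""
    terms = [
        bce(p, box_iou(r, g))
        for p, r, g, pos in zip(pred_ious, regressed_boxes, gt_boxes, is_positive)
        if pos
    ]
    return sum(terms) / len(terms) if terms else 0.0


def detection_confidence(cls_score, pred_iou, alpha=0.5):
    """Assumed form p**alpha * IoU**(1 - alpha); alpha=1 ignores the IoU head."""
    return cls_score ** alpha * pred_iou ** (1.0 - alpha)


# One positive anchor whose regressed box matches its ground truth exactly,
# and one negative anchor that the IoU loss ignores.
loss = iou_prediction_loss(
    pred_ious=[0.9, 0.3],
    regressed_boxes=[(0, 0, 10, 10), (0, 0, 5, 5)],
    gt_boxes=[(0, 0, 10, 10), (0, 0, 10, 10)],
    is_positive=[True, False],
)
print(round(loss, 4))                               # 0.1054
print(round(detection_confidence(0.64, 0.81), 4))   # 0.72
print(detection_confidence(0.64, 0.81, alpha=1.0))  # 0.64
```

Because the IoU target is a soft value in [0, 1], BCE is minimized (not driven to zero) when the prediction equals the target. At α = 0.5 the assumed confidence is the square root of p × IoU, which ranks boxes identically to the plain product described in the abstract.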

Experimental Evaluation

The effectiveness of the proposed detector was validated through extensive experiments on the COCO and PASCAL VOC datasets. Notable improvements were observed:

  • On COCO test-dev, AP increased by 1.7% to 1.9%, and AP at an IoU threshold of 0.75 (AP75) increased by 2.2% to 2.5%.
  • Gains on PASCAL VOC were larger still: AP increased by 2.9% to 4.4%, and AP at stricter IoU thresholds (AP80 and AP90) increased by 4.6% to 10.2%.

These results support the central claim: with a detection confidence that is better correlated with localization accuracy, the gains are largest at high IoU thresholds, where only tightly localized boxes count as correct.
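
Why stricter thresholds benefit most can be seen from how a single detection is judged at each threshold. The sketch below ignores ranking and the one-to-one matching that real COCO or VOC evaluation performs; it only lists the COCO IoU thresholds (0.50 to 0.95 in steps of 0.05) at which a detection with a given IoU counts as a true positive. The IoU values 0.78 and 0.91 are made up, and `passed_thresholds` is a hypothetical helper.

```python
# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def passed_thresholds(iou, thresholds=COCO_IOU_THRESHOLDS):
    """Thresholds at which a detection with this IoU to its ground truth is a true positive."""
    return [t for t in thresholds if iou >= t]


loose = passed_thresholds(0.78)   # a roughly localized box
tight = passed_thresholds(0.91)   # the same object, more tightly localized
print(len(loose), len(tight))     # 6 9
print(sorted(set(tight) - set(loose)))  # [0.8, 0.85, 0.9]
```

Tightening the box from IoU 0.78 to 0.91 changes nothing at 0.50 or 0.75 but turns it into a true positive at 0.80, 0.85, and 0.90, the regime measured by AP80 and AP90.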

Implications and Future Directions

The implications of this work are twofold. First, it narrows the localization-accuracy gap between single-stage and multi-stage detectors without the overhead characteristic of multi-stage networks. Second, because the change is limited to an extra prediction head and a modified confidence score, the approach could plausibly be integrated into other architectures that require accurate localization, such as anchor-free detectors. The paper also opens new avenues for exploration: improving IoU prediction through advanced feature alignment methods or attention mechanisms could further refine localization accuracy.

In summary, the IoU-aware detection model introduces a valuable refinement for single-stage detectors, striking an effective balance between computational efficiency and precision. Future work could integrate this methodology into other network architectures or extend it to real-time object detection in dynamic environments.

Authors (3)
  1. Shengkai Wu (7 papers)
  2. Xiaoping Li (23 papers)
  3. Xinggang Wang (163 papers)
Citations (165)