Acquisition of Localization Confidence for Accurate Object Detection (1807.11590v1)

Published 30 Jul 2018 in cs.CV

Abstract: Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. This makes properly localized bounding boxes degenerate during iterative regression or even suppressed during NMS. In the paper we propose IoU-Net learning to predict the IoU between each detected bounding box and the matched ground-truth. The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, where the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.



Summary

The paper introduces IoU-Net, which predicts Intersection over Union (IoU) as localization confidence for more accurate detection.
It employs IoU-guided NMS and optimization-based refinement to prevent mis-suppression and ensure monotonic localization improvements.
Experiments on MS-COCO with leading detectors show up to a 2.1% AP boost, validating the method’s effectiveness.

Acquisition of Localization Confidence for Accurate Object Detection

The paper "Acquisition of Localization Confidence for Accurate Object Detection" introduces localization confidence as a way to improve object detection accuracy. The authors propose IoU-Net, a network that predicts the Intersection over Union (IoU) between each detected bounding box and its matched ground-truth box, supplying the measure of localization accuracy that conventional detectors lack.
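For readers who want the quantity IoU-Net is trained to predict made concrete, here is a minimal sketch of the IoU between two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions chosen for illustration; they are not taken from the paper's code.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

At training time the true IoU is computable because the ground truth is known; at inference time it is not, which is why the network has to predict it.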

Problem Statement and Motivations

The standard practice in modern Convolutional Neural Network (CNN)-based object detectors relies on bounding box regression and Non-Maximum Suppression (NMS) to localize objects. While classification confidence is naturally derived from class probabilities, localization confidence is notably absent. This discrepancy can lead to two significant issues:

Accurately localized bounding boxes may be suppressed during NMS because their classification confidence is lower than that of a worse-localized neighbor, so the detector outputs the less accurate box.
Iterative bounding box regression may sometimes degrade bounding box localization, exhibiting non-monotonic behavior.

Proposed Method: IoU-Net

The authors introduce IoU-Net to address these issues. IoU-Net is designed to predict the IoU between detected bounding boxes and the ground-truth, serving as a localization confidence measure. This contribution can be utilized in two primary ways:

IoU-Guided NMS: Traditional NMS ranks boxes by classification confidence, so a well-localized box can be suppressed by a poorly localized one that merely has a higher class score. IoU-guided NMS instead ranks boxes by predicted IoU, which prevents accurately localized boxes from being incorrectly suppressed.
Optimization-Based Bounding Box Refinement: The predicted IoU serves as the objective of an optimization-based refinement method. The method iteratively adjusts the bounding box coordinates via gradient ascent to maximize the predicted IoU, so localization accuracy improves monotonically across iterations instead of degrading as it can under iterative regression.
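The ranking change in IoU-guided NMS can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the box format, and the default threshold are assumptions. The step that hands the highest classification score in a suppressed cluster to the surviving box follows the paper's description as we understand it; treat it as an assumption as well.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def iou_guided_nms(boxes: Sequence[Box],
                   cls_scores: Sequence[float],
                   loc_scores: Sequence[float],
                   nms_thresh: float = 0.5) -> List[Tuple[Box, float]]:
    """Rank by predicted IoU (loc_scores) instead of class score."""
    remaining = list(range(len(boxes)))
    kept: List[Tuple[Box, float]] = []
    while remaining:
        # Pick the box with the highest localization confidence.
        best = max(remaining, key=lambda i: loc_scores[i])
        remaining.remove(best)
        score = cls_scores[best]
        survivors = []
        for j in remaining:
            if iou(boxes[best], boxes[j]) > nms_thresh:
                # Suppressed box: keep its class score if it is higher.
                score = max(score, cls_scores[j])
            else:
                survivors.append(j)
        remaining = survivors
        kept.append((boxes[best], score))
    return kept


# Box 0 is better localized; box 1 overlaps it but has the higher class score.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
result = iou_guided_nms(boxes,
                        cls_scores=[0.60, 0.95, 0.80],
                        loc_scores=[0.90, 0.60, 0.70])
```

In this example classic NMS would keep box 1 because of its class score of 0.95, whereas the sketch keeps box 0 and assigns it that score.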

Experimental Validation

The authors carried out extensive experiments on the MS-COCO dataset to validate IoU-Net. The experiments apply IoU-guided NMS and optimization-based bounding box refinement to several state-of-the-art object detectors: FPN, Cascade R-CNN, and Mask R-CNN.

Key Results

IoU-guided NMS: Demonstrated significant improvement, especially in AP metrics evaluated at higher IoU thresholds (e.g., AP_90).
Bounding Box Refinement: Facilitated better localization with monotonic improvements, surpassing traditional regression-based refinement and improving AP across multiple detectors.
Joint Training: Integration of IoU-Net into detection models further enhanced overall performance, showing an increase of up to 2.1% in AP for ResNet101-FPN compared to the baseline.
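The monotonic behavior reported for bounding box refinement can be illustrated with a toy gradient-ascent loop. In the paper the objective is the output of the learned IoU predictor; the sketch below replaces it with a hypothetical hand-written score, `toy_predicted_iou`, that peaks at a fixed target box, and it estimates gradients by finite differences rather than backpropagation. The step size, stopping threshold, and all names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

TARGET = (10.0, 10.0, 50.0, 50.0)  # hypothetical "true" box for the toy score


def toy_predicted_iou(box: Sequence[float]) -> float:
    """Stand-in for a learned IoU predictor: smooth, maximal (1.0) at TARGET."""
    sq_err = sum((b - t) ** 2 for b, t in zip(box, TARGET))
    return 1.0 / (1.0 + sq_err / 400.0)


def refine_box(box: Sequence[float],
               score_fn: Callable[[Sequence[float]], float],
               step: float = 100.0,      # tuned for the toy score only
               max_iters: int = 50,
               min_gain: float = 1e-4,
               eps: float = 1e-3) -> Tuple[List[float], List[float]]:
    """Gradient ascent on score_fn; returns the box and the score history."""
    current = list(box)
    scores = [score_fn(current)]
    for _ in range(max_iters):
        # Central finite-difference gradient w.r.t. each coordinate.
        grad = []
        for k in range(4):
            hi, lo = current.copy(), current.copy()
            hi[k] += eps
            lo[k] -= eps
            grad.append((score_fn(hi) - score_fn(lo)) / (2 * eps))
        candidate = [c + step * g for c, g in zip(current, grad)]
        new_score = score_fn(candidate)
        # Early stop: accept a step only if the score clearly improves.
        if new_score - scores[-1] < min_gain:
            break
        current = candidate
        scores.append(new_score)
    return current, scores


refined, history = refine_box((14.0, 6.0, 56.0, 47.0), toy_predicted_iou)
```

Because a step is accepted only when the score rises, `history` is strictly increasing by construction. Whether true localization improves in practice depends on how well the predicted IoU tracks the real one.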

Implications

Predicting IoU gives the detector an explicit localization confidence that is separate from its classification confidence. With the two decoupled, IoU-Net addresses both flaws identified above: NMS no longer discards well-localized boxes, and refinement no longer degrades them. These improvements may matter in fields that require precise object localization, such as autonomous driving, surveillance, and medical imaging.

Future Directions

Future research could explore several potential areas:

Improving IoU Prediction Accuracy: Making the IoU estimator more robust, especially for bounding boxes with lower localization accuracy.
Adaptive Strategies: Developing adaptive mechanisms for bounding box refinement that adjust the number of iterations based on real-time IoU predictions.
Integration with Advanced Architectures: Evaluating the performance of IoU-Net when integrated with novel detection architectures and other vision tasks.

In conclusion, IoU-Net adds localization confidence to object detection pipelines, addresses the suppression and regression problems of traditional methods, and leaves room for further research and applications.
