Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (1911.08287v1)

Published 19 Nov 2019 in cs.CV

Abstract: Bounding box regression is the crucial step in object detection. In existing methods, while $\ell_n$-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but still suffer from the problems of slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster RCNN, we achieve notable performance gains in terms of not only IoU metric but also GIoU metric. Moreover, DIoU can be easily adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance improvement. The source code and trained models are available at https://github.com/Zzh-tju/DIoU.

Citations (3,082)

Summary

  • The paper introduces DIoU and CIoU losses that integrate central point distance and aspect ratio, significantly enhancing bounding box regression.
  • DIoU loss penalizes the normalized distance between predicted and target centers, speeding up convergence compared to conventional IoU losses.
  • Empirical tests on PASCAL VOC and MS COCO show that DIoU and CIoU losses yield higher accuracy and faster convergence in state-of-the-art detectors.

An Analytical Overview of "Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression"

This paper addresses significant challenges in bounding box regression, a pivotal step in object detection frameworks. The authors introduce two novel loss functions—Distance-IoU (DIoU) loss and Complete-IoU (CIoU) loss—designed to improve both the speed of convergence and the accuracy of bounding box regression.

Core Findings and Methods

Conventional bounding box regression primarily relies on the $\ell_n$-norm loss, which is not aligned with the IoU evaluation metric. More recent approaches employ the IoU loss and its generalized variant, the GIoU loss. However, both exhibit slow convergence and inaccurate regression, especially for non-overlapping boxes: the IoU loss yields no useful gradient when boxes do not overlap, and GIoU tends to first enlarge the predicted box to create overlap before converging.
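
To make this limitation concrete, the sketch below (plain Python with illustrative names, not the authors' released code) computes the IoU and GIoU losses for a pair of axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou_and_giou_loss(a, b):
    """Return (IoU loss, GIoU loss) for boxes a, b = (x1, y1, x2, y2)."""
    # Intersection area (zero when the boxes are disjoint)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / (union + 1e-7)

    # GIoU subtracts the fraction of the smallest enclosing box
    # not covered by the union, giving a gradient even with no overlap.
    enclose = ((max(a[2], b[2]) - min(a[0], b[0])) *
               (max(a[3], b[3]) - min(a[1], b[1])))
    giou = iou - (enclose - union) / (enclose + 1e-7)

    return 1.0 - iou, 1.0 - giou
```

For two disjoint boxes the first value is exactly 1 regardless of how far apart they are, while the second still varies with their separation, which is why GIoU converges at all for such cases, albeit slowly.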

In contrast, DIoU loss introduces a penalty based on the normalized distance between the central points of the predicted and target bounding boxes. This penalty significantly accelerates convergence by minimizing the center distance directly, rather than first increasing the overlap as IoU and GIoU losses do.
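
For reference, the paper defines the DIoU loss as $\mathcal{L}_{DIoU} = 1 - IoU + \rho^2(\mathbf{b}, \mathbf{b}^{gt}) / c^2$, where $\rho$ is the Euclidean distance between the centers of the predicted box $\mathbf{b}$ and the target box $\mathbf{b}^{gt}$, and $c$ is the diagonal length of the smallest box enclosing both. A minimal, self-contained sketch (illustrative names, assuming (x1, y1, x2, y2) box format, not the authors' released code):

```python
def diou_loss(pred, target):
    """DIoU loss: 1 - IoU + rho^2 / c^2 for boxes (x1, y1, x2, y2)."""
    # Overlap term (IoU)
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # Squared distance between box centers (rho^2)
    rho2 = (((pred[0] + pred[2]) - (target[0] + target[2])) ** 2 +
            ((pred[1] + pred[3]) - (target[1] + target[3])) ** 2) / 4.0

    # Squared diagonal of the smallest enclosing box (c^2)
    c2 = ((max(pred[2], target[2]) - min(pred[0], target[0])) ** 2 +
          (max(pred[3], target[3]) - min(pred[1], target[1])) ** 2 + 1e-7)

    return 1.0 - iou + rho2 / c2
```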

Furthermore, the CIoU loss builds upon DIoU by integrating three vital geometric factors: overlap area, central point distance, and aspect ratio. This holistic approach improves both the accuracy and the convergence speed of bounding box regression compared to the IoU and GIoU losses.
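
The CIoU loss extends DIoU with an aspect-ratio consistency term: $\mathcal{L}_{CIoU} = 1 - IoU + \rho^2/c^2 + \alpha v$, where $v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$ and $\alpha = \frac{v}{(1 - IoU) + v}$ is the trade-off weight. A self-contained sketch under the same assumptions as above:

```python
import math

def ciou_loss(pred, target):
    """CIoU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    # Overlap term (IoU), as before
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # Central-point distance term (rho^2 / c^2), as in DIoU
    rho2 = (((pred[0] + pred[2]) - (target[0] + target[2])) ** 2 +
            ((pred[1] + pred[3]) - (target[1] + target[3])) ** 2) / 4.0
    c2 = ((max(pred[2], target[2]) - min(pred[0], target[0])) ** 2 +
          (max(pred[3], target[3]) - min(pred[1], target[1])) ** 2 + 1e-7)

    # Aspect-ratio consistency term: v compares the two boxes'
    # width/height ratios; alpha gives the overlap term priority
    # when the boxes do not yet overlap well.
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_t, h_t = target[2] - target[0], target[3] - target[1]
    v = (4.0 / math.pi ** 2) * (math.atan(w_t / h_t) - math.atan(w_p / h_p)) ** 2
    alpha = v / ((1.0 - iou) + v + 1e-7)

    return 1.0 - iou + rho2 / c2 + alpha * v
```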

Numerical Results

The proposed methods were evaluated on two popular object detection benchmarks: PASCAL VOC and MS COCO. The authors incorporated DIoU and CIoU into state-of-the-art detectors, including YOLO v3, SSD, and Faster R-CNN.

For YOLO v3 on PASCAL VOC, DIoU and CIoU outperformed the baseline IoU and GIoU losses by significant margins. Incorporating CIoU together with DIoU-NMS (using DIoU as the non-maximum suppression criterion) provided additional gains, underscoring the methods' robustness and scalability.
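
In DIoU-NMS, the standard greedy procedure is kept, but a detection is suppressed only when its IoU with the top-scoring box, minus the normalized center-distance penalty, exceeds the threshold; boxes whose centers are far from the kept box are therefore more likely to survive. A hedged sketch (illustrative function names, not the released implementation):

```python
def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS using IoU - rho^2/c^2 as the suppression criterion.

    boxes: list of (x1, y1, x2, y2); scores: list of floats.
    Returns indices of kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)          # highest-scoring remaining box
        keep.append(m)
        # Keep only boxes whose DIoU with the kept box is below threshold
        order = [i for i in order if diou(boxes[m], boxes[i]) < threshold]
    return keep

def diou(a, b):
    """IoU minus the normalized central-point distance penalty."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter + 1e-7)
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) ** 2 +
            ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0
    c2 = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2 +
          (max(a[3], b[3]) - min(a[1], b[1])) ** 2 + 1e-7)
    return iou - rho2 / c2
```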

SSD experiments echoed these findings, though the performance gains for DIoU and CIoU over IoU and GIoU were more modest. Nonetheless, the results affirm the efficacy of both DIoU and CIoU in accelerating convergence and improving bounding box regression accuracy.

Faster R-CNN evaluations on MS COCO further validated these improvements. Apart from achieving higher AP and AP75 scores, DIoU and CIoU provided better results for medium and large objects, demonstrating their versatility across diverse object scales.

Practical and Theoretical Implications

The methodologies proposed offer several practical advantages. DIoU loss can be easily integrated into existing object detection pipelines without extensive recalibration. Similarly, CIoU loss provides a compound approach that synthesizes multiple geometric factors, enhancing prediction accuracy for complex scenes.

From a theoretical perspective, this paper introduces a refined objective for bounding box regression. By incorporating distance minimization and aspect ratio consistency, these losses challenge the status quo established by IoU and GIoU, offering a foundation for further advancements in object detection algorithms.

Future Developments

Future research might explore extending DIoU and CIoU losses to other detection-related tasks and experimenting with their integration into emerging deep learning frameworks beyond computer vision, such as models that operate on bounding box-like data structures.

In summary, this paper presents a well-argued and empirically validated advancement in bounding box regression techniques. The proposed DIoU and CIoU losses facilitate faster convergence and more accurate object detection, contributing meaningfully to the field of computer vision.
