Gaussian Combined Distance: A Generic Metric for Object Detection (2510.27649v1)

Published 31 Oct 2025 in cs.CV

Abstract: In object detection, a well-defined similarity metric can significantly enhance model performance. Currently, the IoU-based similarity metric is the most commonly preferred choice for detectors. However, detectors using IoU as a similarity metric often perform poorly when detecting small objects because of their sensitivity to minor positional deviations. To address this issue, recent studies have proposed the Wasserstein Distance as an alternative to IoU for measuring the similarity of Gaussian-distributed bounding boxes. However, we have observed that the Wasserstein Distance lacks scale invariance, which negatively impacts the model's generalization capability. Additionally, when used as a loss function, its independent optimization of the center attributes leads to slow model convergence and unsatisfactory detection precision. To address these challenges, we introduce the Gaussian Combined Distance (GCD). Through analytical examination of GCD and its gradient, we demonstrate that GCD not only possesses scale invariance but also facilitates joint optimization, which enhances model localization performance. Extensive experiments on the AI-TOD-v2 dataset for tiny object detection show that GCD, as a bounding box regression loss function and label assignment metric, achieves state-of-the-art performance across various detectors. We further validated the generalizability of GCD on the MS-COCO-2017 and Visdrone-2019 datasets, where it outperforms the Wasserstein Distance across diverse scales of datasets. Code is available at https://github.com/MArKkwanGuan/mmdet-GCD.

Summary

The paper introduces GCD, a metric designed to improve small object detection by combining Gaussian modeling with scale invariance and joint center-dimension optimization.
It details a nonlinear normalization of GCD that maps the unbounded distance to a bounded similarity score, achieving state-of-the-art performance on AI-TOD-v2 and outperforming the Wasserstein Distance on VisDrone-2019 and MS-COCO-2017.
Ablation studies reveal that integrating GCD in detection frameworks significantly boosts AP metrics for extremely tiny objects, outperforming traditional IoU and WD metrics.

Gaussian Combined Distance: A Generic Metric for Object Detection

This essay examines the use of Gaussian Combined Distance (GCD) as a metric for object detection, specifically focusing on its advantages in detecting small objects. It will cover the motivation behind proposing GCD, its mathematical formulation and properties, its implementation in object detection frameworks, and its effectiveness across various datasets.

Introduction

Object detection, particularly of small objects, presents challenges that traditional techniques do not adequately address. The Intersection over Union (IoU) metric, though standard, falls short for small objects: it is highly sensitive to minor positional deviations, and its gradient vanishes when boxes do not overlap. This has prompted alternatives such as the Wasserstein Distance (WD). However, WD lacks scale invariance and optimizes the center attributes independently of the box dimensions. The Gaussian Combined Distance (GCD) addresses both limitations by providing scale invariance and joint optimization.
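The sensitivity of IoU to small shifts can be illustrated with a short calculation. The sketch below is only an illustration, not code from the paper: the helper names `iou` and `shifted_iou` are hypothetical, and the box sizes (6 and 36 pixels) are arbitrary choices. It compares the IoU of a tiny box and a larger box after the same one-pixel diagonal shift.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def shifted_iou(size, shift):
    """IoU between a size x size box and a copy shifted diagonally by `shift`."""
    gt = (0.0, 0.0, float(size), float(size))
    pred = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(gt, pred)


print(f"{shifted_iou(6, 1):.2f}")   # 0.53 for the tiny box
print(f"{shifted_iou(36, 1):.2f}")  # 0.90 for the larger box
print(shifted_iou(6, 7))            # 0.0 once the boxes no longer overlap
```

The same one-pixel error costs the 6-pixel box almost half of its IoU, while the 36-pixel box barely changes. Once the boxes stop overlapping, the IoU is flat at zero and carries no gradient signal.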

Gaussian Distribution Modeling

Representing a bounding box as a two-dimensional Gaussian distribution assigns the highest weight to the pixels at the box center and progressively lower weight toward the edges. For small objects, where background pixels make up a large share of the box, this reduces the impact of background noise. The GCD is derived from this Gaussian modeling to improve bounding box regression.
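A minimal sketch of this modeling follows. It assumes the convention used in earlier Gaussian-based box metrics, where the mean is the box center and the standard deviations are half the width and half the height. That convention is an assumption here, since the text does not spell it out, and the function names `box_to_gaussian` and `pixel_weight` are hypothetical.

```python
import math


def box_to_gaussian(cx, cy, w, h):
    """Return (mean, variances) of an axis-aligned 2D Gaussian for a box.

    Assumed convention: standard deviations are w/2 and h/2.
    """
    mean = (cx, cy)
    variances = (w * w / 4.0, h * h / 4.0)  # diagonal of the covariance matrix
    return mean, variances


def pixel_weight(px, py, mean, variances):
    """Unnormalized Gaussian weight of a pixel; equals 1.0 at the box center."""
    m2 = (px - mean[0]) ** 2 / variances[0] + (py - mean[1]) ** 2 / variances[1]
    return math.exp(-0.5 * m2)


mean, variances = box_to_gaussian(10.0, 20.0, 8.0, 4.0)
print(pixel_weight(10.0, 20.0, mean, variances))  # 1.0 at the center
print(pixel_weight(14.0, 20.0, mean, variances))  # about 0.61 at an edge midpoint
print(pixel_weight(14.0, 22.0, mean, variances))  # about 0.37 at a corner
```

Under this convention the weight falls off smoothly from the center to the corners, which is the behavior the paragraph above describes.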

Methodology

Gaussian Combined Distance

The GCD metric is designed to satisfy key criteria: affine invariance, symmetry, differentiability, and smooth boundary handling. The formulation of GCD is expressed as:

$\mathbf{D}_{gc}^2\left(\mathcal{N}_p, \mathcal{N}_t\right) = \frac{1}{2}\left( \frac{(x_{p}-x_{t})^2}{w_{p}^2} + \frac{(y_{p}-y_{t})^2}{h_{p}^2} \right) + \frac{1}{2}\left( \frac{(w_{p}-w_{t})^2}{4w_{p}^2} + \frac{(h_{p}-h_{t})^2}{4h_{p}^2} \right)$

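Here $(x_p, y_p, w_p, h_p)$ and $(x_t, y_t, w_t, h_t)$ are the center coordinates, width, and height of the predicted and target boxes. The paper reports that GCD is symmetric and scale-invariant, the latter being a property that WD lacks. Because the center terms are divided by the box dimensions, the gradient with respect to the center depends on the box size, so center and size are optimized jointly rather than independently. This improves localization, particularly for small objects.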
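The expression above can be checked numerically. The sketch below transcribes the formula exactly as displayed, normalizing by the predicted box's width and height, with boxes given as `(cx, cy, w, h)`. The names `gcd_squared` and `scale` are hypothetical, and this is not the authors' implementation.

```python
def gcd_squared(pred, target):
    """Squared GCD as written in the displayed formula; boxes are (cx, cy, w, h)."""
    xp, yp, wp, hp = pred
    xt, yt, wt, ht = target
    # Center offsets, normalized by the predicted width and height.
    center = (xp - xt) ** 2 / wp ** 2 + (yp - yt) ** 2 / hp ** 2
    # Size differences, normalized the same way (factor 4 as in the formula).
    size = (wp - wt) ** 2 / (4 * wp ** 2) + (hp - ht) ** 2 / (4 * hp ** 2)
    return 0.5 * center + 0.5 * size


def scale(box, k):
    """Scale every coordinate of a box by k, as if the image were resized."""
    return tuple(k * v for v in box)


pred, target = (10, 10, 4, 4), (11, 10, 6, 4)
print(gcd_squared(pred, target))                      # 0.0625
print(gcd_squared(scale(pred, 8), scale(target, 8)))  # 0.0625, unchanged
```

Resizing both boxes by the same factor leaves the value unchanged, which is the scale invariance discussed above.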

Metric Normalization

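The raw GCD is unbounded above, so it does not fall in the [0,1] range expected of a similarity metric and is overly sensitive to errors. The paper addresses this with a nonlinear transformation that maps the distance into (0,1]: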

$\mathbf{M}_{gcd} = \exp\left(-\sqrt{\mathbf{D}_{gc}^2\left(\mathcal{N}_p, \mathcal{N}_t\right)}\right)$
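As a small sketch, the transformation above can be written as a function of the squared distance. The function names are hypothetical, and the loss form `1 - M` is an assumption modeled on common practice for similarity-based regression losses; the text does not state the exact loss used.

```python
import math


def gcd_similarity(d_squared):
    """Map a squared GCD value in [0, inf) to a similarity in (0, 1]."""
    return math.exp(-math.sqrt(d_squared))


def gcd_loss(d_squared):
    """Assumed regression loss: one minus the normalized similarity."""
    return 1.0 - gcd_similarity(d_squared)


print(gcd_similarity(0.0))     # 1.0 for identical boxes
print(gcd_similarity(0.0625))  # about 0.78
print(gcd_similarity(100.0))   # close to 0 for very dissimilar boxes
```

Identical boxes score exactly 1, and the score decays toward 0 as the distance grows, so the result can be thresholded or used in a loss in the same way as IoU.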

Results

Experiments on AI-TOD-v2, VisDrone-2019, and MS-COCO-2017 validate GCD's performance. It achieves state-of-the-art results on AI-TOD-v2 and outperforms the Wasserstein Distance on VisDrone-2019 and MS-COCO-2017, which supports its generalizability across object scales.

Figure 1: Visualization results on AI-TOD-v2 with RetinaNet. From left to right, they are GCD, NWD, WD, and GIoU. Green boxes represent GT, and red boxes represent predicted boxes. Clearly, GCD shows the best detection performance.

Ablation Studies

Ablation studies using RetinaNet and Faster R-CNN on AI-TOD-v2 highlight the advantages of using GCD for both label assignment and the regression loss, where it outperforms WD and the Normalized Wasserstein Distance (NWD). This is reflected in notable improvements in AP metrics, particularly AP$_{vt}$ and AP$_{t}$, which measure detection of very tiny and tiny objects, respectively.
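To make the label assignment role concrete, the sketch below replaces IoU with the normalized GCD similarity in a simple threshold-based assigner. Everything here is illustrative: the function names, the thresholds `pos_thr` and `neg_thr`, and the choice to treat the anchor as the "predicted" box in the formula are assumptions, not the configuration used in the paper.

```python
import math


def gcd_similarity(pred, target):
    """exp(-sqrt(D^2)) with D^2 as in the displayed formula; boxes are (cx, cy, w, h)."""
    xp, yp, wp, hp = pred
    xt, yt, wt, ht = target
    center = (xp - xt) ** 2 / wp ** 2 + (yp - yt) ** 2 / hp ** 2
    size = (wp - wt) ** 2 / (4 * wp ** 2) + (hp - ht) ** 2 / (4 * hp ** 2)
    return math.exp(-math.sqrt(0.5 * center + 0.5 * size))


def assign_labels(anchors, gt_boxes, pos_thr=0.6, neg_thr=0.4):
    """Per anchor: index of the matched GT box, -1 for negative, -2 for ignored."""
    if not gt_boxes:
        return [-1] * len(anchors)
    assignments = []
    for anchor in anchors:
        sims = [gcd_similarity(anchor, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=sims.__getitem__)
        if sims[best] >= pos_thr:
            assignments.append(best)  # positive sample for this GT box
        elif sims[best] < neg_thr:
            assignments.append(-1)    # background
        else:
            assignments.append(-2)    # ambiguous, ignored during training
    return assignments


gts = [(10, 10, 4, 4)]
anchors = [(10, 10, 4, 4), (11, 10, 4, 4), (14, 10, 4, 4), (30, 30, 4, 4)]
print(assign_labels(anchors, gts))  # [0, 0, -2, -1]
```

Note that the third anchor does not overlap the ground-truth box at all, so its IoU would be zero, yet it still receives a graded similarity score.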

Discussions

GCD's joint optimization of center and size overcomes the limitations inherent in WD, specifically its independent center optimization and lack of scale invariance. These improvements make small objects easier to detect, which matters in applications such as surveillance and aerial imaging that demand precise localization.
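The contrast with WD can be sketched numerically. The code below uses the standard squared 2-Wasserstein distance between axis-aligned Gaussian boxes, assuming standard deviations of half the width and height, and the center-gradient of the GCD expression as displayed in the Methodology section. Function names are hypothetical, and the gradients are derived by hand from those two expressions.

```python
def wd_squared(pred, target):
    """Squared Wasserstein distance between Gaussian boxes (cx, cy, w, h)."""
    xp, yp, wp, hp = pred
    xt, yt, wt, ht = target
    return (xp - xt) ** 2 + (yp - yt) ** 2 + (wp - wt) ** 2 / 4 + (hp - ht) ** 2 / 4


def wd_grad_xp(pred, target):
    """d(WD^2)/d(x_p): depends only on the center offset."""
    return 2 * (pred[0] - target[0])


def gcd_grad_xp(pred, target):
    """d(GCD^2)/d(x_p) for the displayed formula: scaled by 1 / w_p^2."""
    return (pred[0] - target[0]) / pred[2] ** 2


target = (10, 10, 8, 8)
narrow, wide = (12, 10, 4, 8), (12, 10, 16, 8)

# WD: the center gradient ignores the predicted width.
print(wd_grad_xp(narrow, target), wd_grad_xp(wide, target))    # 4 4
# GCD: the same offset yields a smaller gradient for the wider box.
print(gcd_grad_xp(narrow, target), gcd_grad_xp(wide, target))  # 0.125 0.0078125

# WD is not scale invariant: doubling all coordinates quadruples it.
doubled = lambda box: tuple(2 * v for v in box)
print(wd_squared(narrow, target), wd_squared(doubled(narrow), doubled(target)))  # 8.0 32.0
```

For WD, the update to the center is the same regardless of box size, whereas in GCD the center update is tied to the current width, which is one way to read the joint optimization property.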

Expectation and Future Work

Given GCD's robustness in horizontal (axis-aligned) detection tasks, applying it to rotated object detection may reveal additional advantages. Future work might also explore how configuration choices affect GCD's behavior in diverse detection settings.

Conclusion

The Gaussian Combined Distance emerges as a strong metric for object detection, particularly for small objects. Its scale invariance and joint optimization properties yield substantial improvements over IoU-based and WD-based approaches. Its consistent results across AI-TOD-v2, VisDrone-2019, and MS-COCO-2017 suggest it is useful in a range of real-world scenarios. The publicly available implementation further facilitates adoption and experimentation by researchers and practitioners.