OTA: Optimal Transport Assignment for Object Detection (2103.14259v1)

Published 26 Mar 2021 in cs.CV

Abstract: Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object. In this paper, we innovatively revisit the label assignment from a global perspective and propose to formulate the assigning procedure as an Optimal Transport (OT) problem -- a well-studied topic in Optimization Theory. Concretely, we define the unit transportation cost between each demander (anchor) and supplier (gt) pair as the weighted summation of their classification and regression losses. After formulation, finding the best assignment solution is converted to solve the optimal transport plan at minimal transportation costs, which can be solved via Sinkhorn-Knopp Iteration. On COCO, a single FCOS-ResNet-50 detector equipped with Optimal Transport Assignment (OTA) can reach 40.7% mAP under 1X scheduler, outperforming all other existing assigning methods. Extensive experiments conducted on COCO and CrowdHuman further validate the effectiveness of our proposed OTA, especially its superiority in crowd scenarios. The code is available at https://github.com/Megvii-BaseDetection/OTA.



Summary

The paper introduces OTA, a novel label assignment strategy that formulates anchor matching as an optimal transport problem to improve object detection.
It solves the assignment with the Sinkhorn-Knopp iteration and adds two refinements, Center Prior and Dynamic k Estimation; together these raise COCO AP from a 38.3% baseline to 40.7%.
Experimental results on MS COCO and CrowdHuman datasets demonstrate OTA's robustness and superiority over methods like FCOS, ATSS, and PAA.

Optimal Transport Assignment for Object Detection

The paper "OTA: Optimal Transport Assignment for Object Detection" presents a new label assignment strategy for object detectors. The approach, named Optimal Transport Assignment (OTA), reformulates label assignment as an Optimal Transport (OT) problem. The authors argue that optimizing the assignment globally, rather than separately for each object, handles ambiguous anchors better and improves overall detector performance.

Introduction and Motivation

Convolutional Neural Network (CNN)-based object detectors typically predict a classification score and a box regression for each predefined anchor. Training them requires label assignment: deciding which anchors are positive for which ground-truth (gt) object and which are negative (background). Traditional strategies rely on hand-crafted rules, such as Intersection-over-Union (IoU) thresholds, to select positive and negative anchors. Such fixed rules often fall short because they ignore variations in object size, shape, and occlusion.

Dynamic assignment strategies have emerged to address these issues, adapting the positive/negative criteria to per-object statistics or to the confidence scores predicted for each anchor. Despite achieving state-of-the-art performance, these strategies still struggle with ambiguous anchors, i.e., anchors that could plausibly belong to more than one gt. The paper argues that choosing each gt's positives independently can lead to suboptimal results and proposes a global optimization approach instead.
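To make "ambiguous anchors" concrete, here is a minimal sketch of a conventional IoU-threshold assigner of the kind described above, which also flags anchors claimed by more than one gt. The function names, the (x1, y1, x2, y2) box format, the 0.5 threshold, and the highest-IoU tie-break are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def threshold_assign(anchors: List[Box], gts: List[Box], pos_thr: float = 0.5):
    """Rule-based assignment: each gt independently claims every anchor whose
    IoU with it is at least pos_thr. Returns (labels, ambiguous), where
    labels[j] is the gt index chosen for anchor j (-1 = background) and
    ambiguous lists the anchors claimed by more than one gt."""
    labels, ambiguous = [], []
    for j, a in enumerate(anchors):
        claims = [i for i, g in enumerate(gts) if iou(a, g) >= pos_thr]
        if len(claims) > 1:
            ambiguous.append(j)
        # Hypothetical tie-break: keep the gt with the highest IoU
        # (max() returns the first gt when IoUs are equal).
        labels.append(max(claims, key=lambda i: iou(a, gts[i])) if claims else -1)
    return labels, ambiguous


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (2, 0, 12, 10)]       # two heavily overlapping objects
    anchors = [(1, 0, 11, 10), (20, 20, 30, 30)]  # one in between, one far away
    print(threshold_assign(anchors, gts))  # ([0, -1], [0])
```

The in-between anchor has exactly the same IoU (90/110, about 0.82) with both gts, so the rule-based assigner has to break the tie arbitrarily; here it silently picks the first gt. That per-anchor, per-gt decision is what OTA replaces with a comparison of global costs.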

OTA: Optimal Transport Assignment

OTA formulates label assignment as an OT problem, a special case of linear programming. Each gt acts as a supplier holding a certain number ($k$) of positive labels, and each anchor acts as a demander that needs exactly one label. The unit transportation cost between a gt-anchor pair is the weighted sum of their classification and regression losses. A background supplier is added to provide the remaining negative labels; its transportation cost to an anchor is that anchor's classification loss with respect to the background class.
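In symbols, the formulation above can be sketched as follows, for $m$ anchors, $n$ gts, a supply of $k$ positive labels per gt, predictions $P_j(\theta)$ for anchor $j$, gt annotations $G_i$, and a balance coefficient $\alpha$ between the two losses. This is a compact restatement of the setup described in the text, not a verbatim transcription of the paper's equations:

$$
c_{ij} = L_{cls}\big(P^{cls}_j(\theta), G^{cls}_i\big) + \alpha\, L_{reg}\big(P^{box}_j(\theta), G^{box}_i\big) \quad (i \le n), \qquad
c_{(n+1)j} = L_{cls}\big(P^{cls}_j(\theta), \varnothing\big)
$$

$$
\min_{\pi \ge 0} \sum_{i=1}^{n+1} \sum_{j=1}^{m} c_{ij}\,\pi_{ij}
\quad \text{s.t.} \quad
\sum_{i=1}^{n+1} \pi_{ij} = 1 \;\; (j = 1,\dots,m), \qquad
\sum_{j=1}^{m} \pi_{ij} = s_i \;\; (i = 1,\dots,n+1),
$$

with $s_i = k$ for each gt $i \le n$ and $s_{n+1} = m - nk$ for the background, so that total supply $nk + (m - nk) = m$ equals total demand. With Dynamic $k$ Estimation, each gt receives its own $k_i$ and the background supplies $m - \sum_i k_i$.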

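The goal is to find the transportation plan with minimal total cost. The authors solve it with the Sinkhorn-Knopp iteration, which approximates the optimal plan of an entropy-regularized version of the problem by alternately rescaling the rows and columns of a kernel matrix; each step reduces to matrix-vector products that are cheap on a GPU even for large numbers of anchors. Each anchor is then assigned to the supplier that transports the most labels to it. Because every anchor's label is decided jointly with all others, the result is a globally optimized assignment that resolves ambiguous anchors dynamically.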

Advanced Designs

Several advanced designs are incorporated to enhance OTA's performance:

Center Prior: Anchors outside a small region around each object's center receive an additional cost, which focuses positive labels near object centers and stabilizes training, particularly in its early stages.
Dynamic $k$ Estimation: Instead of a fixed $k$, the number of positive labels each gt supplies is estimated from the IoUs between the predicted boxes and that gt (by summing its top IoU values), giving a data-driven estimate of how many positive anchors each object needs.
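Both refinements can be sketched in a few lines. This is a simplified, hypothetical illustration: the number of summed IoUs `q`, the integer truncation with a floor of 1, the single feature level in `center_prior`, the radius `r` (keeping the r*r closest anchors), and the size of the additive `penalty` are all assumptions chosen for clarity, not values confirmed by the text above.

```python
import math
from typing import List, Tuple


def dynamic_k(ious_per_gt: List[List[float]], q: int = 20) -> List[int]:
    """Estimate each gt's positive-label supply k_i.

    ious_per_gt[i][j] is the IoU between anchor j's predicted box and gt i.
    k_i = sum of the top-q IoUs, truncated to an int and clamped to >= 1
    (both choices are assumptions of this sketch).
    """
    ks = []
    for ious in ious_per_gt:
        top_q = sorted(ious, reverse=True)[:q]
        ks.append(max(1, int(sum(top_q))))
    return ks


def center_prior(anchor_centers: List[Tuple[float, float]],
                 gt_center: Tuple[float, float], r: int = 5) -> List[int]:
    """Indices of the r*r anchors whose centers lie closest to the gt center
    (a single feature level is assumed here)."""
    order = sorted(range(len(anchor_centers)),
                   key=lambda j: math.dist(anchor_centers[j], gt_center))
    return sorted(order[:r * r])


def apply_center_prior(cost_row: List[float], keep: List[int],
                       penalty: float = 1e5) -> List[float]:
    """Add a large constant cost to anchors outside the prior so the OT
    solver effectively never sends this gt's positive labels there."""
    keep_set = set(keep)
    return [c if j in keep_set else c + penalty for j, c in enumerate(cost_row)]


if __name__ == "__main__":
    print(dynamic_k([[0.9, 0.8, 0.7, 0.1], [0.3, 0.2, 0.1, 0.0]], q=3))  # [2, 1]
    centers = [(float(x), 0.0) for x in range(10)]
    print(center_prior(centers, (4.2, 0.0), r=2))  # [3, 4, 5, 6]
```

In a full pipeline, the estimated $k_i$ values would set the gt supplies (with the background supplying the remainder), and the penalized cost rows would be handed to the OT solver.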

Experimental Validation

The authors conducted comprehensive experiments on the MS COCO benchmark, where OTA shows significant improvements over existing assigners. OTA also achieved state-of-the-art performance among one-stage detectors on the crowded CrowdHuman dataset, which supports its robustness across datasets.

Effectiveness of Components: Ablation studies confirm the individual contributions of Center Prior, Dynamic $k$ Estimation, and the core OT formulation. With all components combined, AP rises from 38.3% for the baseline to 40.7%, and the full OTA outperforms the assignment strategies of FCOS and ATSS.

Handling Ambiguous Anchors: OTA handles ambiguous anchors notably better than existing methods. Because conflicts are resolved by minimizing a global cost rather than by per-gt rules, an anchor claimed by several gts goes to whichever supplier (a gt or the background) yields the lowest total cost, so ambiguous regions are much less harmful to training.
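The difference between per-gt and global resolution can be seen on a toy problem. The sketch below contrasts an independent per-gt top-k selection with an exhaustive search for the cheapest assignment that respects every supply. The brute-force search is a stand-in for an OT solver, usable only for tiny inputs, and the cost values and function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def independent_topk(cost: List[List[float]], ks: List[int]) -> List[List[int]]:
    """Each gt row (every row except the last, background row) picks its
    k cheapest anchors without looking at the other gts."""
    return [sorted(range(len(row)), key=lambda j: row[j])[:k]
            for row, k in zip(cost[:-1], ks)]


def global_min_cost(cost: List[List[float]],
                    supply: List[int]) -> Tuple[List[int], float]:
    """Exhaustively find the cheapest labeling in which supplier i receives
    exactly supply[i] anchors (exponential time: tiny problems only)."""
    n, m = len(cost), len(cost[0])
    best, best_cost = None, float("inf")
    for labels in product(range(n), repeat=m):
        if all(labels.count(i) == supply[i] for i in range(n)):
            total = sum(cost[labels[j]][j] for j in range(m))
            if total < best_cost:
                best, best_cost = list(labels), total
    return best, best_cost


if __name__ == "__main__":
    # Rows: gt 0, gt 1, background. Anchor 0 is cheap for both gts.
    cost = [[0.2, 0.3, 2.0, 2.0],
            [0.1, 0.9, 2.0, 2.0],
            [1.0, 1.0, 0.3, 0.3]]
    print(independent_topk(cost, [1, 1]))  # [[0], [0]]  (both claim anchor 0)
    best, total = global_min_cost(cost, [1, 1, 2])
    print(best, round(total, 6))  # [1, 0, 2, 2] 1.0
```

Independently, both gts claim anchor 0. The global solution gives anchor 0 to gt 1 and anchor 1 to gt 0, because gt 0 loses little by moving to anchor 1 (0.3 vs 0.2) while gt 1 would lose much more (0.9 vs 0.1).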

Impact of $k$ and Center Prior Radius: The authors evaluated various values of $k$ and Center Prior radii and report that OTA remains effective across these settings, consistently outperforming contemporary approaches such as PAA and ATSS.

Comparisons and Implications

The results indicate that OTA not only improves numerical metrics but also provides a principled, global formulation of label assignment. Its effective handling of ambiguous anchors makes it particularly well suited to crowded scenes, where traditional methods tend to struggle.

Conclusion and Future Directions

OTA represents a significant contribution to the field of object detection by introducing a globally optimized, dynamic label assignment strategy. The framework's strong performance on both standard and crowded datasets underscores its versatility and effectiveness.

Future research may explore richer transportation cost formulations and extend OTA to other domains that require dynamic allocation under constraints; more sophisticated cost functions could further refine the assignment and push object-detection performance higher.

In essence, OTA provides a robust, theoretically sound approach to label assignment, presenting both practical benefits in object detection tasks and opening avenues for novel dynamic assignment strategies in broader applications.



GitHub

Megvii-BaseDetection/OTA: official PyTorch implementation of the CVPR 2021 paper "OTA: Optimal Transport Assignment for Object Detection".