NMS Strikes Back (2212.06137v1)

Published 12 Dec 2022 in cs.CV

Abstract: Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector that trains Deformable-DETR with traditional IoU-based label assignment achieved 50.2 COCO mAP within 12 epochs (1x schedule) with ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.

An Analytical Perspective on "NMS Strikes Back"

The paper "NMS Strikes Back" examines how detection transformers are trained, comparing the one-to-one Hungarian matching used by Detection Transformers (DETRs) with the one-to-many label assignment and Non-Maximum Suppression (NMS) used by traditional detectors. Recent work has favored the elegant end-to-end design of DETRs, and these models have surpassed conventional detectors on the COCO benchmark. The paper questions the widely held assumption that one-to-one matching is essential to that performance and makes a compelling case for one-to-many label assignment combined with NMS.
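To make the one-to-one side concrete, here is a minimal Python sketch of bipartite matching between predicted and ground-truth boxes. It assumes boxes in (x1, y1, x2, y2) form and a simplified matching cost of 1 - IoU; DETR's actual matching cost also includes classification and L1 box terms. The helper names are hypothetical, and for readability the optimum is found by exhaustive search over assignments rather than by the Hungarian algorithm, which returns the same minimum-cost matching in polynomial time.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def one_to_one_match(preds, gts):
    """Match every ground-truth box to a distinct prediction, minimizing the
    total cost sum(1 - IoU). Exhaustive search: only for toy-sized inputs."""
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    best_cost, best_perm = float("inf"), ()
    # Each permutation assigns prediction perm[g] to ground-truth box g.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(1.0 - iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {g: p for g, p in enumerate(best_perm)}


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (0, 0, 8, 8), (21, 21, 30, 30), (50, 50, 60, 60)]
print(one_to_one_match(preds, gts))  # {0: 0, 1: 2}
```

Every ground-truth box receives exactly one positive prediction. The remaining predictions (indices 1 and 3) are trained as background, even though prediction 1 overlaps the first object with an IoU of 0.64.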

Comparative Analysis of Matching and Assignment Strategies

The paper carefully compares one-to-one bipartite Hungarian matching against traditional IoU-based one-to-many assignment under the same setting, holding architecture and training schedule fixed. Surprisingly, one-to-many assignment followed by NMS consistently outperforms DETR's standard one-to-one matching, with gains of up to 2.5 mAP. The resulting detector, Deformable-DETR trained with traditional IoU-based label assignment, reaches 50.2 COCO mAP within 12 epochs (a 1x schedule) using a ResNet50 backbone, outperforming all existing traditional and transformer-based detectors in that setting.
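The traditional alternative assigns labels by overlap alone. The sketch below is a minimal, hypothetical version of IoU-based one-to-many assignment: each prediction takes the label of its highest-IoU ground-truth box when that IoU reaches a positive threshold, becomes background below a negative threshold, and is ignored in between. The 0.5/0.4 thresholds follow the RetinaNet convention and are illustrative assumptions, not necessarily the values DETA uses.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def one_to_many_assign(preds, gts, pos_thresh=0.5, neg_thresh=0.4):
    """Return one label per prediction: the index of the best-overlapping
    ground-truth box (positive), -1 (background), or None (ignored)."""
    labels = []
    for p in preds:
        if not gts:
            labels.append(-1)  # no objects in the image: everything is background
            continue
        ious = [iou(p, g) for g in gts]
        best = max(range(len(gts)), key=ious.__getitem__)
        if ious[best] >= pos_thresh:
            labels.append(best)  # several predictions may share the same object
        elif ious[best] < neg_thresh:
            labels.append(-1)
        else:
            labels.append(None)  # ambiguous overlap: excluded from the loss
    return labels


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (0, 0, 8, 8), (21, 21, 30, 30),
         (0, 0, 10, 6), (0, 0, 10, 4.5), (50, 50, 60, 60)]
print(one_to_many_assign(preds, gts))  # [0, 0, 1, 0, None, -1]
```

Three predictions become positives for the first object. Because the detector is trained to fire several times per object, duplicates have to be removed at inference, which is why this scheme is paired with NMS.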

Implications and Observations

The research makes several noteworthy contributions. First, it challenges the perceived necessity of bipartite matching in detection transformers and attributes the success of DETRs to their expressive transformer architecture rather than to their matching strategy. Adopting IoU-based label assignment reintroduces NMS as a post-processing step, a return to a more traditional design. The results indicate that the main benefit comes from the expressive power of the transformer itself, independent of how predictions are matched to ground truth.
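Since NMS returns as the inference step that removes duplicate detections, here is a minimal sketch of standard greedy NMS in plain Python. The function name, the single-class setting, and the 0.5 IoU threshold are illustrative assumptions rather than DETA's exact post-processing.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy single-class non-maximum suppression.
    Returns indices of kept boxes, ordered from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any higher-scoring kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 10, 4.5)]
scores = [0.9, 0.8, 0.7, 0.6]
print(nms(boxes, scores))  # [0, 2, 3]
```

Box 1 is suppressed because its IoU with the top-scoring box is 0.81. Box 3 survives because its IoU with box 0 is 0.45, just under the threshold, which shows how sensitive the output is to this hand-set parameter.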

Second, the approach speeds up convergence: traditional overlap-based assignment marks more predictions as positives, which enriches the training signal. This advantage is particularly pronounced for small objects, where it improves performance over end-to-end one-to-one matching.
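The difference in training signal can be seen by simply counting positives. Under bipartite matching each ground-truth box yields exactly one positive, whereas overlap-based assignment makes every sufficiently overlapping prediction positive. The sketch below uses a hypothetical helper name and an assumed 0.5 positive threshold to compare the two counts on toy boxes.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_positives(preds, gts, pos_thresh=0.5):
    """Return (one_to_one, one_to_many) positive counts for the same predictions."""
    # Bipartite matching pairs each ground-truth box with one distinct prediction.
    one_to_one = min(len(preds), len(gts))
    # Overlap-based assignment: every prediction whose best IoU reaches the threshold.
    one_to_many = sum(
        1 for p in preds
        if gts and max(iou(p, g) for g in gts) >= pos_thresh
    )
    return one_to_one, one_to_many


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (0, 0, 8, 8), (21, 21, 30, 30),
         (0, 0, 10, 6), (50, 50, 60, 60)]
print(count_positives(preds, gts))  # (2, 4)
```

On this toy input, overlap-based assignment supervises twice as many predictions as positives, while bipartite matching is capped at one positive per object no matter how many predictions overlap it.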

Broader Implications for AI Development

In practice, this work suggests that object detectors can adopt transformer architectures without committing to bipartite matching, potentially simplifying training and shortening schedules without sacrificing accuracy. Theoretically, it invites a re-examination of which components of transformer-based detectors are actually responsible for their performance, and it points to opportunities to improve existing methods by combining traditional and newer techniques.

Future Directions

Moving forward, the findings invite further study of the balance between assignment complexity and architectural sophistication. That transformer-based detectors work well with heuristics such as NMS suggests value in combining established techniques with newer architectures. Future work could also explore combining different matching strategies to suit diverse datasets and application requirements.

In summary, "NMS Strikes Back" challenges prevailing assumptions about detection transformers, showing that a simpler, traditional assignment strategy not only coexists with but can improve the performance of modern transformer architectures. The paper is likely to encourage further exploration in computer vision.

Authors (4)

Jeffrey Ouyang-Zhang (2 papers)
Jang Hyun Cho (9 papers)
Xingyi Zhou (26 papers)
Philipp Krähenbühl (55 papers)

Citations (32)

GitHub

GitHub - jozhang97/DETA: Detection Transformers with Assignment (255 stars)