Oriented Object Detection with Transformer (2106.03146v1)
Abstract: Object detection with Transformers (DETR) has achieved performance competitive with traditional detectors such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection. We provide the first attempt and implement Oriented Object DEtection with TRansformer ($\bf O2DETR$) based on an end-to-end network. The contributions of $\rm O2DETR$ include: 1) we provide a new insight into oriented object detection by applying a Transformer to directly and efficiently localize objects, without the tedious design of rotated anchors used in conventional detectors; 2) we design a simple but highly efficient encoder for the Transformer by replacing the attention mechanism with depthwise separable convolution, which significantly reduces the memory and computational cost of using multi-scale features in the original Transformer; 3) our $\rm O2DETR$ can serve as a new baseline in the field of oriented object detection, achieving up to 3.85 mAP improvement over Faster R-CNN and RetinaNet. We simply fine-tune the head mounted on $\rm O2DETR$ in a cascaded architecture and achieve performance competitive with the SOTA on the DOTA dataset.
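The abstract's second contribution rests on a cost argument: self-attention over a multi-scale token sequence scales quadratically in the number of tokens, while a depthwise separable convolution scales linearly in tokens and decouples spatial from channel mixing. A minimal back-of-the-envelope sketch (illustrative counts only; the channel width, kernel size, and feature-map resolutions below are assumptions, not figures from the paper):

```python
# Rough cost comparison motivating the swap of encoder self-attention
# for depthwise separable convolution (illustrative, not the paper's numbers).

def attention_score_entries(n_tokens):
    """Self-attention materializes an n x n score matrix: O(n^2) memory."""
    return n_tokens * n_tokens

def depthwise_separable_params(channels, kernel=3):
    """Depthwise k*k filter per channel + 1x1 pointwise channel mixing."""
    return kernel * kernel * channels + channels * channels

def standard_conv_params(channels, kernel=3):
    """Dense conv couples spatial and channel mixing: k*k*C*C."""
    return kernel * kernel * channels * channels

# A multi-scale pyramid inflates the token count quickly, e.g. three
# assumed levels of 100x100, 50x50, and 25x25 feature maps:
tokens = 100 * 100 + 50 * 50 + 25 * 25          # 13125 tokens
print("attention score entries:", attention_score_entries(tokens))
print("depthwise separable params (C=256):", depthwise_separable_params(256))
print("standard conv params (C=256):", standard_conv_params(256))
```

Under these assumed sizes the attention score matrix alone holds over 10^8 entries and grows quadratically as finer scales are added, whereas the convolutional encoder's cost stays linear in the number of tokens, which is the efficiency gap the abstract refers to.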
- Teli Ma (22 papers)
- Mingyuan Mao (6 papers)
- Honghui Zheng (2 papers)
- Peng Gao (402 papers)
- Xiaodi Wang (15 papers)
- Shumin Han (18 papers)
- Errui Ding (156 papers)
- Baochang Zhang (113 papers)
- David Doermann (54 papers)