Oriented Object Detection with Transformer (2106.03146v1)
Abstract: Object detection with Transformers (DETR) has achieved performance competitive with traditional detectors such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection. We provide the first attempt and implement Oriented Object DEtection with TRansformer ($\bf O2DETR$) based on an end-to-end network. The contributions of $\rm O2DETR$ include: 1) we provide a new insight into oriented object detection by applying a Transformer to directly and efficiently localize objects, without the tedious design of rotated anchors used in conventional detectors; 2) we design a simple but highly efficient encoder for the Transformer by replacing the attention mechanism with depthwise separable convolution, which significantly reduces the memory and computational cost of using multi-scale features in the original Transformer; 3) our $\rm O2DETR$ can serve as a new baseline in the field of oriented object detection, achieving up to 3.85 mAP improvement over Faster R-CNN and RetinaNet. We simply fine-tune the head mounted on $\rm O2DETR$ in a cascaded architecture and achieve performance competitive with the SOTA on the DOTA dataset.
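The abstract's second contribution rests on a cost argument: self-attention over a multi-scale token sequence scales quadratically in the number of tokens, while a depthwise separable convolution scales linearly in tokens and decouples spatial from channel mixing. A minimal back-of-the-envelope sketch (illustrative counts only; the channel width, kernel size, and feature-map resolutions below are assumptions, not figures from the paper):

```python
# Rough cost comparison motivating the swap of encoder self-attention
# for depthwise separable convolution (illustrative, not the paper's numbers).

def attention_score_entries(n_tokens):
    """Self-attention materializes an n x n score matrix: O(n^2) memory."""
    return n_tokens * n_tokens

def depthwise_separable_params(channels, kernel=3):
    """Depthwise k*k filter per channel + 1x1 pointwise channel mixing."""
    return kernel * kernel * channels + channels * channels

def standard_conv_params(channels, kernel=3):
    """Dense conv couples spatial and channel mixing: k*k*C*C."""
    return kernel * kernel * channels * channels

# A multi-scale pyramid inflates the token count quickly, e.g. three
# assumed levels of 100x100, 50x50, and 25x25 feature maps:
tokens = 100 * 100 + 50 * 50 + 25 * 25          # 13125 tokens
print("attention score entries:", attention_score_entries(tokens))
print("depthwise separable params (C=256):", depthwise_separable_params(256))
print("standard conv params (C=256):", standard_conv_params(256))
```

Under these assumed sizes the attention score matrix alone holds over 10^8 entries and grows quadratically as finer scales are added, whereas the convolutional encoder's cost stays linear in the number of tokens, which is the efficiency gap the abstract refers to.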
- Teli Ma (22 papers)
- Mingyuan Mao (6 papers)
- Honghui Zheng (2 papers)
- Peng Gao (402 papers)
- Xiaodi Wang (15 papers)
- Shumin Han (18 papers)
- Errui Ding (156 papers)
- Baochang Zhang (113 papers)
- David Doermann (54 papers)