
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection (2406.03459v1)

Published 5 Jun 2024 in cs.CV

Abstract: In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advances, including training-effective techniques (e.g., improved loss and pretraining) and interleaved window and global attention for reducing the ViT encoder complexity. We improve the ViT encoder by aggregating multi-level feature maps, including the intermediate and final feature maps of the ViT encoder, to form richer feature maps, and introduce a window-major feature-map organization that improves the efficiency of the interleaved attention computation. Experimental results demonstrate that the proposed approach is superior to existing real-time detectors, e.g., YOLO and its variants, on COCO and other benchmark datasets. Code and models are available at https://github.com/Atten4Vis/LW-DETR.
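The window-major organization mentioned in the abstract stores tokens window by window (rather than in row-major order) so that window attention can operate on contiguous memory chunks. A minimal sketch of the corresponding index permutation, assuming square, non-overlapping windows that evenly tile the feature map; the function name and interface are illustrative and not taken from the paper's code:

```python
def window_major_order(h, w, win):
    """Permutation mapping row-major token indices of an h x w feature map
    into window-major order for non-overlapping win x win windows.

    Tokens inside each window stay in row-major order; windows are visited
    left-to-right, top-to-bottom. Assumes win divides both h and w.
    """
    order = []
    for wy in range(0, h, win):          # window row offset
        for wx in range(0, w, win):      # window column offset
            for dy in range(win):        # rows inside the window
                for dx in range(win):    # columns inside the window
                    order.append((wy + dy) * w + (wx + dx))
    return order


# For a 4x4 map with 2x2 windows, the first four entries are the
# top-left window's tokens (0, 1, 4, 5), so window attention over
# the reordered sequence reduces to attention over contiguous chunks.
print(window_major_order(4, 4, 2))
```

Applying this permutation once to the token sequence lets every window-attention layer slice contiguous blocks of `win * win` tokens, avoiding a gather/scatter per layer; global-attention layers are unaffected because attention is permutation-equivariant over tokens.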

Authors (15)
  1. Qiang Chen (98 papers)
  2. Xiangbo Su (5 papers)
  3. Xinyu Zhang (296 papers)
  4. Jian Wang (967 papers)
  5. Jiahui Chen (72 papers)
  6. Yunpeng Shen (1 paper)
  7. Chuchu Han (13 papers)
  8. Ziliang Chen (19 papers)
  9. Weixiang Xu (9 papers)
  10. Fanrong Li (7 papers)
  11. Shan Zhang (84 papers)
  12. Kun Yao (32 papers)
  13. Errui Ding (156 papers)
  14. Gang Zhang (139 papers)
  15. Jingdong Wang (236 papers)
Citations (5)
