
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection (2406.03459v1)

Published 5 Jun 2024 in cs.CV

Abstract: In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advances, including training-effective techniques (e.g., improved loss and pretraining) and interleaved window and global attention for reducing the ViT encoder complexity. We improve the ViT encoder by aggregating multi-level feature maps, including the intermediate and final feature maps of the ViT encoder, to form richer feature maps, and introduce a window-major feature-map organization that improves the efficiency of the interleaved attention computation. Experimental results demonstrate that the proposed approach is superior to existing real-time detectors, e.g., YOLO and its variants, on COCO and other benchmark datasets. Code and models are available at https://github.com/Atten4Vis/LW-DETR.
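The window-major organization mentioned in the abstract stores tokens window by window (rather than in row-major order) so that window attention can operate on contiguous memory chunks. A minimal sketch of the corresponding index permutation, assuming square, non-overlapping windows that evenly tile the feature map; the function name and interface are illustrative and not taken from the paper's code:

```python
def window_major_order(h, w, win):
    """Permutation mapping row-major token indices of an h x w feature map
    into window-major order for non-overlapping win x win windows.

    Tokens inside each window stay in row-major order; windows are visited
    left-to-right, top-to-bottom. Assumes win divides both h and w.
    """
    order = []
    for wy in range(0, h, win):          # window row offset
        for wx in range(0, w, win):      # window column offset
            for dy in range(win):        # rows inside the window
                for dx in range(win):    # columns inside the window
                    order.append((wy + dy) * w + (wx + dx))
    return order


# For a 4x4 map with 2x2 windows, the first four entries are the
# top-left window's tokens (0, 1, 4, 5), so window attention over
# the reordered sequence reduces to attention over contiguous chunks.
print(window_major_order(4, 4, 2))
```

Applying this permutation once to the token sequence lets every window-attention layer slice contiguous blocks of `win * win` tokens, avoiding a gather/scatter per layer; global-attention layers are unaffected because attention is permutation-equivariant over tokens.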

Authors (15)
  1. Qiang Chen (98 papers)
  2. Xiangbo Su (5 papers)
  3. Xinyu Zhang (296 papers)
  4. Jian Wang (967 papers)
  5. Jiahui Chen (72 papers)
  6. Yunpeng Shen (1 paper)
  7. Chuchu Han (13 papers)
  8. Ziliang Chen (19 papers)
  9. Weixiang Xu (9 papers)
  10. Fanrong Li (7 papers)
  11. Shan Zhang (84 papers)
  12. Kun Yao (32 papers)
  13. Errui Ding (156 papers)
  14. Gang Zhang (139 papers)
  15. Jingdong Wang (236 papers)
Citations (5)
