DMFormer: Closing the Gap Between CNN and Vision Transformers (2209.07738v3)

Published 16 Sep 2022 in cs.CV, cs.AI, and cs.LG

Abstract: Vision transformers (ViTs) have shown excellent performance on computer vision tasks. Because their self-attention mechanism is computationally expensive, recent works have tried to replace it with convolutional operations, which are more efficient and carry a built-in inductive bias. However, these efforts either ignore multi-level features or lack dynamic properties, leading to sub-optimal performance. In this paper, we propose a Dynamic Multi-level Attention mechanism (DMA), which captures different patterns of input images with multiple kernel sizes and enables input-adaptive weights via a gating mechanism. Based on DMA, we present an efficient backbone network named DMFormer. DMFormer adopts the overall architecture of vision transformers while replacing the self-attention mechanism with our proposed DMA. Extensive experiments on the ImageNet-1K and ADE20K datasets demonstrate that DMFormer achieves state-of-the-art performance, outperforming similar-sized vision transformers and convolutional neural networks (CNNs).
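The abstract describes DMA as combining convolution branches of different kernel sizes with input-adaptive gating weights. The paper itself does not give code here, so the following is a minimal NumPy sketch of that general idea, not the authors' implementation: each branch applies a same-padded convolution with a different kernel size, and a toy gate (a softmax over per-branch logits derived from the globally pooled input) mixes the branch outputs. The function names `conv2d_same` and `dma_sketch` and the simple scalar gate are illustrative assumptions.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 2D convolution with zero 'same' padding (single channel, toy scale)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def dma_sketch(x, kernels, gate_w):
    """Sketch of multi-kernel branches mixed by an input-adaptive gate.

    x       : (H, W) single-channel feature map
    kernels : list of (k, k) convolution kernels of different sizes
    gate_w  : (n_branches,) gate parameters; logits depend on the pooled input,
              so the mixing weights adapt to x (a much-simplified stand-in
              for the paper's gating mechanism)
    """
    branches = [conv2d_same(x, k) for k in kernels]
    logits = gate_w * x.mean()              # input-dependent gate logits (toy)
    a = np.exp(logits - logits.max())
    a = a / a.sum()                          # softmax over branches
    return sum(ai * b for ai, b in zip(a, branches))

# Example: two branches with 3x3 and 5x5 averaging kernels.
x = np.ones((4, 4))
kernels = [np.ones((3, 3)) / 9.0, np.ones((5, 5)) / 25.0]
out = dma_sketch(x, kernels, gate_w=np.array([0.5, -0.5]))
```

The key point mirrored from the abstract is that the mixing weights are not fixed: they are computed from the input, so different images can emphasize different receptive-field sizes.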

Authors (7)
  1. Zimian Wei (11 papers)
  2. Hengyue Pan (19 papers)
  3. Lujun Li (30 papers)
  4. Menglong Lu (5 papers)
  5. Xin Niu (14 papers)
  6. Peijie Dong (26 papers)
  7. Dongsheng Li (240 papers)
Citations (5)