S$^2$-MLPv2: Improved Spatial-Shift MLP Architecture for Vision (2108.01072v1)

Published 2 Aug 2021 in cs.CV

Abstract: Recently, MLP-based vision backbones have emerged. MLP-based vision architectures with less inductive bias achieve competitive performance in image recognition compared with CNNs and vision Transformers. Among them, spatial-shift MLP (S$^2$-MLP), adopting a straightforward spatial-shift operation, achieves better performance than pioneering works such as MLP-Mixer and ResMLP. More recently, using smaller patches with a pyramid structure, Vision Permutator (ViP) and Global Filter Network (GFNet) achieve better performance than S$^2$-MLP. In this paper, we improve the S$^2$-MLP vision backbone. We expand the feature map along the channel dimension and split the expanded feature map into several parts. We conduct different spatial-shift operations on the split parts, then exploit the split-attention operation to fuse them. Moreover, like its counterparts, we adopt smaller-scale patches and a pyramid structure to boost image recognition accuracy. We term the improved spatial-shift MLP vision backbone S$^2$-MLPv2. Using 55M parameters, our medium-scale model, S$^2$-MLPv2-Medium, achieves an $83.6\%$ top-1 accuracy on the ImageNet-1K benchmark using $224\times 224$ images, without self-attention or external training data.
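
The abstract outlines the core block: expand the channels, split the expanded feature map into three parts, apply different spatial-shift operations to two of them, and fuse the parts with split attention. Below is a minimal PyTorch sketch of such a block, based only on this description; the 3x expansion ratio, the two shift-direction orders, the dim//4 bottleneck in the split-attention MLP, and the circular boundary handling via torch.roll are all assumptions for illustration, not the authors' reference implementation.

```python
# Minimal sketch of an S^2-MLPv2-style block, inferred from the abstract.
# Hypothetical choices (not from the paper's code): 3x channel expansion,
# the shift-direction orders, the dim//4 split-attention bottleneck, and
# wrap-around shifting via torch.roll.
import torch
import torch.nn as nn


def spatial_shift(x: torch.Tensor, order) -> torch.Tensor:
    """Shift four equal channel groups of x (B, H, W, C) by one pixel each,
    in the directions listed in `order`. torch.roll wraps at the border,
    which differs slightly from a copy-based shift at image edges."""
    b, h, w, c = x.shape
    assert c % 4 == 0, "channels must split evenly into four groups"
    g = c // 4
    # direction -> (shift amount, tensor dim): dim 1 is H, dim 2 is W
    moves = {"down": (1, 1), "up": (-1, 1), "right": (1, 2), "left": (-1, 2)}
    parts = [
        torch.roll(x[..., i * g:(i + 1) * g], moves[d][0], dims=moves[d][1])
        for i, d in enumerate(order)
    ]
    return torch.cat(parts, dim=-1)


class S2MLPv2Block(nn.Module):
    """Expand channels 3x, split into three parts, spatially shift two of
    them in different direction orders, then fuse with split attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.expand = nn.Linear(dim, dim * 3)          # channel expansion
        self.attn_mlp = nn.Sequential(                 # split-attention gate
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, dim * 3)
        )
        self.proj = nn.Linear(dim, dim)                # project back to dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, H, W, C)
        b, h, w, c = x.shape
        x = self.expand(x)                                          # (B,H,W,3C)
        x1 = spatial_shift(x[..., :c], ["down", "up", "right", "left"])
        x2 = spatial_shift(x[..., c:2 * c], ["right", "left", "down", "up"])
        x3 = x[..., 2 * c:]                                         # unshifted
        parts = torch.stack([x1, x2, x3], dim=1)                    # (B,3,H,W,C)
        # Split attention: pool a global descriptor, softmax over the three
        # parts per channel, and take the weighted sum of the parts.
        gap = parts.sum(dim=1).mean(dim=(1, 2))                     # (B, C)
        attn = self.attn_mlp(gap).view(b, 3, c).softmax(dim=1)      # (B, 3, C)
        out = (parts * attn[:, :, None, None, :]).sum(dim=1)        # (B,H,W,C)
        return self.proj(out)
```

A quick shape check with hypothetical sizes: `S2MLPv2Block(dim=64)` applied to a `(2, 14, 14, 64)` tensor returns a `(2, 14, 14, 64)` tensor, so the block can be stacked per pyramid stage as the abstract suggests.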

Authors (5)
  1. Tan Yu (17 papers)
  2. Xu Li (126 papers)
  3. Yunfeng Cai (27 papers)
  4. Mingming Sun (28 papers)
  5. Ping Li (421 papers)
Citations (48)
