
CycleMLP: A MLP-like Architecture for Dense Prediction (2107.10224v4)

Published 21 Jul 2021 in cs.CV

Abstract: This paper presents a simple MLP-like architecture, CycleMLP, which is a versatile backbone for visual recognition and dense predictions. As compared to modern MLP architectures, e.g., MLP-Mixer, ResMLP, and gMLP, whose architectures are correlated to image size and thus are infeasible in object detection and segmentation, CycleMLP has two advantages compared to modern approaches. (1) It can cope with various image sizes. (2) It achieves linear computational complexity to image size by using local windows. In contrast, previous MLPs have $O(N^2)$ computations due to fully spatial connections. We build a family of models which surpass existing MLPs and even state-of-the-art Transformer-based models, e.g., Swin Transformer, while using fewer parameters and FLOPs. We expand the MLP-like models' applicability, making them a versatile backbone for dense prediction tasks. CycleMLP achieves competitive results on object detection, instance segmentation, and semantic segmentation. In particular, CycleMLP-Tiny outperforms Swin-Tiny by 1.3% mIoU on ADE20K dataset with fewer FLOPs. Moreover, CycleMLP also shows excellent zero-shot robustness on ImageNet-C dataset. Code is available at https://github.com/ShoufaChen/CycleMLP.

Authors (6)
  1. Shoufa Chen (22 papers)
  2. Enze Xie (84 papers)
  3. Chongjian Ge (23 papers)
  4. Runjian Chen (20 papers)
  5. Ding Liang (39 papers)
  6. Ping Luo (340 papers)
Citations (216)

Summary

Analysis of CycleMLP: An MLP-like Architecture for Dense Prediction

The paper "CycleMLP: A MLP-like Architecture for Dense Prediction" presents an innovative architecture designed to enhance Multi-Layer Perceptron (MLP) application in dense prediction tasks, such as object detection and semantic segmentation. Historically, MLP architectures faced challenges with variable image sizes and computational inefficiencies in dense prediction contexts due to dependencies on fully spatial connections. This research introduces CycleMLP, which proposes a new operator called Cycle Fully-Connected Layer (Cycle FC) to address these limitations.

In comparison with previous MLP architectures such as MLP-Mixer, ResMLP, and gMLP, CycleMLP offers two central improvements: it accommodates arbitrary image sizes, and it achieves computational complexity that is linear in image size. The latter comes from restricting spatial interaction to local windows, in contrast to the quadratic cost that fully spatial connections impose on prior MLP models. The authors present a suite of CycleMLP variants that not only surpass previous MLP-like models but also outperform contemporary Transformer-based models, such as the Swin Transformer, on dense prediction tasks.
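To make the complexity claim concrete, here is a back-of-the-envelope accounting consistent with the paper's $O(N^2)$ figure, assuming $N = HW$ tokens, $C$ channels, and a token-mixing width that scales with $N$. A fully spatial FC connects every token to every other token, whereas Cycle FC reads each output channel from one offset position inside a fixed-size pseudo-kernel and mixes only across channels:

$$\underbrace{\mathcal{O}(N^2 \cdot C)}_{\text{fully spatial FC}} \qquad \text{vs.} \qquad \underbrace{\mathcal{O}(N \cdot C^2)}_{\text{Cycle FC}}$$

Because the pseudo-kernel size is a constant, the Cycle FC cost grows linearly with the number of tokens, independent of image resolution.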

The paper rigorously compares CycleMLP with existing architectures on several benchmarks. For instance, CycleMLP-Tiny achieves a 1.3% higher mean Intersection over Union (mIoU) on the ADE20K benchmark than Swin-Tiny while using fewer FLOPs, underscoring its computational efficiency and effectiveness in segmentation. CycleMLP also demonstrates strong zero-shot robustness on the ImageNet-C dataset, suggesting broader applicability in real-world settings where robustness to image corruptions is critical.

Theoretical and Practical Implications:

  1. Theoretical Contributions:
    • The Cycle FC layer: This novel operator introduces structured sparsity into the spatial connections, allowing CycleMLP to balance computational efficiency against architectural flexibility. It replaces the fully spatial FC with sampling points that cycle through a small pseudo-kernel along the channel dimension, drastically reducing computational overhead without sacrificing spatial context (see the sketch after this list).
    • Enhanced inductive biases: By re-imagining the spatial representation capabilities of MLPs through Cycle FC, the authors provide a new lens for exploring inductive biases in neural networks, challenging conventional configurations in vision models.
  2. Practical Contributions:
    • Versatility across tasks: The hierarchical structure of CycleMLP produces the feature pyramids crucial for dense prediction, enabling its use in tasks that require high spatial resolution and multi-scale features, as is typical of state-of-the-art models in practical deployment.
    • Resolution adaptability: CycleMLP handles varying input resolutions without the retraining or positional-embedding interpolation required by some Transformer-based models, a significant advantage for real-world applications with dynamic input scales (also demonstrated in the sketch below).
  3. Speculations for Future Developments:
    • Cross-domain generalization: The modifications introduced to FC layers could inspire adaptations in other domains where MLPs traditionally underperform, particularly those involving widely varying input scales or resolutions.
    • Further optimization in parallelized computing: Cycle FC's regular, structured access pattern could lend itself well to hardware acceleration and parallel computing optimizations, enabling faster processing and better energy efficiency in larger-scale deployments.
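To ground the Cycle FC description above, the following is a minimal PyTorch sketch of a Cycle FC-style layer with a vertical pseudo-kernel. It is an illustrative simplification, not the authors' implementation (the official code at https://github.com/ShoufaChen/CycleMLP realizes the operator via deformable convolution); in particular, the wrap-around indexing at image borders and the single `nn.Linear` for channel mixing are assumptions of this sketch.

```python
import torch
from torch import nn

class CycleFCSketch(nn.Module):
    """Simplified Cycle FC with a vertical pseudo-kernel of size `kernel`.

    Channel c is read from a row offset that cycles through
    {-kernel//2, ..., kernel//2}; a plain FC then mixes channels.
    """

    def __init__(self, dim: int, kernel: int = 3):
        super().__init__()
        self.kernel = kernel
        self.fc = nn.Linear(dim, dim)  # channel mixing only; no spatial weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C), channels-last for easy gathering
        B, H, W, C = x.shape
        # Per-channel row offset cycles through the pseudo-kernel positions.
        offsets = torch.arange(C, device=x.device) % self.kernel - self.kernel // 2
        rows = torch.arange(H, device=x.device).unsqueeze(1)   # (H, 1)
        idx = (rows + offsets.unsqueeze(0)) % H                # (H, C); wraps at borders
        idx = idx.view(1, H, 1, C).expand(B, H, W, C)
        shifted = x.gather(dim=1, index=idx)                   # pick the offset row per channel
        return self.fc(shifted)

# No parameters are tied to H or W, so the same weights run at any resolution:
layer = CycleFCSketch(dim=64)
for h, w in [(16, 16), (32, 48)]:
    print(tuple(layer(torch.randn(2, h, w, 64)).shape))
# (2, 16, 16, 64)
# (2, 32, 48, 64)
```

The closing loop illustrates the resolution-adaptability point from the list: the layer's only learned weights form a $C \times C$ channel-mixing matrix, so nothing needs to be interpolated or retrained when the input size changes.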

In conclusion, CycleMLP represents a deliberate reconsideration of MLP viability in dense prediction tasks through innovative design. By integrating the Cycle FC operator, it mitigates the traditional limitations of MLP architectures, marking a valuable contribution to vision model design while providing a foundation for future exploration and innovation in AI workloads.