Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization (2406.16004v2)

Published 23 Jun 2024 in cs.CV

Abstract: In the realm of resource-constrained mobile vision tasks, the pursuit of efficiency and performance consistently drives innovation in lightweight Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). While ViTs excel at capturing global context through self-attention mechanisms, their deployment in resource-limited environments is hindered by computational complexity and latency. Conversely, lightweight CNNs are favored for their parameter efficiency and low latency. This study investigates the complementary advantages of CNNs and ViTs to develop a versatile vision backbone tailored for resource-constrained applications. We introduce RepNeXt, a novel model series integrates multi-scale feature representations and incorporates both serial and parallel structural reparameterization (SRP) to enhance network depth and width without compromising inference speed. Extensive experiments demonstrate RepNeXt's superiority over current leading lightweight CNNs and ViTs, providing advantageous latency across various vision benchmarks. RepNeXt-M4 matches RepViT-M1.5's 82.3\% accuracy on ImageNet within 1.5ms on an iPhone 12, outperforms its AP${box}$ by 1.3 on MS-COCO, and reduces parameters by 0.7M. Codes and models are available at https://github.com/suous/RepNeXt.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Mingshu Zhao (9 papers)
  2. Yi Luo (153 papers)
  3. Yong Ouyang (2 papers)
Citations (2)
Github Logo Streamline Icon: https://streamlinehq.com
X Twitter Logo Streamline Icon: https://streamlinehq.com