
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks (2303.03667v3)

Published 7 Mar 2023 in cs.CV

Abstract: To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does not necessarily lead to a similar level of reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstrate that such low FLOPS is mainly due to frequent memory access of the operators, especially the depthwise convolution. We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently, by cutting down redundant computation and memory access simultaneously. Building upon our PConv, we further propose FasterNet, a new family of neural networks, which attains substantially higher running speed than others on a wide range of devices, without compromising on accuracy for various vision tasks. For example, on ImageNet-1k, our tiny FasterNet-T0 is $2.8\times$, $3.3\times$, and $2.4\times$ faster than MobileViT-XXS on GPU, CPU, and ARM processors, respectively, while being $2.9\%$ more accurate. Our large FasterNet-L achieves impressive $83.5\%$ top-1 accuracy, on par with the emerging Swin-B, while having $36\%$ higher inference throughput on GPU, as well as saving $37\%$ compute time on CPU. Code is available at \url{https://github.com/JierunChen/FasterNet}.

Citations (480)

Summary

  • The paper introduces Partial Convolution (PConv) to minimize redundant computations and maximize FLOPS for faster neural network performance.
  • FasterNet achieves significant speed improvements, running up to 3.3× faster on CPU than MobileViT-XXS while also being more accurate.
  • The study emphasizes optimizing computational efficiency over merely reducing FLOPs, paving the way for more effective real-time and edge computing solutions.

Analyzing "Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks"

This paper, authored by Jierun Chen et al., presents a critical examination of a commonly pursued strategy in neural network design: reducing the number of floating-point operations (FLOPs, the total operation count) to obtain faster networks. The authors show that a reduction in FLOPs does not inherently translate to lower latency, because it is often accompanied by inefficiently low floating-point operations per second (FLOPS, the achieved throughput). Their work proposes a novel operator and a network architecture that address these inefficiencies.
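The observation above rests on a simple latency model, latency ≈ FLOPs / FLOPS. A minimal sketch (the two networks and their numbers are hypothetical, chosen only to illustrate the point):

```python
def latency_seconds(flops, flops_per_second):
    """Simple latency model underlying the paper's argument:
    latency = FLOPs / FLOPS (total work divided by achieved throughput)."""
    return flops / flops_per_second

# Hypothetical comparison: network B has half the FLOPs of network A,
# yet is slower, because its effective FLOPS (throughput) is four times lower.
lat_a = latency_seconds(4e9, 2e12)    # 4 GFLOPs at 2 TFLOPS   -> 0.002 s
lat_b = latency_seconds(2e9, 0.5e12)  # 2 GFLOPs at 0.5 TFLOPS -> 0.004 s
```

This is why the paper targets the FLOPS term (keeping the hardware busy) rather than only shrinking the FLOPs term.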

Key Contributions

The authors introduce Partial Convolution (PConv), an operator that extracts spatial features while minimizing redundant computation and memory access. Unlike depthwise and group convolutions, which reduce FLOPs at the cost of more frequent memory access, PConv applies a regular convolution to only a fraction of the input channels and leaves the remaining channels untouched, sustaining higher FLOPS and therefore lower latency.
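The FLOPs saving can be made concrete with a small counting sketch (the feature-map dimensions below are illustrative; with the paper's partial ratio of 1/4, the convolution's FLOPs drop to 1/16 of a regular convolution's):

```python
def conv_flops(h, w, k, c_in, c_out):
    # FLOPs of a standard k x k convolution (stride 1, "same" padding):
    # one k*k*c_in dot product per output element, for c_out output channels.
    return h * w * k * k * c_in * c_out

def pconv_flops(h, w, k, c, ratio=0.25):
    # PConv convolves only c_p = ratio * c channels (identity on the rest),
    # so both the input and output channel counts shrink by the partial ratio.
    cp = int(c * ratio)
    return conv_flops(h, w, k, cp, cp)

h = w = 56          # illustrative feature-map size
k, c = 3, 128       # 3x3 kernel, 128 channels
regular = conv_flops(h, w, k, c, c)
partial = pconv_flops(h, w, k, c)
ratio = regular / partial  # 16.0: PConv at ratio 1/4 needs 1/16 of the FLOPs
```

Memory access shrinks as well (only the convolved channels are read and written by the filter), which is what keeps the effective FLOPS high.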

PConv forms the basis of FasterNet, a new family of neural networks that runs significantly faster across a range of hardware platforms without sacrificing accuracy. FasterNet's architecture is optimized for latency and throughput, pairing PConv with pointwise convolutions and keeping normalization and activation layers minimal to preserve computational efficiency.
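As a rough illustration of the channel-slicing idea at the heart of PConv, here is a minimal NumPy sketch. The shapes are hypothetical, and a 1×1 kernel stands in for the paper's 3×3 spatial kernel to keep the code short:

```python
import numpy as np

def pconv_forward(x, weight):
    """Sketch of a PConv forward pass with a 1x1 kernel.
    x: (c, h, w) input; weight: (cp, cp) filter over the first cp channels.
    Only the leading cp channels are convolved; the rest pass through
    untouched, which is where the FLOPs and memory savings come from."""
    cp = weight.shape[0]
    out = x.copy()
    # 1x1 convolution over cp channels == matmul along the channel axis
    out[:cp] = np.tensordot(weight, x[:cp], axes=([1], [0]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
w = rng.standard_normal((4, 4))   # cp = 4, i.e. a 1/4 partial ratio
y = pconv_forward(x, w)           # channels 4..15 are identical to the input
```

In the full FasterNet block, subsequent pointwise convolutions mix information across all channels, so the untouched channels are not wasted.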

Numerical Results

Performance evaluations highlight considerable gains in both speed and accuracy:

  • The tiny variant of FasterNet (FasterNet-T0) is reported to be 2.8 times faster on GPU, 3.3 times faster on CPU, and 2.4 times faster on ARM processors compared to MobileViT-XXS, while also being 2.9% more accurate.
  • The larger variant, FasterNet-L, achieves 83.5% top-1 accuracy on ImageNet-1k, on par with models like Swin-B, while delivering 36% higher inference throughput on GPU and saving 37% compute time on CPU.

Implications and Future Directions

The authors argue for a shift in focus from reducing FLOPs to optimizing FLOPS, which involves refining computational operations to maximize hardware capabilities. This perspective offers both theoretical and practical implications for future neural network design, potentially leading to more efficient architectures that better utilize hardware resources.

FasterNet, with its PConv-based architecture, opens avenues for further exploration into hybrid architectures. Future work might explore combining PConv with other techniques, such as attention mechanisms, to potentially enhance accuracy while maintaining efficiency.

The proposed approach presents a significant step in designing neural networks that are not only compact but also attain high execution speeds on different hardware. The exploration of computational efficiencies at the operator level marks a promising pathway for future developments in network architectures, especially for real-time applications and edge computing.

Conclusion

This paper's contributions lie in addressing a largely overlooked aspect of neural network efficiency—maximizing FLOPS rather than merely reducing FLOPs. By introducing PConv and FasterNet, the authors provide a foundation for new research directions centered around optimizing the interaction between neural networks and computational hardware. The results underscore the potential of such approaches in achieving substantial improvements in both speed and accuracy across diverse devices.
