
MobileOne: An Improved One millisecond Mobile Backbone (2206.04040v2)

Published 8 Jun 2022 in cs.CV

Abstract: Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural networks and provide ways to mitigate these bottlenecks. To this end, we design an efficient backbone MobileOne, with variants achieving an inference time under 1 ms on an iPhone12 with 75.9% top-1 accuracy on ImageNet. We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile. Our best model obtains similar performance on ImageNet as MobileFormer while being 38x faster. Our model obtains 2.3% better top-1 accuracy on ImageNet than EfficientNet at similar latency. Furthermore, we show that our model generalizes to multiple tasks - image classification, object detection, and semantic segmentation with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device. Code and models are available at https://github.com/apple/ml-mobileone

MobileOne: An Efficient Backbone for Mobile Inference

The paper presents MobileOne, a neural network backbone designed for mobile devices and optimized directly for on-device latency rather than proxy metrics such as FLOPs or parameter count. The authors profile mobile-efficient architectures on real hardware, identify architectural and optimization bottlenecks, and propose remedies that improve inference efficiency on mobile platforms.

Key Contributions

  1. Architectural Innovation: MobileOne introduces an optimized structure leveraging train-time over-parameterization and re-parameterization techniques during inference. This approach results in a lean feed-forward network with improved accuracy and reduced memory access costs.
  2. Performance Metrics: Variants of MobileOne achieve inference times under 1 ms on a mobile device (iPhone 12) while reaching 75.9% top-1 accuracy on ImageNet. Notably, MobileOne surpasses other efficient models, demonstrating a 2.3% improvement in top-1 accuracy over EfficientNet at similar latency.
  3. Generality Across Tasks: The paper highlights MobileOne's efficacy across multiple computer vision tasks, including image classification, object detection, and semantic segmentation. The architecture significantly reduces latency and enhances accuracy for these applications when deployed on mobile devices.
  4. Comprehensive Benchmarking: The authors validate their claims by benchmarking MobileOne against contemporary models using various platforms, including mobile (iPhone 12), desktop CPU, and GPU. MobileOne exhibits superior performance, with up to 38 times faster inference speed compared to models like MobileFormer.
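As a rough illustration of the benchmarking methodology, a host-side latency harness might look like the sketch below. The function name, the callable, and the iteration counts are placeholders; the paper's mobile latencies come from on-device runs, not from a loop like this.

```python
import statistics
import time

# Illustrative latency harness (assumption: `fn` is any model callable;
# the paper measures mobile latency on-device, e.g. on an iPhone 12).
def median_latency_ms(fn, x, warmup=10, iters=100):
    for _ in range(warmup):  # warm caches before timing
        fn(x)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - t0) * 1e3)  # seconds -> ms
    return statistics.median(samples)
```

The median (rather than the mean) is used here so that occasional scheduler hiccups do not skew the reported latency.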

Technical Depth

The architecture of MobileOne incorporates re-parameterizable branches that enhance representation capacity without incurring high latency costs at inference. By dynamically adjusting architectural components and utilizing a systematic model scaling strategy, MobileOne effectively balances depth and width to optimize resource usage and performance.
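The core idea of re-parameterization can be sketched with linear maps standing in for convolutions (an assumption made for brevity; the paper fuses conv and BatchNorm branches). At train time the block sums several parallel branches plus an identity shortcut; at inference these collapse into a single operator with identical output:

```python
import numpy as np

# Simplified re-parameterization sketch: linear maps stand in for the
# paper's conv branches, and BatchNorm is assumed already folded in.
rng = np.random.default_rng(0)
dim, k = 8, 3  # feature dimension, number of over-parameterized branches

branches = [rng.normal(size=(dim, dim)) for _ in range(k)]

def train_time_block(x):
    # k parallel branches plus an identity skip connection.
    return sum(W @ x for W in branches) + x

# Fold all branches and the skip into one matrix for inference.
W_fused = sum(branches) + np.eye(dim)

def inference_block(x):
    return W_fused @ x

x = rng.normal(size=dim)
assert np.allclose(train_time_block(x), inference_block(x))
```

Because the fused block is a single operator, the inference-time network has no multi-branch memory traffic, which is where the latency savings come from.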

Furthermore, the authors address training inefficiencies by employing a novel strategy of progressively relaxing regularization, allowing smaller models to avoid overfitting while still capitalizing on robust optimization methods.
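One way to realize progressively relaxed regularization is to anneal the weight-decay coefficient over training; the cosine shape below is an assumption for illustration, not the paper's exact schedule:

```python
import math

# Hedged sketch: anneal weight decay from an initial value down to zero
# over training (cosine shape and wd_init value are assumptions).
def annealed_weight_decay(step, total_steps, wd_init=1e-4):
    return 0.5 * wd_init * (1 + math.cos(math.pi * step / total_steps))
```

Early in training the full penalty prevents the over-parameterized branches from overfitting; by the end the penalty has decayed away, letting small models use their full capacity.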

Strong Numerical Results

MobileOne-S1, with 4.8 million parameters, achieves a latency of 0.89 ms while outperforming established models such as MobileNetV2 in both accuracy and speed. It also delivers a 3.9% top-1 accuracy gain over models with similar parameter counts.

Implications and Speculation

This research has significant implications for real-time applications on mobile devices, particularly where rapid inference and energy efficiency are critical. The MobileOne backbone could influence future developments in mobile AI by serving as a foundation for more specialized applications, such as augmented reality or real-time translation services, where latency is paramount.

Looking forward, the proposed re-parameterization strategy may inspire further exploration into optimizing different classes of models beyond mobile networks, potentially improving efficiency in high-performance computing environments as well.

In conclusion, MobileOne sets a new benchmark for mobile-friendly neural networks by harmonizing speed and accuracy. The framework's adaptability across a range of tasks underscores its potential to drive innovation across diverse AI applications.

Authors (5)
  1. Pavan Kumar Anasosalu Vasu
  2. James Gabriel
  3. Jeff Zhu
  4. Oncel Tuzel
  5. Anurag Ranjan
Citations (122)