MoGA: Searching Beyond MobileNetV3 (1908.01314v4)

Published 4 Aug 2019 in cs.LG, cs.CV, cs.NE, and stat.ML

Abstract: The evolution of MobileNets has laid a solid foundation for neural network applications on mobile end. With the latest MobileNetV3, neural architecture search again claimed its supremacy in network design. Unfortunately, till today all mobile methods mainly focus on CPU latencies instead of GPU, the latter, however, is much preferred in practice for it has faster speed, lower overhead and less interference. Bearing the target hardware in mind, we propose the first Mobile GPU-Aware (MoGA) neural architecture search in order to be precisely tailored for real-world applications. Further, the ultimate objective to devise a mobile network lies in achieving better performance by maximizing the utilization of bounded resources. Urging higher capability while restraining time consumption is not reconcilable. We alleviate the tension by weighted evolution techniques. Moreover, we encourage increasing the number of parameters for higher representational power. With 200x fewer GPU days than MnasNet, we obtain a series of models that outperform MobileNetV3 under the similar latency constraints, i.e., MoGA-A achieves 75.9% top-1 accuracy on ImageNet, MoGA-B meets 75.5% which costs only 0.5 ms more on mobile GPU. MoGA-C best attests GPU-awareness by reaching 75.3% and being slower on CPU but faster on GPU. The models and test code are made available at https://github.com/xiaomi-automl/MoGA.

Citations (41)

Summary

  • The paper introduces MoGA, a GPU-aware neural architecture search framework that directly targets mobile GPU latency over traditional CPU-focused methods.
  • The authors implement a weighted evolution strategy to balance accuracy, latency, and model size, reducing search cost to just 12 GPU days.
  • MoGA-generated models achieve up to 75.9% top-1 ImageNet accuracy with an 11.1 ms mobile GPU latency, marking a significant improvement over MobileNetV3.

Overview of "MoGA: Searching Beyond MobileNetV3"

The paper "MoGA: Searching Beyond MobileNetV3" by Xiangxiang Chu, Bo Zhang, and Ruijun Xu presents a novel approach to neural architecture search (NAS) explicitly designed for mobile GPUs, which addresses specific hardware characteristics that are not adequately captured when considering only CPU performance. Unlike previous methods that primarily focus on mobile CPUs, this research introduces the Mobile GPU-Aware (MoGA) NAS framework. This approach offers a significant advancement in optimizing architectures for GPU latency, aiming to maximize the utilization of bounded resources specific to mobile GPUs.

The authors underscore the real-world preference for deploying mobile applications on GPUs rather than CPUs due to factors like speed, overhead, and interference. Their work carefully considers the implications of this shift, highlighting that the relationship between CPU and GPU latencies is not straightforward. Consequently, the authors argue for a GPU-targeted NAS strategy that is more aligned with practical deployment scenarios.

Key Contributions

  1. GPU-Aware Optimization: The paper emphasizes tailoring neural architectures to the characteristics of the target hardware, in this case mobile GPUs. MoGA exemplifies this by integrating mobile GPU latency directly into the NAS process.
  2. Weighted Evolution Approach: Instead of traditional multi-objective (Pareto) optimization, the authors employ a weighted evolution technique that balances accuracy, latency, and parameter count, with a preference for accuracy and latency over parameter count; see the sketch after this list. This scalarization eases the tension between the conflicting objectives.
  3. Efficiency in Search Cost: The search costs only 12 GPU days, roughly 200 times fewer than MnasNet (implying about 2,400 GPU days under the paper's accounting). Because the one-shot supernet is trained once and its cost amortized across target devices, the approach scales to new hardware.
  4. Superior Performance: The models generated using MoGA outperform MobileNetV3 under comparable latency conditions. Notably, MoGA-A reaches 75.9% top-1 accuracy on the ImageNet dataset, showcasing the potential for improved representational power.
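
To make the weighted-evolution idea concrete, below is a minimal, self-contained sketch of how such a scalarized fitness could drive an evolutionary search. The weights, latency target, encoding, and helper names (fitness, mutate, evolve, evaluate) are illustrative assumptions, not MoGA's actual implementation.

```python
import random

# Illustrative weighted-evolution fitness for NAS, in the spirit of MoGA's
# objective: accuracy and mobile GPU latency dominate, parameter count is
# weighted least. The exact weights and functional form are assumptions.

def fitness(acc, gpu_latency_ms, params_m,
            w_acc=1.0, w_lat=0.5, w_params=0.02, target_lat_ms=11.0):
    """Scalarized objective: reward accuracy, penalize latency beyond a
    target, and lightly reward parameters (the paper encourages a larger
    parameter count for representational power)."""
    lat_penalty = max(0.0, gpu_latency_ms - target_lat_ms) / target_lat_ms
    return w_acc * acc - w_lat * lat_penalty + w_params * params_m

def mutate(arch, n_choices=3):
    """Flip one randomly chosen decision in a toy architecture encoding."""
    arch = list(arch)
    arch[random.randrange(len(arch))] = random.randrange(n_choices)
    return tuple(arch)

def evolve(population, evaluate, n_generations=10, k_parents=4):
    """Toy evolution loop: rank candidates by weighted fitness, keep the
    top-k as parents, and refill the population with their mutations.
    `evaluate` maps an encoding to (accuracy, gpu_latency_ms, params_m)."""
    for _ in range(n_generations):
        population.sort(key=lambda a: fitness(*evaluate(a)), reverse=True)
        parents = population[:k_parents]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(len(population) - k_parents)]
    population.sort(key=lambda a: fitness(*evaluate(a)), reverse=True)
    return population[0]  # best architecture found
```

In a real search, evaluate would come from the trained one-shot supernet plus on-device latency measurement (or a latency lookup table); the point of the sketch is only that a weighted scalarization replaces Pareto-style multi-objective selection.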

Numerical Results

  • MoGA-A achieves a top-1 accuracy of 75.9% on ImageNet with a mobile GPU latency of 11.1 ms, highlighting an improvement over MobileNetV3.
  • MoGA-B and MoGA-C also demonstrate strong performance, with top-1 accuracies of 75.5% and 75.3%, respectively.
  • The search is markedly cost-efficient, requiring 200 times fewer GPU days than MnasNet (12 GPU days in total).

Theoretical and Practical Implications

The implications of this research are twofold:

  1. Theoretical: It challenges the conventional focus on CPU latency in mobile NAS and underscores the need for hardware-specific optimization, particularly for GPUs, whose latency behavior is poorly predicted by CPU measurements.
  2. Practical: By optimizing for mobile GPUs, MoGA produces networks aligned with the computational environment typically encountered in production. Its ability to generate models with higher accuracy at comparable latency can directly enhance on-device AI applications such as image recognition and real-time processing.

Future Directions

Future research could refine architecture diversity while expanding the search space, keeping NAS solutions scalable and adaptable to a broader range of devices. Hardware- and framework-specific approaches are also likely to receive continued attention, since different compute units and platforms introduce distinct performance constraints and opportunities.

In summary, MoGA presents a significant advancement in mobile NAS by aligning neural architecture design with GPU-specific characteristics, thereby enhancing both the theoretical understanding and practical deployment of AI on mobile devices. This research positions itself as a pivotal contribution to the optimization and utilization of neural networks in the context of increasingly prevalent mobile GPU applications.
