Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Navigating Efficiency in MobileViT through Gaussian Process on Global Architecture Factors (2406.04820v1)

Published 7 Jun 2024 in cs.CV and cs.AI

Abstract: Numerous techniques have been meticulously designed to achieve optimal architectures for convolutional neural networks (CNNs), yet a comparable focus on vision transformers (ViTs) has been somewhat lacking. Despite the remarkable success of ViTs in various vision tasks, their heavyweight nature presents challenges of computational costs. In this paper, we leverage the Gaussian process to systematically explore the nonlinear and uncertain relationship between performance and global architecture factors of MobileViT, such as resolution, width, and depth including the depth of in-verted residual blocks and the depth of ViT blocks, and joint factors including resolution-depth and resolution-width. We present design principles twisting magic 4D cube of the global architecture factors that minimize model sizes and computational costs with higher model accuracy. We introduce a formula for downsizing architectures by iteratively deriving smaller MobileViT V2, all while adhering to a specified constraint of multiply-accumulate operations (MACs). Experiment results show that our formula significantly outperforms CNNs and mobile ViTs across diversified datasets

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Ke Meng (8 papers)
  2. Kai Chen (512 papers)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets