ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (1812.00332v2)

Published 2 Dec 2018 in cs.LG, cs.CV, and stat.ML

Abstract: Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. $10^4$ GPU hours) makes it difficult to directly search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). As a result, they need to utilize proxy tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. These architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present ProxylessNAS that can directly learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using $6\times$ fewer parameters. On ImageNet, our model achieves 3.1% better top-1 accuracy than MobileNetV2, while being $1.2\times$ faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design.

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

Neural architecture search (NAS) has been a significant catalyst in advancing automatic neural network design for diverse deep learning tasks. However, conventional NAS approaches carry a computational cost so high that applying them directly to large-scale datasets like ImageNet is infeasible. Instead, these methods often rely on proxy tasks: smaller or simpler tasks that stand in for the target task, such as training on a smaller dataset, searching over only a few blocks, or training for only a few epochs. Architectures optimized on these proxies are not guaranteed to be optimal on the target task, particularly when specific hardware constraints such as latency must also be met.

The paper "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware" by Han Cai, Ligeng Zhu, and Song Han introduces ProxylessNAS, an innovative approach towards NAS that circumvents the limitations of proxy tasks and allows for direct searching on target tasks and hardware.

Methodology and Contributions

ProxylessNAS tackles the burden of high computational costs associated with NAS through a combination of strategies aimed at memory and computational efficiency. The method proposes solutions that enable a direct search on large-scale tasks without resorting to proxies by significantly reducing both the GPU hours required and the memory consumption. The key innovations introduced in this work include:

  1. Path-Level Pruning: The technique formulates the NAS problem as a path-level pruning process in an over-parameterized network. By integrating architectural parameters to learn which paths (or operations) are redundant, ProxylessNAS prunes these paths at the end of training, thus deriving a compact and optimized neural network architecture. This approach necessitates training a single, albeit over-parameterized, model only once, rather than multiple models as previous NAS approaches did.
  2. Binary Path Learning: Keeping the outputs of all candidate paths in memory at once makes GPU memory grow linearly with the number of candidates. To counteract this, the authors binarize the architectural parameters so that, for each batch, only one path is active, reducing memory usage to that of training a compact model. The search space itself is not reduced; binarization only lowers the cost of training over it, so the search remains feasible even with a large candidate set.
  3. Hardware-Aware NAS: ProxylessNAS incorporates hardware metrics, particularly latency, directly into the optimization objective. By modeling latency as a continuous function of network dimensions, ProxylessNAS includes a latency regularization term in the loss function. This process allows for gradient-based optimization even with non-differentiable objectives like latency, facilitating the search for specialized architectures tailored to specific hardware platforms (e.g., CPU, GPU, mobile).
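The first two ideas in the list above can be illustrated with a minimal sketch. This is not the authors' implementation: the operation names, the parameter values, and the helper functions `sample_binary_gates` and `derive_architecture` are hypothetical, and weight training and gradient updates are omitted. It only shows that each layer holds one real-valued architecture parameter per candidate path, that a one-hot gate activates a single path per batch, and that the final architecture keeps the most probable path in each layer.

```python
import math
import random

# Hypothetical candidate operations for one layer of the over-parameterized network.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "conv7x7", "identity"]


def softmax(alphas):
    """Turn real-valued architecture parameters into path probabilities."""
    shift = max(alphas)
    exps = [math.exp(a - shift) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def sample_binary_gates(alphas, rng):
    """Sample a one-hot gate vector so exactly one path is active for this batch."""
    probs = softmax(alphas)
    active = rng.choices(range(len(alphas)), weights=probs, k=1)[0]
    return [1 if i == active else 0 for i in range(len(alphas))]


def derive_architecture(layer_alphas):
    """After the search, keep only the highest-scoring path in each layer."""
    return [
        CANDIDATE_OPS[max(range(len(alphas)), key=alphas.__getitem__)]
        for alphas in layer_alphas
    ]


rng = random.Random(0)
# Assumed (made-up) architecture parameters for a two-layer network.
layer_alphas = [[0.1, 1.5, -0.3, 0.0], [2.0, 0.2, 0.1, -1.0]]

for alphas in layer_alphas:
    gates = sample_binary_gates(alphas, rng)
    print(gates, "-> active op:", CANDIDATE_OPS[gates.index(1)])

print("derived architecture:", derive_architecture(layer_alphas))
```

Because only the active path is executed for a batch, only that path's activations need to be stored, which is where the memory saving comes from.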

Experimental Results

CIFAR-10

On CIFAR-10, ProxylessNAS achieves a test error of 2.08% with only 5.7 million parameters, improving on the previous state-of-the-art architecture AmoebaNet-B while using six times fewer parameters. This result supports the claim that searching directly on the target task yields models that are both accurate and parameter-efficient.

ImageNet

On ImageNet, ProxylessNAS further showcases its potential by achieving substantial improvements under latency constraints for different hardware platforms:

  • Mobile Devices:

    The model found by ProxylessNAS under a mobile latency constraint achieves 74.6% top-1 accuracy, surpassing both MobileNetV2 and MnasNet at a comparable latency budget.

  • GPUs:

    The GPU-optimized model achieves 75.1% top-1 accuracy, 3.1% better than MobileNetV2, while running 1.2x faster in measured GPU latency.

  • CPUs:

    The CPU-optimized model delivers superior latency performance specific to CPU hardware constraints, illustrating the flexibility and efficacy of ProxylessNAS in different environments.
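The per-platform results above rest on the latency term in the search objective. As a rough sketch of how such a term can be made differentiable, the expected latency of a layer can be written as the probability-weighted sum of its candidate operations' latencies, then added to the task loss with a weight. Everything concrete here is an assumption for illustration: the latency values in `LATENCY_MS` are made up, and `expected_latency`, `search_loss`, and the weight `lam` are hypothetical names rather than the authors' code.

```python
import math

OPS = ["conv3x3", "conv5x5", "conv7x7", "identity"]

# Hypothetical per-operation latencies in milliseconds. In practice these would
# be measured or predicted separately for each target device.
LATENCY_MS = {
    "mobile": {"conv3x3": 1.0, "conv5x5": 2.2, "conv7x7": 4.0, "identity": 0.0},
    "gpu": {"conv3x3": 0.30, "conv5x5": 0.35, "conv7x7": 0.45, "identity": 0.0},
}


def softmax(alphas):
    """Turn real-valued architecture parameters into path probabilities."""
    shift = max(alphas)
    exps = [math.exp(a - shift) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency(layer_alphas, device):
    """Sum over layers of the probability-weighted latency of each candidate op."""
    table = LATENCY_MS[device]
    total = 0.0
    for alphas in layer_alphas:
        probs = softmax(alphas)
        total += sum(p * table[op] for p, op in zip(probs, OPS))
    return total


def search_loss(task_loss, layer_alphas, device, lam=0.1):
    """Task loss plus a latency penalty; lam trades accuracy against speed."""
    return task_loss + lam * expected_latency(layer_alphas, device)


layer_alphas = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.5, -0.5, 0.0]]
for device in ("mobile", "gpu"):
    print(device, round(expected_latency(layer_alphas, device), 3))
```

Since the expected latency is a smooth function of the architecture parameters, it can be minimized by gradient descent together with the task loss. Swapping the latency table for a different device changes which operations are penalized, which is one way a search could arrive at different architectures for mobile, GPU, and CPU targets.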

Insights and Implications

The empirical successes of ProxylessNAS highlight several important implications for future research and practical applications:

  • Direct NAS:

    By eliminating the need for proxy tasks, ProxylessNAS opens new avenues for applying NAS directly to large-scale and complex tasks, ensuring architectures are better aligned with target task requirements.

  • Hardware Specialization:

    The ability to search for hardware-specific architectures amplifies the impact of NAS in optimizing the performance of neural networks across diverse deployment scenarios. This is especially critical given the heterogeneous nature of hardware environments, from mobile devices to high-performance GPUs.

  • Efficiency:

    ProxylessNAS represents a significant leap towards resource-efficient NAS methodologies, achieving competitive results with drastically reduced computational costs.

Future Developments

Future developments inspired by ProxylessNAS could explore broader architectural spaces and leverage more sophisticated latency models to further refine hardware-aware NAS. Additionally, the efficient path selection mechanism in ProxylessNAS might be extended to other domains, such as LLMs or reinforcement learning tasks, thereby broadening its applicability.

In conclusion, ProxylessNAS sets a pivotal precedent in the landscape of NAS by delivering a practical, efficient, and hardware-aware methodology that promises to transform how neural architectures are designed and optimized in real-world applications.

Authors (3)
  1. Han Cai (79 papers)
  2. Ligeng Zhu (22 papers)
  3. Song Han (155 papers)
Citations (1,782)