AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling (2011.09011v2)

Published 18 Nov 2020 in cs.CV and cs.LG

Abstract: Neural architecture search (NAS) has shown great promise in designing state-of-the-art (SOTA) models that are both accurate and efficient. Recently, two-stage NAS, e.g. BigNAS, decouples the model training and searching process and achieves remarkable search efficiency and accuracy. Two-stage NAS requires sampling from the search space during training, which directly impacts the accuracy of the final searched models. While uniform sampling has been widely used for its simplicity, it is agnostic of the model performance Pareto front, which is the main focus in the search process, and thus, misses opportunities to further improve the model accuracy. In this work, we propose AttentiveNAS that focuses on improving the sampling strategy to achieve better performance Pareto. We also propose algorithms to efficiently and effectively identify the networks on the Pareto during training. Without extra re-training or post-processing, we can simultaneously obtain a large number of networks across a wide range of FLOPs. Our discovered model family, AttentiveNAS models, achieves top-1 accuracy from 77.3% to 80.7% on ImageNet, and outperforms SOTA models, including BigNAS and Once-for-All networks. We also achieve ImageNet accuracy of 80.1% with only 491 MFLOPs. Our training code and pretrained models are available at https://github.com/facebookresearch/AttentiveNAS.

Citations (95)

Summary

  • The paper introduces an attentive sampling strategy in NAS that prioritizes the Pareto front to yield high-quality architectures.
  • The methodology leverages BestUp and WorstUp techniques to refine training and optimize trade-offs between accuracy and computational cost.
  • Empirical results on ImageNet demonstrate top-1 accuracies between 77.3% and 80.7% at lower computational costs, ideal for deployment on edge devices.

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

The paper "AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling" addresses the challenge associated with Neural Architecture Search (NAS) of designing accurate and efficient deep neural networks (DNNs) suitable for deployment in resource-constrained environments, such as edge devices. More specifically, this paper proposes AttentiveNAS, an approach that augments traditional NAS methods by adopting a more discriminating sampling strategy that emphasizes the Pareto front of model performance, thus enhancing both search efficiency and the quality of architectures found.

Context and Methodology

Recent advancements in two-stage NAS frameworks aim to separate model parameter training and architecture optimization into distinct phases. These frameworks, such as BigNAS and Once-for-All (OFA), utilize a weight-sharing strategy which consolidates the training of numerous candidate networks into a single, unified model during the training phase. This approach notably reduces the computational expense compared to previous methods like reinforcement learning or evolutionary algorithms, which can necessitate training thousands of models independently.
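
For context, the sketch below shows what one weight-sharing training step with conventional uniform sampling might look like. This is a minimal illustration under assumed interfaces: `supernet.forward_subnet` and the `search_space` dictionary are hypothetical placeholders, not the authors' released API.

```python
import random

import torch.nn.functional as F


def uniform_training_step(supernet, optimizer, images, labels, search_space):
    """One shared-weight update on a uniformly sampled sub-network."""
    # Uniformly pick one candidate architecture, e.g. depth/width/kernel choices.
    arch = {dim: random.choice(options) for dim, options in search_space.items()}

    optimizer.zero_grad()
    logits = supernet.forward_subnet(arch, images)  # hypothetical supernet call
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the sub-network is drawn uniformly, every region of the search space receives the same training attention regardless of how close it is to the accuracy-efficiency Pareto front.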

The primary contribution of AttentiveNAS lies in its attentive sampling strategy during the constraint-free pre-training stage. Traditionally, uniform sampling is used to select candidate architectures during training, so every candidate is drawn with equal probability. Such an approach, however, does not efficiently target the Pareto front, i.e. the set of architectures offering the best trade-offs between accuracy and computational cost. AttentiveNAS improves this process by prioritizing samples that are more likely to advance the Pareto front, using two strategies the authors call BestUp and WorstUp (sketched in the example following this list).

  • BestUp prioritizes models that represent the current best performance-efficiency trade-offs, reinforcing existing strengths in the sampled architectures.
  • WorstUp emphasizes models with the poorest trade-offs, motivated by the notion that addressing these underperforming models can lead to a well-optimized parameter space, thus benefiting the entire shared network.
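
The following is a minimal sketch of such an attentive training step under the same assumed supernet interface as above, with a hypothetical `sample_arch_with_flops` helper and a mini-batch loss standing in as the ranking proxy; it illustrates the idea rather than reproducing the authors' implementation.

```python
import random

import torch
import torch.nn.functional as F


def attentive_training_step(supernet, optimizer, images, labels,
                            flops_bins, k=3, mode="worst_up"):
    """One shared-weight update focused on a Pareto-relevant candidate."""
    # 1. Pick a target compute budget, then draw k candidates that satisfy it.
    budget = random.choice(flops_bins)
    candidates = [supernet.sample_arch_with_flops(budget) for _ in range(k)]

    # 2. Rank the candidates with a cheap proxy; here, mini-batch loss
    #    computed without gradients.
    with torch.no_grad():
        proxy_loss = [
            F.cross_entropy(supernet.forward_subnet(a, images), labels).item()
            for a in candidates
        ]

    # 3. BestUp trains the currently best candidate (lowest proxy loss);
    #    WorstUp trains the currently worst one (highest proxy loss).
    pick = min if mode == "best_up" else max
    chosen = candidates[pick(range(k), key=lambda i: proxy_loss[i])]

    # 4. Standard shared-weight update on the chosen sub-network.
    optimizer.zero_grad()
    loss = F.cross_entropy(supernet.forward_subnet(chosen, images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The ranking signal here is only a stand-in; the point the paper emphasizes is that the sub-network receiving the gradient update is chosen with the performance Pareto front in mind rather than uniformly at random.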

Numerical Results and Impact

AttentiveNAS yields several compelling numerical outcomes. The discovered models outperform existing state-of-the-art (SOTA) models on the ImageNet dataset, achieving top-1 accuracy ranging from 77.3% to 80.7%, a marked improvement over previous methods such as MobileNetV3 and EfficientNet-B0. Notably, with only 491 MFLOPs, AttentiveNAS-A5 achieves 80.1% top-1 accuracy, showcasing its combination of efficiency and effectiveness.

Implications and Future Directions

The practical implications of AttentiveNAS are significant for deploying DNNs on edge devices, where computational resources are constrained and efficiency is crucial. By refining the sampling process in the NAS training phase, AttentiveNAS produces models that match or exceed the accuracy of SOTA solutions at a reduced computational cost.

Theoretically, the proposal to move away from uniform sampling toward more informed strategies that consider the architectural performance Pareto has substantial merit. It underscores the importance of tailored search strategies in NAS research that align closely with targeted outcomes, such as balancing computational cost against accuracy.

Future research inspired by AttentiveNAS is likely to delve into its generalizability across various datasets and resource constraints and to explore its adaptability in conjunction with different NAS paradigms that may include hardware awareness or additional constraints such as energy efficiency.

In conclusion, this paper exemplifies the ongoing progress in NAS methodologies, showcasing the improvements that attentive sampling can deliver. AttentiveNAS could guide future work on applying NAS to real-world tasks, especially in constrained environments, broadening the accessibility and deployment of advanced DNN architectures.