- The paper introduces an attentive sampling strategy in NAS that prioritizes the Pareto front to yield high-quality architectures.
- The methodology leverages BestUp and WorstUp techniques to refine training and optimize trade-offs between accuracy and computational cost.
- Empirical results on ImageNet demonstrate top-1 accuracies between 77.3% and 80.7% at lower computational costs, ideal for deployment on edge devices.
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
The paper "AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling" addresses a central challenge in Neural Architecture Search (NAS): designing accurate and efficient deep neural networks (DNNs) suitable for deployment in resource-constrained environments, such as edge devices. Specifically, the paper proposes AttentiveNAS, an approach that augments traditional NAS methods with a more discriminating sampling strategy that emphasizes the Pareto front of model performance, thereby enhancing both search efficiency and the quality of the architectures found.
Context and Methodology
Recent advancements in two-stage NAS frameworks aim to separate model parameter training and architecture optimization into distinct phases. These frameworks, such as BigNAS and Once-for-All (OFA), utilize a weight-sharing strategy which consolidates the training of numerous candidate networks into a single, unified model during the training phase. This approach notably reduces the computational expense compared to previous methods like reinforcement learning or evolutionary algorithms, which can necessitate training thousands of models independently.
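The weight-sharing pre-training stage described above can be sketched as a simple loop: at each step one candidate sub-network is drawn from the search space and trained against the shared supernet weights. The sketch below is a toy illustration, not the paper's implementation; the `(depth, width)` search space, the function name, and the symbolic training step are all assumptions for clarity.

```python
import random

# Hypothetical search space: each candidate sub-network is a (depth, width)
# pair. In weight-sharing NAS, all candidates share one set of supernet
# weights instead of being trained independently.
SEARCH_SPACE = [(d, w) for d in (2, 3, 4) for w in (16, 32, 64)]

def train_supernet_uniform(steps, seed=0):
    """Toy two-stage NAS pre-training loop: at each step a candidate
    architecture is drawn uniformly at random and (symbolically) trained
    against the shared weights. Returns how often each candidate was
    visited, to show that uniform sampling treats all candidates alike."""
    rng = random.Random(seed)
    visit_counts = {arch: 0 for arch in SEARCH_SPACE}
    for _ in range(steps):
        arch = rng.choice(SEARCH_SPACE)  # uniform sampling baseline
        visit_counts[arch] += 1
        # ... a forward/backward pass on the shared weights would go here ...
    return visit_counts

counts = train_supernet_uniform(9000)
```

With uniform sampling, every candidate receives roughly equal training budget regardless of how promising it is, which is exactly the inefficiency AttentiveNAS targets.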
The primary contribution of AttentiveNAS lies in its attentive sampling strategy during the constraint-free pre-training stage. Traditionally, candidate architectures are drawn uniformly during training, so every candidate receives equal attention. Uniform sampling, however, does not efficiently target the Pareto front, the set of architectures that best balance accuracy against computational cost. AttentiveNAS instead prioritizes samples that are more likely to improve the Pareto front, via two strategies the authors call BestUp and WorstUp.
- BestUp prioritizes models that represent the current best performance-efficiency trade-offs, reinforcing existing strengths in the sampled architectures.
- WorstUp emphasizes models with the poorest trade-offs, motivated by the notion that addressing these underperforming models can lead to a well-optimized parameter space, thus benefiting the entire shared network.
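The two strategies above can be sketched as a single sampling routine: draw several candidates under a shared cost budget, rank them with a cheap accuracy estimate, and train either the best trade-off (BestUp) or the worst (WorstUp). This is a hedged illustration only; the toy FLOPs model and accuracy proxy below are placeholder assumptions, whereas the paper uses measured MFLOPs and a learned accuracy predictor.

```python
import random

# Hypothetical search space: each candidate is a (depth, width) pair.
SEARCH_SPACE = [(d, w) for d in (2, 3, 4) for w in (16, 32, 64)]

def flops(arch):
    depth, width = arch
    return depth * width * width  # toy cost model (assumption)

def proxy_accuracy(arch):
    depth, width = arch
    return 1 - 1.0 / (depth * width)  # toy accuracy proxy (assumption)

def attentive_sample(space, k=8, mode="best", rng=None):
    """Sketch of AttentiveNAS-style attentive sampling: pick a FLOPs
    budget, draw k candidates within it, rank them by a cheap accuracy
    estimate, and return the best (BestUp) or worst (WorstUp) trade-off
    for the next training step."""
    rng = rng or random.Random(0)
    budget = rng.choice(sorted({flops(a) for a in space}))
    pool = [a for a in space if flops(a) <= budget]
    candidates = [rng.choice(pool) for _ in range(k)]
    pick = max if mode == "best" else min
    return pick(candidates, key=proxy_accuracy)
```

Under BestUp the training budget reinforces architectures already near the Pareto front; under WorstUp it is spent pulling up the weakest candidates, on the premise that improving them improves the shared weights for every sub-network.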
Numerical Results and Impact
AttentiveNAS delivers strong empirical results. The models it discovers outperform existing state-of-the-art (SOTA) models on the ImageNet dataset, achieving top-1 accuracies ranging from 77.3% to 80.7%, a marked improvement over previous methods such as MobileNetV3 and EfficientNet-B0. Notably, at only 491 MFLOPs, AttentiveNAS-A5 reaches 80.1% top-1 accuracy, showcasing its combination of efficiency and effectiveness.
Implications and Future Directions
The practical implications of AttentiveNAS are significant for deploying DNNs on edge devices, where computational resources are constrained and efficiency is crucial. By refining the sampling process in the NAS pre-training phase, AttentiveNAS produces models that approach SOTA performance at a reduced computational cost.
Theoretically, the move away from uniform sampling toward informed strategies that track the Pareto front of architectural performance has substantial merit. It underscores the importance of NAS search strategies tailored to the targeted outcome, such as balancing computational cost against accuracy.
Future research inspired by AttentiveNAS is likely to examine its generalizability across datasets and resource constraints, and to explore combining it with other NAS paradigms, such as hardware-aware search or additional constraints like energy efficiency.
In conclusion, this paper exemplifies the ongoing progress in NAS methodologies, showcasing the potential improvements that can be garnered through attentive sampling. AttentiveNAS could potentially guide future developments in applying NAS to real-world tasks, especially in constrained environments, broadening the accessibility and deployment of advanced DNN architectures.