
Neural Architecture Transfer (2005.05859v2)

Published 12 May 2020 in cs.CV, cs.LG, and cs.NE

Abstract: Neural architecture search (NAS) has emerged as a promising avenue for automatically designing task-specific neural networks. Existing NAS approaches require one complete search for each deployment specification of hardware or objective. This is a computationally impractical endeavor given the potentially large number of application scenarios. In this paper, we propose Neural Architecture Transfer (NAT) to overcome this limitation. NAT is designed to efficiently generate task-specific custom models that are competitive under multiple conflicting objectives. To realize this goal we learn task-specific supernets from which specialized subnets can be sampled without any additional training. The key to our approach is an integrated online transfer learning and many-objective evolutionary search procedure. A pre-trained supernet is iteratively adapted while simultaneously searching for task-specific subnets. We demonstrate the efficacy of NAT on 11 benchmark image classification tasks ranging from large-scale multi-class to small-scale fine-grained datasets. In all cases, including ImageNet, NATNets improve upon the state-of-the-art under mobile settings ($\leq$ 600M Multiply-Adds). Surprisingly, small-scale fine-grained datasets benefit the most from NAT. At the same time, the architecture search and transfer is orders of magnitude more efficient than existing NAS methods. Overall, the experimental evaluation indicates that, across diverse image classification tasks and computational objectives, NAT is an appreciably more effective alternative to conventional transfer learning of fine-tuning weights of an existing network architecture learned on standard datasets. Code is available at https://github.com/human-analysis/neural-architecture-transfer

Citations (139)

Summary

  • The paper presents NAT, which reduces NAS computational demands by leveraging pre-trained supernets and a many-objective evolutionary algorithm.
  • It employs an online surrogate model to predict subnet accuracy, enabling the search to balance predictive performance, efficiency, and hardware constraints.
  • Experimental results on 11 benchmarks demonstrate that task-specific NAT models outperform conventional transfer learning, especially on fine-grained image datasets.

An Overview of Neural Architecture Transfer

The paper "Neural Architecture Transfer" centers on an innovative method termed Neural Architecture Transfer (NAT) to address the complexities inherent in Neural Architecture Search (NAS). In essence, NAT seeks to bridge the gap between the computationally intensive demands of NAS with the need for efficient, task-specific neural networks across varying deployment settings.

Neural Architecture Search has established itself as a formidable tool for automating the creation of high-performance deep learning architectures tailored to specific tasks. NAS, however, often requires exhaustive computational resources, as each deployment scenario necessitates a separate search and optimization over both architecture and hyperparameters. The authors propose NAT as a cost-effective alternative: it leverages pre-trained supernets and a many-objective evolutionary algorithm to efficiently customize subnets for new tasks without the substantial overhead typically associated with NAS.

Key Components of Neural Architecture Transfer

At its core, the NAT approach is composed of three pivotal elements:

  1. Supernet Structure: A supernet serves as a comprehensive model encapsulating a vast array of possible subnet architectures. NAT leverages this structure by sampling subnets from the supernet and adapting them to new tasks.
  2. Accuracy Prediction and Evolutionary Search: NAT employs an online surrogate model to predict the accuracy of candidate subnets, whose weights are inherited from the supernet through weight sharing rather than trained from scratch. An evolutionary search method manages the selection and optimization of architectures by systematically exploring the trade-offs between multiple objectives such as predictive performance, computational complexity, and model size.
  3. Integrated Optimization: Importantly, NAT iterates on its search process by continuously adapting the supernet to the more promising parts of the architecture search space, guided by the results of the evolutionary algorithm. This allows NAT to effectively balance the intricate trade-offs among conflicting objectives such as accuracy, efficiency, and hardware constraints (a simplified sketch of this loop follows the list).
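
To make the interplay of these components concrete, the Python sketch below mocks up one possible version of the loop. It is a minimal illustration under assumed interfaces, not the authors' implementation: evaluate_subnet, madds_of, and the commented-out adapt_supernet step are hypothetical stand-ins, the surrogate is a plain ridge regression, and the evolutionary step simply keeps the non-dominated candidates under predicted error and MAdds.

```python
# Minimal NAT-style loop sketch: surrogate-assisted, multi-objective
# evolutionary search over subnets sampled from a (hypothetical) supernet.
# evaluate_subnet, madds_of, adapt_supernet, N_LAYERS, N_OPS are illustrative
# stand-ins, not the paper's actual API.
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, N_OPS = 10, 4          # each layer picks one of N_OPS operations

def random_arch():
    return rng.integers(0, N_OPS, size=N_LAYERS)

def one_hot(arch):
    x = np.zeros(N_LAYERS * N_OPS)
    x[np.arange(N_LAYERS) * N_OPS + arch] = 1.0
    return x

def madds_of(arch):
    # Toy efficiency objective: pretend larger op indices cost more MAdds.
    return 100 + 50 * int(arch.sum())

def evaluate_subnet(arch):
    # Stand-in for measuring validation accuracy of a subnet whose weights
    # are inherited from the supernet (no training from scratch).
    return 0.6 + 0.03 * float(np.sin(arch).sum()) + rng.normal(0, 0.005)

def fit_surrogate(X, y, lam=1e-2):
    # Ridge regression as a simple online accuracy predictor.
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def predict(w, arch):
    return float(one_hot(arch) @ w)

def dominates(a, b):
    # a, b are (predicted error, MAdds); lower is better for both objectives.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

archive_archs, archive_acc = [], []
population = [random_arch() for _ in range(20)]

for _ in range(5):
    # 1) Evaluate a few subnets using supernet-inherited weights.
    for arch in population[:5]:
        archive_archs.append(arch)
        archive_acc.append(evaluate_subnet(arch))

    # 2) Refit the online surrogate on everything evaluated so far.
    X = np.stack([one_hot(a) for a in archive_archs])
    w = fit_surrogate(X, np.array(archive_acc))

    # 3) Evolutionary step: mutate, then keep the non-dominated candidates
    #    under (predicted error, MAdds) -- the multi-objective selection.
    children = []
    for arch in population:
        child = arch.copy()
        child[rng.integers(N_LAYERS)] = rng.integers(N_OPS)
        children.append(child)
    candidates = population + children
    objs = [(1.0 - predict(w, a), madds_of(a)) for a in candidates]
    keep = [i for i, oi in enumerate(objs)
            if not any(dominates(oj, oi) for j, oj in enumerate(objs) if j != i)]
    population = [candidates[i] for i in keep][:20]

    # 4) Supernet adaptation would happen here: fine-tune supernet weights on
    #    the layers/operations favoured by the surviving subnets.
    # adapt_supernet(population)

print("Trade-off front sketch (predicted accuracy, toy MAdds):")
for a in population[:5]:
    print(round(predict(w, a), 3), madds_of(a))
```

The structural point the sketch preserves is that expensive evaluations occur only for a handful of subnets per iteration, while the surrogate and the evolutionary selection operate on cheap predictions; the actual method interleaves this with supernet fine-tuning, which the placeholder comment in step 4 stands in for.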

Experimental Validation and Implications

The research validates the efficacy of NAT across eleven benchmark image classification tasks. An important finding is that task-specific NATNets yield notable performance improvements on fine-grained and small-scale datasets compared to conventional transfer learning, which relies solely on fine-tuning the weights of a fixed architecture. These results underscore the value of task-specific model customization, particularly on datasets where directly transferring an architecture does not yield optimal outcomes.

Specifically, the results on ImageNet demonstrate that architectures derived through NAT not only surpass existing counterparts in accuracy but also satisfy efficiency demands, fitting within the computational constraints of mobile settings (≤ 600M Multiply-Adds). NAT's effectiveness further extends to optimization scenarios with more than two objectives, showcasing its scalability in designing neural networks for diverse deployment conditions.
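
As a usage note, the trade-off front produced by a NAT-style search lends itself to a simple deployment-time selection step. The fragment below is a hypothetical illustration of that step, with made-up numbers rather than results from the paper: given (accuracy, MAdds) pairs, it picks the most accurate architecture that fits a 600M-MAdds mobile budget.

```python
# Hypothetical post-search selection from an (accuracy, MAdds) trade-off front.
def pick_for_budget(front, madds_budget=600e6):
    """front: list of (accuracy, madds) pairs; names and values are illustrative."""
    feasible = [pair for pair in front if pair[1] <= madds_budget]
    if not feasible:
        raise ValueError("no architecture on the front meets the budget")
    return max(feasible)  # highest accuracy among budget-feasible candidates

# Toy front (not figures from the paper):
front = [(0.74, 350e6), (0.76, 520e6), (0.78, 590e6), (0.80, 710e6)]
print(pick_for_budget(front))  # -> (0.78, 590000000.0)
```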

Conclusion and Future Prospects

In conclusion, this paper introduces a compelling framework for Neural Architecture Transfer that leverages pre-trained supernets and many-objective evolutionary algorithms. This broadens the accessibility of NAS, enabling efficient task-specific network design without the extensive computational expenditure associated with conventional approaches. Future work could refine NAT's surrogate accuracy predictors and explore its applicability to machine learning tasks beyond image classification. The promising results presented here pave the way for continued advancement in automated machine learning, fostering neural architectures that are increasingly bespoke, flexible, and efficient.