Efficient Architecture Search by Network Transformation (1707.04873v2)

Published 16 Jul 2017 in cs.LG and cs.AI

Abstract: Techniques for automatically designing deep neural network architectures such as reinforcement learning based approaches have recently shown promising results. However, their success is based on vast computational resources (e.g. hundreds of GPUs), making them difficult to use widely. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework toward efficient architecture search by exploring the architecture space based on the current network and reusing its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, the previously validated networks can be reused for further exploration, thus saving a large amount of computational cost. We apply our method to explore the architecture space of the plain convolutional neural networks (no skip-connections, branching, etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.

Authors (5)
  1. Han Cai (79 papers)
  2. Tianyao Chen (6 papers)
  3. Weinan Zhang (322 papers)
  4. Yong Yu (219 papers)
  5. Jun Wang (991 papers)
Citations (67)

Summary

Efficient Architecture Search by Network Transformation

The paper "Efficient Architecture Search by Network Transformation" presents a novel framework named Efficient Architecture Search (EAS) aiming to optimize the process of neural network architecture search. Unlike traditional methods that require substantial computational resources, often involving training networks from scratch on numerous GPUs, EAS proposes using network transformations and weight reuse to explore architecture spaces efficiently while utilizing only limited hardware.

Key Contributions

EAS employs a reinforcement learning (RL) meta-controller to automate the architecture search. The controller's actions are network transformations, such as increasing a layer's width or the network's depth, that preserve the network's function, so existing weights can be reused and the computational cost typically associated with architecture search is sharply reduced. These function-preserving transformations accelerate the design process while still producing competitive network models.

Methodology

The meta-controller is an RL agent that iteratively modifies the current network by applying Net2WiderNet and Net2DeeperNet transformations. Net2WiderNet widens a layer by adding units or filters, while Net2DeeperNet inserts a new layer initialized so that the overall function is unchanged. Both transformations reuse the pretrained weights as a form of knowledge transfer, which makes the exploration more efficient and speeds up convergence when the transformed network is further trained.
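
To make the two transformations concrete, below is a minimal NumPy sketch of function-preserving widening and deepening for fully connected layers with ReLU activations (the paper applies the same idea to convolutional layers). The function names `net2wider` and `net2deeper` and the random-duplication scheme follow Net2Net-style weight remapping; this is an illustrative reconstruction under those assumptions, not the authors' implementation.

```python
import numpy as np

def net2wider(w_in, b_in, w_out, new_width, rng=None):
    """Widen a fully connected layer from w_in.shape[1] to new_width units.

    w_in:  (d_in, h)  weights into the layer being widened
    b_in:  (h,)       its bias
    w_out: (h, d_out) weights of the following layer
    Returns widened (w_in', b_in', w_out') computing the same function.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    h = w_in.shape[1]
    # Remapping: keep the original units, then duplicate randomly chosen ones.
    mapping = np.concatenate([np.arange(h), rng.integers(0, h, size=new_width - h)])
    counts = np.bincount(mapping, minlength=h)

    w_in_new = w_in[:, mapping]
    b_in_new = b_in[mapping]
    # Divide outgoing weights by the replication count so the summed
    # contribution of each original unit is unchanged.
    w_out_new = w_out[mapping, :] / counts[mapping][:, None]
    return w_in_new, b_in_new, w_out_new


def net2deeper(width):
    """Return weights for a new layer initialized to the identity.

    With ReLU activations, relu(I @ relu(x) + 0) == relu(x), so inserting
    this layer leaves the network's function unchanged.
    """
    return np.eye(width), np.zeros(width)


# Quick check that both transformations preserve the function.
rng = np.random.default_rng(42)
x = rng.standard_normal((4, 8))
w1, b1 = rng.standard_normal((8, 16)), rng.standard_normal(16)
w2 = rng.standard_normal((16, 3))
relu = lambda z: np.maximum(z, 0.0)

y_before = relu(x @ w1 + b1) @ w2

w1n, b1n, w2n = net2wider(w1, b1, w2, new_width=24, rng=rng)
y_wider = relu(x @ w1n + b1n) @ w2n
assert np.allclose(y_before, y_wider)

w_id, b_id = net2deeper(24)
y_deeper = relu(relu(x @ w1n + b1n) @ w_id + b_id) @ w2n
assert np.allclose(y_before, y_deeper)
```

Because the transformed child network computes exactly the same function as its parent, it starts from the parent's validated performance and only needs a short fine-tuning run before its reward is reported back to the meta-controller.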

EAS is applied to the space of plain CNN architectures (no skip connections or branching) and to the DenseNet architecture space, using the CIFAR-10 and SVHN image benchmarks. On CIFAR-10, the models designed by EAS reach test error rates approaching those of modern architectures such as DenseNet while using fewer parameters.

Results and Implications

The paper reports notable reductions in test error on image classification tasks. On CIFAR-10, EAS designs a plain CNN that achieves a 4.23% test error rate, outperforming many existing models built with far greater computational resources. In the DenseNet search space, EAS attains a 3.44% test error rate on CIFAR-10 with data augmentation, underscoring the framework's ability to optimize more complex networks with fewer parameters.

These results position EAS as a practical option for researchers and organizations with limited computational resources, enabling efficient architecture search without sacrificing performance and broadening access to automated architecture design.

Future Directions

The paper outlines potential future directions, including incorporating a more diverse set of network transformation operations and investigating searches that explicitly balance model size against accuracy. Such extensions could tailor discovered architectures to constraints beyond raw accuracy.

More broadly, EAS points toward a shift in automated architecture search strategies, addressing the computational barriers faced by smaller research groups. Approaches along these lines could help make cost-efficient, high-performance model design accessible for a wider range of applications.
