- The paper presents ENAS, which expedites neural architecture search by sharing parameters across child models, reducing computational cost by over 1000x.
- ENAS uses an LSTM controller to sample subgraphs from a larger DAG, efficiently designing both recurrent and convolutional networks.
- Empirical results on Penn Treebank and CIFAR-10 show strong test perplexity and low error rates, matching or surpassing far more computationally expensive NAS methods.
Efficient Neural Architecture Search via Parameter Sharing
The paper presents Efficient Neural Architecture Search (ENAS), a method that speeds up Neural Architecture Search (NAS), a process that has historically been extremely computationally expensive. The key contribution is a parameter-sharing mechanism: all child models sampled during the search share one set of weights, in contrast to standard NAS, where each candidate model is trained from scratch.
Central to ENAS is the observation that every architecture in the search space can be viewed as a subgraph of a single large Directed Acyclic Graph (DAG) that encapsulates all possible configurations. An LSTM controller samples these subgraphs and is trained with policy gradient to maximize expected performance on the validation set, while the shared parameters of the DAG are trained with standard gradient descent on the training set; the two training phases alternate throughout the search.
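As a rough illustration, the controller update can be written as a REINFORCE step with a moving-average baseline. The snippet below is a minimal sketch, not the authors' code: `controller.sample()` (returning an architecture and the log-probabilities of its decisions) and `reward_fn` (evaluating the sampled architecture with the shared weights on validation data) are hypothetical stand-ins.

```python
import torch

def controller_step(controller, controller_opt, shared_model, val_batch,
                    reward_fn, baseline, baseline_decay=0.95):
    """One REINFORCE update of the controller (sketch, hypothetical APIs)."""
    # 1. Sample a subgraph (architecture) and record its decision log-probs.
    arch, log_probs = controller.sample()                         # hypothetical API

    # 2. Evaluate the sample with the *shared* parameters; no per-architecture
    #    training happens here, which is the source of the speedup.
    with torch.no_grad():
        reward = reward_fn(shared_model, arch, val_batch)         # hypothetical

    # 3. A moving-average baseline reduces the variance of the gradient estimate.
    baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
    advantage = reward - baseline

    # 4. Maximize the expected reward of the sampled decisions.
    loss = -(log_probs.sum() * advantage)
    controller_opt.zero_grad()
    loss.backward()
    controller_opt.step()
    return baseline
```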
Methodology
In ENAS, the search space is represented as a DAG. Each node in the DAG signifies a local computation and each edge represents the flow of information; every operation carries its own parameters, which are reused by every sampled architecture that activates it. The LSTM controller navigates this DAG, deciding which operations and connections are active and thereby defining a network architecture.
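To make the parameter-sharing idea concrete, the sketch below gives one possible simplified realization: every candidate edge `j -> i` of the DAG owns its own weight matrix, and a sampled architecture merely selects which edges and activations are active. The class and method names are illustrative assumptions, not the authors' implementation; the averaging of "loose end" nodes follows the paper's description of the recurrent cell output.

```python
import torch
import torch.nn as nn

class SharedDAG(nn.Module):
    """Bank of shared edge parameters; all sampled subgraphs reuse them (sketch)."""

    def __init__(self, num_nodes: int, hidden: int):
        super().__init__()
        # One weight matrix per candidate edge j -> i (j < i). Every architecture
        # sampled during the search reuses these same parameters.
        self.edge_weights = nn.ModuleDict({
            f"{j}_to_{i}": nn.Linear(hidden, hidden, bias=False)
            for i in range(1, num_nodes + 1) for j in range(i)
        })
        self.acts = {"tanh": torch.tanh, "relu": torch.relu,
                     "sigmoid": torch.sigmoid, "identity": lambda x: x}

    def forward(self, x, arch):
        # `arch` is a list of (prev_node, activation_name) decisions, one per node.
        nodes = [x]
        for i, (prev, act) in enumerate(arch, start=1):
            h = self.edge_weights[f"{prev}_to_{i}"](nodes[prev])
            nodes.append(self.acts[act](h))
        # Average the "loose ends" (nodes never used as inputs) as the output.
        used = {prev for prev, _ in arch}
        loose = [nodes[i] for i in range(1, len(nodes)) if i not in used]
        return torch.stack(loose or [nodes[-1]]).mean(dim=0)
```

Because every sampled architecture indexes into the same `edge_weights`, evaluating a new candidate requires only a forward pass rather than training from scratch.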
Recurrent Cells
For designing recurrent cells, ENAS uses a DAG with N nodes. For each node, the LSTM controller samples two decisions:
- Which previous node to take as input.
- Which activation function to apply (e.g. tanh, ReLU, identity, sigmoid).
This sampling enables the design of complex, flexible RNN cell topologies beyond the constraints of pre-fixed structures such as binary trees, and the benefit of this flexibility shows up in the empirical results (a sketch of the sampling procedure follows).
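The hedged sketch below illustrates this autoregressive sampling. The two-decisions-per-node structure follows the paper's description, but the `CellController` class, its layer sizes, and the embedding scheme are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

ACTIVATIONS = ["tanh", "relu", "sigmoid", "identity"]

class CellController(nn.Module):
    """LSTM controller that samples a recurrent cell (illustrative sketch)."""

    def __init__(self, num_nodes: int, hidden: int = 64):
        super().__init__()
        self.num_nodes = num_nodes
        self.hidden = hidden
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.start = nn.Parameter(torch.zeros(1, hidden))
        # Node i (for i = 1..num_nodes) chooses among i candidate previous nodes.
        self.node_heads = nn.ModuleList(
            [nn.Linear(hidden, i) for i in range(1, num_nodes + 1)])
        self.act_head = nn.Linear(hidden, len(ACTIVATIONS))
        self.embed = nn.Embedding(num_nodes + 1 + len(ACTIVATIONS), hidden)

    def sample(self):
        h = c = torch.zeros(1, self.hidden)
        inp, arch, log_probs = self.start, [], []
        for i in range(1, self.num_nodes + 1):
            # Decision 1: which previous node (0..i-1) feeds node i.
            h, c = self.lstm(inp, (h, c))
            dist = Categorical(logits=self.node_heads[i - 1](h))
            prev = dist.sample()
            log_probs.append(dist.log_prob(prev))
            inp = self.embed(prev)
            # Decision 2: which activation node i applies to that input.
            h, c = self.lstm(inp, (h, c))
            dist = Categorical(logits=self.act_head(h))
            act = dist.sample()
            log_probs.append(dist.log_prob(act))
            inp = self.embed(self.num_nodes + 1 + act)
            arch.append((int(prev), ACTIVATIONS[int(act)]))
        return arch, torch.stack(log_probs).sum()
```

Calling `CellController(num_nodes=12).sample()` returns a list such as `[(0, 'tanh'), (1, 'relu'), ...]` along with the summed log-probability; these can be fed, respectively, to a shared-parameter cell like the `SharedDAG` sketch above and to the REINFORCE update.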
Convolutional Networks
For convolutional architectures, ENAS employs two primary search spaces:
- Macro - The controller designs the entire network, choosing each layer's operation and its skip connections to earlier layers.
- Micro - The controller designs a convolutional cell and a reduction cell, which are then stacked to form the complete architecture (a sketch of the macro decisions follows this list).
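The snippet below illustrates only the shape of a macro-space decision: one operation out of six plus a set of skip connections per layer. For simplicity the choices here are drawn uniformly at random; in ENAS they are produced by the LSTM controller, and the function name is a hypothetical placeholder.

```python
import random

# The six candidate operations in the macro search space.
MACRO_OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "sep_conv5x5",
             "max_pool3x3", "avg_pool3x3"]

def sample_macro_architecture(num_layers: int):
    """Return one macro architecture: an operation and skip set per layer (sketch)."""
    arch = []
    for layer in range(num_layers):
        op = random.choice(MACRO_OPS)
        # Each earlier layer is independently chosen (or not) as an extra input.
        skips = [j for j in range(layer) if random.random() < 0.5]
        arch.append({"op": op, "skip_from": skips})
    return arch

print(sample_macro_architecture(num_layers=12))
```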
Experimental Results
ENAS demonstrated strong empirical performance on tasks such as language modeling and image classification, while significantly reducing the computational resources required.
Penn Treebank
ENAS was applied to design a recurrent cell for language modeling on the Penn Treebank dataset. The resulting architecture achieved a test perplexity of 55.8, a new state of the art among methods that do not use post-training processing, outperforming the original NAS cell by a significant margin while using over 1000x fewer GPU-hours.
CIFAR-10
For image classification on CIFAR-10, ENAS was tested in both its macro and micro search spaces. In the macro space, ENAS achieved a test error of 4.23%, improving to 3.87% when the number of filters was increased. In the micro space, ENAS achieved an error rate of 3.54%, which dropped to 2.89% with the Cutout data augmentation method. These results are comparable to those of state-of-the-art manually designed architectures and other NAS approaches, at a fraction of the computational cost.
Implications and Future Directions
ENAS has profound implications for the field of automated model design. By drastically reducing the computational resources required for NAS, ENAS democratizes access to architecture search, making it feasible for broader use beyond large-scale industrial applications. The demonstrated efficiency and effectiveness of parameter sharing across child models also open up further avenues for optimizing search processes in other domains and tasks.
From a theoretical standpoint, ENAS challenges the traditional assumption that each candidate model in NAS must be trained independently to convergence. This paradigm shift could inspire new methodologies that further leverage shared computations and representations.
In conclusion, ENAS stands as a significant contribution to the efficient design of neural architectures, bridging the gap between nascent theoretical ideas and their practical, large-scale application. Future work may explore more sophisticated controller mechanisms or holistic integration with other meta-learning techniques, potentially further enhancing the performance and efficiency of NAS.