Neural Architecture Optimization (1808.07233v5)

Published 22 Aug 2018 in cs.LG and stat.ML

Abstract: Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning (RL) or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architectures discovered by our method are very competitive for the image classification task on CIFAR-10 and the language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significant reduction of computational resources. Specifically, we obtain a 1.93% test set error rate for the CIFAR-10 image classification task and 56.0 test set perplexity for the PTB language modeling task. Furthermore, combined with the recently proposed weight sharing mechanism, we discover powerful architectures on CIFAR-10 (with error rate 2.93%) and on PTB (with test set perplexity 56.6), with very limited computational resources (less than 10 GPU hours) for both tasks.

Neural Architecture Optimization: A Continuous Approach to Neural Network Design

The paper “Neural Architecture Optimization” presents a novel method for automatic neural architecture design, addressing inefficiencies in traditional architecture search methods, such as reinforcement learning (RL) and evolutionary algorithms (EA), which operate within discrete spaces. The authors propose an innovative approach, termed Neural Architecture Optimization (NAO), which leverages continuous optimization to streamline the search for effective neural network architectures.

Key Components of NAO

NAO operates through three critical components, sketched in code after this list:

  1. Encoder: This component maps discrete neural network architectures into a continuous vector space. This mapping aims to represent the topological details of a network in a compact and efficient manner, akin to distributed representations in natural language processing.
  2. Predictor: In this step, the continuous representation produced by the encoder is used to predict the performance of network architectures. The predictor enables gradient-based optimization by guiding the search towards architectures with potentially superior accuracy.
  3. Decoder: This element converts the continuous representation back into a discrete neural network architecture. Through optimization, NAO identifies new architecture embeddings, which are then decoded to construct networks predicted to perform better.
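The following is a minimal PyTorch sketch of these three components. The module names, dimensions, and the choice of LSTM encoder/decoder layers are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Illustrative sketch of NAO's encoder / predictor / decoder, not the paper's code.
import torch
import torch.nn as nn

class NAOSketch(nn.Module):
    def __init__(self, vocab_size=20, embed_dim=32, hidden_dim=64):
        super().__init__()
        # Encoder: maps a sequence of discrete architecture tokens to a continuous vector.
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Predictor: regresses expected accuracy from the continuous embedding.
        self.predictor = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )
        # Decoder: reconstructs an architecture token sequence from the embedding.
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.output_proj = nn.Linear(hidden_dim, vocab_size)

    def encode(self, arch_tokens):
        # arch_tokens: (batch, seq_len) integer tokens describing an architecture.
        _, (h, _) = self.encoder(self.token_embed(arch_tokens))
        return h[-1]  # (batch, hidden_dim) continuous representation

    def predict(self, embedding):
        return self.predictor(embedding).squeeze(-1)  # predicted accuracy

    def decode(self, embedding, seq_len):
        # Feed the embedding at every step; greedy argmax over architecture tokens.
        inputs = embedding.unsqueeze(1).expand(-1, seq_len, -1)
        out, _ = self.decoder(inputs)
        return self.output_proj(out).argmax(dim=-1)  # (batch, seq_len) tokens
```

In practice, the encoder/decoder pair would be trained with a reconstruction loss and the predictor with a regression loss on observed architecture accuracies; the sketch omits training for brevity.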

Results and Implications

The approach's empirical evaluation spans image classification on the CIFAR-10 dataset and language modeling with the Penn Treebank (PTB) dataset. For CIFAR-10, NAO achieves a test error rate of 1.93%, and it demonstrates competitive results in language modeling, with a PTB test perplexity of 56.0. These results are significant as they outperform or match the best outcomes of previous search methods while demanding considerably fewer computational resources.

Additionally, the architectures discovered on both tasks prove transferable to other datasets, such as CIFAR-100, ImageNet, and WikiText-2, maintaining high performance levels. These findings underscore the robustness and generality of NAO-discovered architectures.

Theoretical and Practical Implications

Theoretically, the shift from discrete to continuous optimization spaces may offer new perspectives in neural architecture search, paving the way for more effective exploration of potential architectures. The continuous space provides a smoother landscape, facilitating gradient-based methods that can efficiently navigate towards high-performing solutions.
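Building on the sketch above, the search step can be illustrated as gradient ascent on the predictor's output with respect to an architecture embedding, followed by decoding. The step size and iteration count here are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of gradient-based search in the continuous space, using NAOSketch.
def improve_embedding(model, arch_tokens, step_size=0.1, steps=10):
    # Start from the embedding of an existing architecture.
    z = model.encode(arch_tokens).detach().requires_grad_(True)
    for _ in range(steps):
        # Move the embedding in the direction that increases predicted accuracy.
        score = model.predict(z).sum()
        grad, = torch.autograd.grad(score, z)
        z = (z + step_size * grad).detach().requires_grad_(True)
    # Decode the optimized embedding back into a discrete architecture.
    return model.decode(z.detach(), seq_len=arch_tokens.size(1))
```

The smoother continuous landscape is what makes this simple gradient step meaningful: small moves in embedding space correspond to small, predictable changes in predicted performance, unlike discrete mutations.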

Practically, the significant reduction in computational resources required by NAO makes it accessible for broader applications and more frequent experimentation. This aspect is particularly beneficial for researchers and practitioners with limited computing capacities.

Future Directions

Looking ahead, potential advancements could explore the utility of NAO across various domains, including natural language processing and speech recognition. Additionally, integrating NAO with other optimization techniques like Bayesian optimization might further refine its efficiency and effectiveness. The research community could also investigate the potential of NAO in designing architectures capable of real-time adaptation to evolving data and tasks.

In summary, NAO stands as a promising contribution to the field of neural architecture search, offering a method that balances effectiveness and efficiency without overspending computational resources. This paper represents a step forward in automating the design of neural network architectures, potentially influencing both academic research and practical applications in artificial intelligence.

Authors (5)
  1. Renqian Luo (19 papers)
  2. Fei Tian (32 papers)
  3. Tao Qin (201 papers)
  4. Enhong Chen (242 papers)
  5. Tie-Yan Liu (242 papers)
Citations (618)