Graph HyperNetworks for Neural Architecture Search (1810.05749v3)

Published 12 Oct 2018 in cs.LG, cs.CV, and stat.ML

Abstract: Neural architecture search (NAS) automatically finds the best task-specific neural network topology, outperforming many manual architecture designs. However, it can be prohibitively expensive as the search requires training thousands of different networks, while each can last for hours. In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. GHNs model the topology of an architecture and therefore can predict network performance more accurately than regular hypernetworks and premature early stopping. To perform NAS, we randomly sample architectures and use the validation accuracy of networks with GHN generated weights as the surrogate search signal. GHNs are fast -- they can search nearly 10 times faster than other random search methods on CIFAR-10 and ImageNet. GHNs can be further extended to the anytime prediction setting, where they have found networks with better speed-accuracy tradeoff than the state-of-the-art manual designs.

Citations (262)

Summary

  • The paper presents a novel method that encodes neural architectures as DAGs and generates weights via graph neural networks to drastically reduce NAS search time.
  • The methodology leverages GHNs to accurately predict performance, achieving competitive accuracy with only 0.42 GPU days for top models.
  • Experiments on CIFAR-10 and ImageNet demonstrate efficient scalability and practical benefits, especially in anytime prediction tasks.

An Analysis of "Graph HyperNetworks for Neural Architecture Search"

This paper investigates the use of Graph HyperNetworks (GHNs) to expedite Neural Architecture Search (NAS). Traditional NAS incurs a prohibitive computational cost because the search requires training thousands of candidate networks, each of which can take hours. To address this, the authors introduce GHNs, which generate a candidate network's weights directly by running inference on a graph neural network that models the architecture's topology. This approach offers a substantial speed advantage, searching nearly ten times faster than other random search methods on CIFAR-10 and ImageNet.
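The search procedure this enables is simple: architectures are sampled at random, the GHN predicts their weights in a single forward pass, and the resulting validation accuracy stands in for fully trained performance. The sketch below illustrates that loop under stated assumptions; `sample_architecture`, `build_network`, the trained `ghn`, and `val_loader` are hypothetical placeholders rather than the authors' code.

```python
# Minimal sketch of GHN-amortized random search (assumed helpers, not the authors' code):
# `sample_architecture`, `build_network`, the trained `ghn`, and `val_loader` are
# hypothetical placeholders standing in for the paper's search space and model.
import torch

def evaluate_with_ghn(arch, ghn, val_loader, device="cuda"):
    """Score a candidate architecture using GHN-generated weights instead of training it."""
    net = build_network(arch).to(device)        # instantiate the sampled topology
    weights = ghn(arch)                         # one GHN forward pass predicts all weights
    net.load_state_dict(weights, strict=False)  # plug the predicted weights into the network
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            pred = net(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total                      # surrogate signal: accuracy with predicted weights

def ghn_random_search(ghn, val_loader, num_candidates=1000, top_k=10):
    """Rank randomly sampled architectures by their GHN-surrogate accuracy."""
    scored = []
    for _ in range(num_candidates):
        arch = sample_architecture()            # draw a random topology from the search space
        scored.append((evaluate_with_ghn(arch, ghn, val_loader), arch))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [arch for _, arch in scored[:top_k]] # only these candidates are trained from scratch
```

Only the few top-ranked candidates are then trained from scratch, which is how the search cost is amortized.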

Methodology and Contributions

The proposed GHN method encodes neural network architectures as directed acyclic graphs (DAGs). A graph neural network processes these graphs and generates node-specific parameters. This contrasts with prior approaches that rely on tensor encodings or string serializations of architectures. By exploiting the graph structure, GHNs predict the performance of candidate architectures more accurately, so the validation accuracy obtained with GHN-generated weights serves as an effective surrogate search signal during NAS.
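To make the core idea concrete, the PyTorch sketch below builds node embeddings from operation types, propagates them over the DAG with a few rounds of message passing, and decodes each node embedding into a weight tensor through a small hypernetwork head. All class and parameter names here are assumptions for exposition, not the paper's implementation.

```python
# Illustrative GHN-style weight generator (names and dimensions are assumptions, not the
# paper's implementation): node embeddings come from operation types, a few rounds of
# message passing propagate topology information along the DAG, and a small hypernetwork
# head decodes each node embedding into that node's convolution kernel.
import torch
import torch.nn as nn

class GraphHyperNetworkSketch(nn.Module):
    def __init__(self, num_op_types, hidden_dim=32, c_out=16, c_in=16, k=3):
        super().__init__()
        self.op_embed = nn.Embedding(num_op_types, hidden_dim)  # node features from op type
        self.message = nn.Linear(hidden_dim, hidden_dim)        # message function along edges
        self.update = nn.GRUCell(hidden_dim, hidden_dim)        # recurrent node-state update
        self.decode = nn.Linear(hidden_dim, c_out * c_in * k * k)  # hypernetwork head
        self.kernel_shape = (c_out, c_in, k, k)

    def forward(self, op_types, adjacency, steps=3):
        # op_types: (N,) long tensor of operation ids
        # adjacency: (N, N) float tensor, adjacency[i, j] = 1 if node j feeds node i
        h = self.op_embed(op_types)
        for _ in range(steps):                                  # synchronous message passing
            msgs = adjacency @ self.message(h)                  # aggregate from predecessors
            h = self.update(msgs, h)
        # one conv kernel per node, decoded from its topology-aware embedding
        return [self.decode(h[i]).view(self.kernel_shape) for i in range(h.size(0))]
```

The actual GHN employs a richer propagation scheme over the computation graph and emits weights for every parameterized operation; this sketch compresses that into a single convolutional head to keep the idea visible.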

The authors validate their methodology through comprehensive experiments, demonstrating the efficacy of GHNs in both standard image classification tasks and the more complex domain of anytime prediction. The latter involves architectures with a variable computational budget, enabling models to provide predictions at any given time, a feature particularly advantageous for real-time systems.
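For intuition, the sketch below shows one common way an anytime model can be structured (an assumed early-exit design, not necessarily the architectures searched in the paper): each stage is followed by an exit head, and inference returns the most recent prediction once the compute budget is exhausted.

```python
# Assumed early-exit sketch of the anytime-prediction setting (illustrative only):
# each stage is followed by an exit head, and inference returns the most recent
# prediction once the compute budget runs out.
import time
import torch
import torch.nn as nn

class AnytimeNetSketch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())
            for c_in, c_out in [(3, 16), (16, 32), (32, 64)]
        ])
        self.exits = nn.ModuleList([          # one classifier ("exit") per stage
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, num_classes))
            for c in (16, 32, 64)
        ])

    def forward(self, x, budget_seconds=None):
        start, pred = time.time(), None
        for stage, exit_head in zip(self.stages, self.exits):
            x = stage(x)
            pred = exit_head(x)               # a usable prediction exists after every stage
            if budget_seconds is not None and time.time() - start > budget_seconds:
                break                         # out of time: return the latest exit's output
        return pred
```

Later exits cost more compute but tend to be more accurate, which is the speed-accuracy tradeoff the GHN-found networks are reported to improve in this setting.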

Experimental Results

Experiments on CIFAR-10 show that GHN-guided search achieves accuracy competitive with state-of-the-art techniques at a significantly reduced search cost: roughly 0.42 GPU days for the top-performing models, compared with the multiple GPU days required by other NAS methods. Further experimentation on ImageNet confirmed these findings, with GHNs scaling efficiently to the large-scale dataset.

A notable strength of the GHN approach lies in its application to the anytime prediction setting. The proposed models were shown to offer a favorable speed-accuracy tradeoff compared to manually designed architectures, marking an important advance in the field of NAS.

Theoretical and Practical Implications

The introduction of GHNs improves the modeling of architecture-level information through graph representations, enabling more accurate and efficient hypernetworks. Practically, GHNs can substantially reduce the computational resources required for NAS, which is crucial for the scalability of deep learning applications.

The theoretical implications also extend to any domain where dynamic generation of model parameters from architecture representations is beneficial. GHNs provide a robust framework that can be applied to broader, more complex search spaces, potentially enabling advancements in fields reliant on dynamic model adaptation.

Future Prospects

The paper suggests several avenues for future research. Enhancing the predictive accuracy of GHN-generated weights and exploring more sophisticated graph representations stand out as promising directions. Additionally, integrating GHNs with advanced search algorithms beyond simple random search could further enhance their performance and applicability to more challenging tasks.

In summary, this paper contributes significantly to the field of NAS by introducing Graph HyperNetworks as a method to efficiently explore architectural spaces. The demonstrated empirical effectiveness, combined with the theoretical insights provided, underscores the potential of GHNs to transform neural architecture optimization.