Auto-Keras: An Efficient Neural Architecture Search System (1806.10282v3)

Published 27 Jun 2018 in cs.LG, cs.AI, and stat.ML

Abstract: Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet, PNAS, usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling more efficient training during the search. In this paper, we propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search. The framework develops a neural network kernel and a tree-structured acquisition function optimization algorithm to efficiently explore the search space. Intensive experiments on real-world benchmark datasets have been done to demonstrate the superior performance of the developed framework over the state-of-the-art methods. Moreover, we build an open-source AutoML system based on our method, namely Auto-Keras. The system runs in parallel on CPU and GPU, with an adaptive search strategy for different GPU memory limits.

Authors (3)
  1. Haifeng Jin (6 papers)
  2. Qingquan Song (25 papers)
  3. Xia Hu (186 papers)
Citations (749)

Summary

  • The paper introduces an NAS method that integrates network morphism with Bayesian optimization to reduce training time and computational cost.
  • It develops an edit-distance based kernel that enables Gaussian Process optimization over variable-length, tree-structured neural architectures.
  • Auto-Keras, the resulting open-source system, demonstrates superior search efficiency and lower error rates on benchmarks like MNIST and CIFAR10.

The paper presents an efficient neural architecture search (NAS) method that combines network morphism with Bayesian optimization (BO) to accelerate the search for high-performing deep neural network architectures. The proposed method is deployed in an open-source AutoML system called Auto-Keras. Below is a detailed summary of its key components and practical contributions.

Problem Setting and Motivation

  ‱ The goal is to automatically search for the optimal neural network architecture f* within a flexible search space 𝓕. In contrast to traditional NAS methods that require training each model from scratch, the paper leverages network morphism to transform an existing network into a new one with only slight retraining. This helps reduce the average training time (t̄) during the search process.
  • The overall optimization problem is split into two levels: one identifying the optimal architecture and the other optimizing parameters for a given network. The emphasis is on reducing the number of expensive network evaluations (n) during search.
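Stated compactly (using Cost for the evaluation objective on held-out data and Ξ_f for the weights of a candidate f, notation introduced here only for this restatement, not quoted from the paper), the two levels are:

```latex
% Outer level: search over architectures f in the space F.
% Inner level: train the weights \theta_f of the chosen architecture.
f^{*} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \;
            \min_{\theta_f} \; \mathrm{Cost}\bigl(f(\cdot\,;\theta_f),\, D\bigr)
```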

Network Morphism and Bayesian Optimization Integration

  • Network Morphism: Instead of training a network from scratch after modifying its architecture, network morphism operations (e.g., inserting layers, widening layers, and adding skip connections) preserve the network’s functionality while evolving its structure. The paper extends layer-level morphism to a graph-level view in order to maintain consistency of intermediate tensor shapes across the network. This graph-level framework ensures that changes (for example, due to added pooling or processing layers) do not break tensor dimension compatibility.
  • Bayesian Optimization Guidance: The search is driven by Bayesian optimization, which is used to decide the most promising network morphism operations to perform at each step. In traditional BO, the Gaussian process (GP) would require a fixed-length vector representation; however, due to the variable and non-Euclidean nature of neural architectures, the authors design an edit-distance–based kernel function.
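To make the operation set concrete, here is a deliberately small sketch of the three morphism moves named above applied to a toy architecture representation. The `ArchGraph` class and the function names are hypothetical illustrations, not Auto-Keras internals.

```python
import copy
import random

# Tiny, hypothetical representation of an architecture: a list of layer widths
# plus a set of skip connections (from_index, to_index).
class ArchGraph:
    def __init__(self, widths, skips=None):
        self.widths = list(widths)      # e.g. output channels per layer
        self.skips = set(skips or [])   # additive / concatenative skips

def deepen(graph, index):
    """Insert a new (identity-initialized) layer after `index`."""
    graph.widths.insert(index + 1, graph.widths[index])

def widen(graph, index, factor=2):
    """Widen layer `index`; in a real network the neighbouring weights are
    duplicated so the model's function is preserved."""
    graph.widths[index] *= factor

def add_skip(graph, start, end):
    """Record a skip connection from layer `start` to layer `end`."""
    graph.skips.add((start, end))

def random_morph(parent):
    """Return a child produced by one randomly chosen morphism operation."""
    child = copy.deepcopy(parent)
    op = random.choice(["deepen", "widen", "skip"])
    i = random.randrange(len(child.widths) - 1)
    if op == "deepen":
        deepen(child, i)
    elif op == "widen":
        widen(child, i)
    else:
        add_skip(child, i, random.randrange(i + 1, len(child.widths)))
    return child
```

In the actual system the Bayesian optimizer, rather than a random choice, decides which of these operations to apply and where.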

Edit-Distance Neural Network Kernel

  ‱ The authors propose a kernel defined as Îș(f_a, f_b) = exp(−ρÂČ(d(f_a, f_b))), where d(f_a, f_b) measures the number of edit operations needed to morph one network into another.
  • The edit-distance d has two parts:
    • Dₗ for the layers, computed using a dynamic programming approach that aligns layers in topologically sorted order. The per-layer distance dₗ compares the “widths” of layers.
    • Dₛ for skip connections, estimated via a bipartite matching formulation solved with the Hungarian algorithm.
  • To ensure the distance is isometrically embeddable in Euclidean space (a requirement for GP kernels), the authors use a distortion mapping (via Bourgain’s theorem) to convert the approximated edit-distance into an embedding that guarantees positive definiteness.
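The sketch below shows one way the two distance terms could be assembled into the kernel. The dynamic program over layer widths and the Hungarian matching over skip connections mirror the structure described in the bullets above, but the per-layer distance, the λ weighting, and the identity placeholder for the distortion ρ are simplifications made for illustration.

```python
import math
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

def layer_distance(wa, wb):
    # Simplified per-layer distance on layer widths (an assumption made for
    # this sketch; the paper's per-layer distance is richer).
    return abs(wa - wb) / max(wa, wb)

def layers_distance(widths_a, widths_b):
    """D_l: dynamic program aligning the two (topologically sorted) layer lists."""
    m, n = len(widths_a), len(widths_b)
    dp = np.zeros((m + 1, n + 1))
    dp[:, 0] = np.arange(m + 1)       # cost of deleting layers
    dp[0, :] = np.arange(n + 1)       # cost of inserting layers
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i, j] = min(
                dp[i - 1, j - 1] + layer_distance(widths_a[i - 1], widths_b[j - 1]),
                dp[i - 1, j] + 1,
                dp[i, j - 1] + 1,
            )
    return dp[m, n]

def skips_distance(skips_a, skips_b):
    """D_s: bipartite matching between skip connections via the Hungarian method."""
    skips_a, skips_b = list(skips_a), list(skips_b)
    if not skips_a or not skips_b:
        return float(abs(len(skips_a) - len(skips_b)))
    cost = np.array([[abs(a[0] - b[0]) + abs(a[1] - b[1]) for b in skips_b]
                     for a in skips_a], dtype=float)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum() + abs(len(skips_a) - len(skips_b))

def kernel(widths_a, skips_a, widths_b, skips_b, lam=1.0, rho=lambda d: d):
    """Îș = exp(−ρ(d)ÂČ); `rho` stands in for the Bourgain-style distortion map."""
    d = layers_distance(widths_a, widths_b) + lam * skips_distance(skips_a, skips_b)
    return math.exp(-rho(d) ** 2)
```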

Acquisition Function Optimization in a Tree-Structured Space

  • The search space is formulated as a tree where each node is a neural architecture and edges represent morphism operations.
  ‱ The acquisition function used is an upper-confidence bound (UCB) defined as α(f) = ÎŒ(y_f) − ÎČσ(y_f), where ÎŒ and σ are the posterior mean and standard deviation from the GP model; because y_f estimates the validation cost, lower acquisition values are better.
  • Traditional BO methods cannot directly optimize this function over a discrete, tree-structured space. To overcome this, the paper proposes a strategy that combines A* search (which typically expands the best candidate) with simulated annealing (to balance exploration and exploitation). This hybrid method generates promising architecture modifications efficiently.
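A compact sketch of such a hybrid optimizer: a priority queue supplies the best-first (A*-style) expansion, while a cooling temperature controls how often a worse candidate is still explored. The helper names, the number of children expanded per node, and the cooling schedule are assumptions made for illustration, not the paper's exact routines.

```python
import heapq
import math
import random

def optimize_acquisition(root, acq, morph, children=4,
                         t0=1.0, t_min=1e-3, decay=0.9):
    """Best-first expansion over the architecture tree with a simulated-annealing
    acceptance rule. `acq(f)` should return α(f) = ÎŒ(y_f) − ÎČσ(y_f) (lower is
    better) and `morph(f)` should return a child architecture; both are assumed
    helpers for this sketch."""
    counter = 0                                   # tie-breaker for the heap
    frontier = [(acq(root), counter, root)]
    best_val, best_arch = frontier[0][0], root
    temperature = t0
    while frontier and temperature > t_min:
        val, _, arch = heapq.heappop(frontier)    # expand the most promising node
        for _ in range(children):                 # follow a few morphism edges
            child = morph(arch)
            child_val = acq(child)
            # Accept improvements outright; accept worse children with a
            # probability that shrinks as the temperature cools.
            accept = child_val < best_val or \
                random.random() < math.exp((best_val - child_val) / temperature)
            if accept:
                counter += 1
                heapq.heappush(frontier, (child_val, counter, child))
                if child_val < best_val:
                    best_val, best_arch = child_val, child
        temperature *= decay
    return best_arch
```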

Graph-Level Network Morphism

  • The authors extend layer-level changes to a full graph-level morphism. For example, when widening a layer, all affected regions of the network (both the preceding and subsequent layers) must be updated to maintain tensor shape consistency.
  • For adding skip connections (either additive or concatenative), additional pooling layers or post-processing layers may be inserted so that the tensor dimensions remain consistent.
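The shape bookkeeping can be illustrated with a few lines of toy code: widening one layer only keeps the graph valid if its consumers are updated as well. The dict-based representation below is a hypothetical simplification, not the system's actual graph structure.

```python
# Each layer records an input and an output width; a graph-level morphism must
# keep neighbouring widths consistent after every change.

def widen_layer(layers, index, factor=2):
    """Widen layer `index` and update its consumer's expected input width."""
    layers[index]["out"] *= factor
    if index + 1 < len(layers):
        # In a real network this means padding/duplicating the consumer's
        # input weights; here we only keep the shape bookkeeping.
        layers[index + 1]["in"] = layers[index]["out"]

def shapes_consistent(layers):
    """Check that every layer's input width matches its predecessor's output."""
    return all(layers[j]["in"] == layers[j - 1]["out"]
               for j in range(1, len(layers)))

net = [{"in": 3, "out": 16}, {"in": 16, "out": 32}, {"in": 32, "out": 64}]
widen_layer(net, 1)                 # widen the middle layer from 32 to 64
assert shapes_consistent(net)       # its consumer was updated along with it
```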

System Implementation: Auto-Keras

  • Auto-Keras is an open-source AutoML system built upon the proposed NAS method. The design focuses on a user-friendly, reproducible API similar to that of Scikit-Learn and Keras.
  • Practical Features Include:
    • A two-level API: a high-level “task-level” interface for end users and a lower-level “search-level” interface for expert configuration.
    • Parallelism between CPU and GPU processing: while the GPU is busy training a candidate network, the CPU simultaneously generates new architectures using the Bayesian optimizer.
    • GPU memory adaptation: an estimation function limits the network size based on available GPU memory, and dynamic adjustments are made when memory constraints are encountered.
    • A persistent storage mechanism that saves trained architectures so that searches can be restored if interrupted.
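To give a sense of the task-level interface mentioned above, here is a minimal usage sketch modelled on the paper-era Auto-Keras examples; exact class names, arguments, and defaults differ across releases, so read this as an illustration rather than the definitive API.

```python
# Task-level API sketch in the style of the Auto-Keras release that
# accompanied the paper; argument names are version-dependent.
from keras.datasets import mnist
import autokeras as ak

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape + (1,))    # add a channel dimension
x_test = x_test.reshape(x_test.shape + (1,))

clf = ak.ImageClassifier(verbose=True)              # task-level interface
clf.fit(x_train, y_train, time_limit=12 * 60 * 60)  # search for up to 12 hours
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
print(clf.evaluate(x_test, y_test))
```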

Experimental Evaluation and Efficiency

  • Extensive experiments are conducted on benchmark datasets (MNIST, CIFAR10, and Fashion-MNIST). The proposed method (AK) delivers the lowest error rates compared with state-of-the-art methods and traditional hyperparameter tuning approaches (e.g., SMAC, grid search, random search).
  ‱ Two variants are evaluated in ablation studies:
    ‱ a breadth-first search (BFS) variant that uses network morphism without Bayesian optimization, and
    ‱ a pure BO variant that does not utilize network morphism.
  ‱ The combination of both (AK) achieves a superior trade-off between search efficiency (in terms of the number of architectures evaluated) and training speed.
  ‱ Detailed parameter sensitivity analysis shows that balancing the two kernel components (layers versus skip connections via λ) and setting the exploration balance parameter (ÎČ) are critical for performance.

Theoretical Contributions

  ‱ The paper rigorously proves that the defined edit-distance d(f_a, f_b) is a valid metric and that the resulting kernel Îș is positive definite after applying a distortion via Bourgain’s theorem.
  • These theoretical results justify the use of the kernel function in a Gaussian process for Bayesian optimization.

Conclusion

The paper introduces a novel framework that efficiently searches neural architectures by guiding network morphism with Bayesian optimization. The development of an edit-distance neural network kernel, a specialized search strategy for tree-structured spaces, and graph-level morphism operations collectively contribute to significant computational savings and improved performance. The open-source Auto-Keras system makes these advances accessible to practitioners by offering an easy-to-use interface, robust parallelism on CPU and GPU, and adaptive memory management strategies.

This work is of practical importance because it addresses the high computational cost associated with NAS, provides a clear path for integrating BO with network morphism, and delivers a deployable AutoML system that can be run locally without the need for complex infrastructure.
