
Neural Architecture Search with Bayesian Optimisation and Optimal Transport (1802.07191v3)

Published 11 Feb 2018 in cs.LG and stat.ML

Abstract: Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function $f$ which is only accessible via point evaluations. It is typically used in settings where $f$ is expensive to evaluate. A common use case for BO in machine learning is model selection, where it is not possible to analytically model the generalisation performance of a statistical model, and we resort to noisy and expensive training and validation procedures to choose the best model. Conventional BO methods have focused on Euclidean and categorical domains, which, in the context of model selection, only permits tuning scalar hyper-parameters of machine learning algorithms. However, with the surge of interest in deep learning, there is an increasing demand to tune neural network \emph{architectures}. In this work, we develop NASBOT, a Gaussian process based BO framework for neural architecture search. To accomplish this, we develop a distance metric in the space of neural network architectures which can be computed efficiently via an optimal transport program. This distance might be of independent interest to the deep learning community as it may find applications outside of BO. We demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation based model selection tasks on multi-layer perceptrons and convolutional neural networks.

Citations (569)

Summary

  • The paper introduces NASBOT, a Gaussian process based BO framework that uses an optimal transport based metric (OTMANN) to quantify similarity between neural network architectures.
  • It uses an evolutionary algorithm to navigate the combinatorial search space, identifying architectures with lower mean squared error and better classification performance.
  • Empirical results on datasets such as CIFAR-10 validate NASBOT's ability to discover distinctive network designs, offering a scalable approach to complex architecture searches.

Neural Architecture Search with Bayesian Optimisation and Optimal Transport

The paper, authored by Kirthevasan Kandasamy and colleagues, addresses the challenge of optimizing neural network architectures using Bayesian Optimization (BO) combined with Optimal Transport (OT). It introduces NASBOT, a BO framework tailored to neural architecture search that aims to navigate the vast space of network designs efficiently and identify architectures with superior performance.

Background and Motivation

Bayesian Optimization is a well-regarded approach for optimizing expensive objective functions by constructing a surrogate model, typically a Gaussian Process (GP), to predict the utility of untested points in the search space. Traditional BO methods, however, face limitations when applied to neural architectures due to the challenge of quantifying similarities across different network structures and efficiently exploring a combinatorial domain.
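
To make this concrete, the following is a minimal, self-contained sketch of a GP-based BO loop on a single scalar hyper-parameter, with a synthetic objective standing in for an expensive train/validate run and a UCB acquisition; it is illustrative only and not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Synthetic stand-in for an expensive train/validate run.
    return -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

def rbf(a, b, ls=0.1):
    # Squared-exponential kernel on a scalar hyper-parameter in [0, 1].
    return np.exp(-0.5 * ((a - b) / ls) ** 2)

def bo_loop(n_iters=15, kappa=2.0):
    X = [float(x) for x in rng.uniform(0, 1, size=3)]   # initial design
    y = [objective(x) for x in X]
    for _ in range(n_iters):
        # GP surrogate fitted to all evaluations so far.
        K = np.array([[rbf(a, b) for b in X] for a in X]) + 1e-6 * np.eye(len(X))
        K_inv = np.linalg.inv(K)
        y_vec = np.array(y)

        def ucb(x):
            ks = np.array([rbf(x, b) for b in X])
            mu = ks @ K_inv @ y_vec                      # posterior mean
            var = max(1.0 - ks @ K_inv @ ks, 1e-12)      # posterior variance
            return mu + kappa * np.sqrt(var)             # optimism under uncertainty

        # The surrogate is cheap to query, so score many candidates and spend
        # the one expensive evaluation on the acquisition maximiser.
        cands = rng.uniform(0, 1, size=200)
        x_next = float(max(cands, key=ucb))
        X.append(x_next)
        y.append(objective(x_next))
    best = int(np.argmax(y))
    return X[best], y[best]

print(bo_loop())
```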

Core Contributions

  1. OTMANN Metric: A central contribution is a pseudo-distance, OTMANN, that quantifies the dissimilarity between two neural network architectures. The distance is computed efficiently via an optimal transport program that aligns computational units across the two networks while penalizing mismatches in layer types and structural differences (a toy sketch of such a transport program follows this list).
  2. The NASBOT Framework: The authors propose a BO framework that uses the OTMANN metric to define a kernel within a GP model. An evolutionary algorithm (EA) optimizes the acquisition function over network architectures, balancing exploration and exploitation when selecting promising candidates for evaluation (see the second sketch after this list).
  3. Empirical Validation: NASBOT is validated on several MLP and CNN model-selection tasks, where it outperforms random search, standard evolutionary algorithms, and existing BO methods restricted to simpler feedforward structures. The results underscore NASBOT's ability to discover high-performing architectures efficiently under computational constraints.
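
To illustrate the first contribution, here is a toy sketch of an OT-based architecture pseudo-distance, solved as a transportation linear program with SciPy. The layer encoding (type, unit count, relative depth), the cost terms, and the normalization of layer masses are simplifying assumptions made here for illustration; this is not the authors' implementation, which also penalizes unassigned mass when the networks differ in total size.

```python
import numpy as np
from scipy.optimize import linprog

# Each layer is summarised as a (type, number_of_units, relative_depth) tuple;
# its "mass" is proportional to its unit count (normalised to sum to 1 here).

def layer_masses(net):
    units = np.array([n_units for _, n_units, _ in net], dtype=float)
    return units / units.sum()

def cost_matrix(net_a, net_b, mismatch_penalty=1.0):
    # Cost of matching layer i of net_a to layer j of net_b: a label-mismatch
    # penalty plus a structural penalty based on relative depth.
    C = np.zeros((len(net_a), len(net_b)))
    for i, (type_a, _, depth_a) in enumerate(net_a):
        for j, (type_b, _, depth_b) in enumerate(net_b):
            label_cost = 0.0 if type_a == type_b else mismatch_penalty
            C[i, j] = label_cost + abs(depth_a - depth_b)
    return C

def ot_distance(net_a, net_b):
    a, b = layer_masses(net_a), layer_masses(net_b)
    C = cost_matrix(net_a, net_b)
    m, n = C.shape
    # Transportation LP: minimise <C, Z> subject to row sums = a, column sums = b.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # mass leaving layer i of net_a
    for j in range(n):
        A_eq[m + j, j::n] = 1.0              # mass arriving at layer j of net_b
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

mlp_a = [("relu", 128, 0.0), ("relu", 64, 0.5), ("softmax", 10, 1.0)]
mlp_b = [("relu", 256, 0.0), ("tanh", 64, 0.33), ("relu", 32, 0.66), ("softmax", 10, 1.0)]
print(ot_distance(mlp_a, mlp_b))
```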

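For the second contribution, the sketch below shows how such a distance could induce a GP kernel and how a simple evolutionary loop might optimize the acquisition over architectures. It assumes the ot_distance function and layer encoding from the previous sketch are in scope; the mutation operators and validation scores are hypothetical placeholders, not the paper's operators.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(a, b, beta=1.0):
    # Exponentiated negative distance: a standard way to turn a
    # (pseudo-)distance into a GP similarity. Assumes ot_distance from above.
    return np.exp(-beta * ot_distance(a, b))

def ucb_acquisition(cand, archs, scores, kappa=2.0):
    # GP posterior mean plus kappa standard deviations at a candidate architecture.
    K = np.array([[kernel(x, z) for z in archs] for x in archs]) + 1e-6 * np.eye(len(archs))
    K_inv = np.linalg.inv(K)
    ks = np.array([kernel(cand, z) for z in archs])
    mu = ks @ K_inv @ np.array(scores)
    var = max(kernel(cand, cand) - ks @ K_inv @ ks, 1e-12)
    return mu + kappa * np.sqrt(var)

def mutate(net):
    # Hypothetical mutation operators: rescale a layer's width or insert a layer.
    net = list(net)
    i = int(rng.integers(len(net)))
    layer_type, units, depth = net[i]
    if rng.random() < 0.5:
        net[i] = (layer_type, max(8, int(units * rng.choice([0.5, 2.0]))), depth)
    else:
        net.insert(i, ("relu", 64, depth))
    return net

def evolve_acquisition(archs, scores, generations=5, pool_size=20):
    # Evolve a pool of candidates, keeping those the acquisition ranks highest,
    # and return the best one as the next architecture to train and validate.
    pool = [mutate(a) for a in archs for _ in range(pool_size // len(archs))]
    for _ in range(generations):
        pool.sort(key=lambda c: ucb_acquisition(c, archs, scores), reverse=True)
        survivors = pool[:pool_size // 2]
        pool = survivors + [mutate(c) for c in survivors]
    return pool[0]

mlp_a = [("relu", 128, 0.0), ("relu", 64, 0.5), ("softmax", 10, 1.0)]
mlp_b = [("relu", 256, 0.0), ("tanh", 64, 0.33), ("relu", 32, 0.66), ("softmax", 10, 1.0)]
val_scores = [0.71, 0.74]   # made-up validation accuracies for the two seed networks
print(evolve_acquisition([mlp_a, mlp_b], val_scores))
```
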
Numerical Results and Claims

The experiments show that NASBOT outperforms competing methods on several benchmarks, achieving lower mean squared error on regression tasks such as protein structure prediction and lower classification error on datasets such as CIFAR-10. The framework also consistently discovers networks with distinctive architectures, featuring long skip connections and multiple decision layers, indicating that it explores complex design spaces effectively.

Implications and Future Directions

The integration of OT with BO has notable practical implications: it scales neural architecture search to arbitrary network structures, beyond traditional feedforward limitations. The paper also suggests that the OTMANN distance may find applications outside BO, such as comparing neural network topologies in other machine learning contexts.

Future research could extend NASBOT's scalability to much larger model spaces or integrate additional hyper-parameter tuning within the framework. The promising performance of OTMANN as a similarity measure may also motivate exploring its use in other graph-structured data settings.

In summary, this work provides a methodological advancement in neural architecture search, reinforcing the utility of Bayesian frameworks in the ever-growing landscape of artificial intelligence and neural network design.
