NPENAS-BO: Neural Predictor-Guided NAS
- NPENAS-BO is a neural predictor-guided evolutionary algorithm for NAS that integrates Bayesian optimization principles with mutation-based sampling.
- It employs a graph-based surrogate model using GNNs to predict both expected performance and uncertainty, effectively balancing exploration and exploitation via Thompson sampling.
- Empirical evaluations show that NPENAS-BO attains competitive or superior accuracy on NAS benchmarks while significantly reducing computational cost.
NPENAS-BO is a neural predictor-guided evolutionary algorithm for neural architecture search (NAS), distinguished by its use of a graph-based surrogate model that outputs both the expected performance and the uncertainty of candidate architectures. The BO variant of NPENAS leverages Bayesian optimization principles to enhance evolutionary exploration, enabling efficient search over large architecture spaces at modest computational cost, and it achieves state-of-the-art results on standard NAS benchmarks through the combination of Thompson sampling and graph neural networks.
1. Algorithmic Foundation
NPENAS-BO augments evolutionary search for NAS by incorporating a learned neural predictor acting as a BO-inspired surrogate. In each iteration, candidate architectures are produced via a one-to-many mutation scheme from a parent architecture. The surrogate (neural predictor) predicts not only the expected performance μ(a) of each candidate a, but also the predictive uncertainty σ(a), modeling the performance as f(a) ~ N(μ(a), σ²(a)).
Selection is performed using Thompson sampling: each candidate’s predicted distribution is sampled once, and the candidate with the best sampled value is selected for evaluation. This mechanism efficiently balances exploration (selecting candidates with high uncertainty) and exploitation (preferring candidates with low predicted error). Unlike GP-based BO, the neural predictor is end-to-end differentiable and computationally efficient on large search spaces, bypassing kernel computation and matrix inversion.
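The selection step above can be sketched in a few lines. This is an illustrative outline, not the reference implementation: `mutate(parent)` and `predictor(arch)` are hypothetical stand-ins for the paper's mutation operator and trained surrogate, where `predictor` returns a (mean error, standard deviation) pair.

```python
import random

def thompson_select(parent, predictor, mutate, num_candidates=10, rng=None):
    """One NPENAS-BO selection step (illustrative sketch).

    `mutate` and `predictor` are hypothetical stand-ins: `mutate(parent)`
    returns a new candidate architecture, and `predictor(arch)` returns a
    (mean_error, std) pair from the surrogate's predictive Gaussian.
    """
    rng = rng or random.Random()
    # One-to-many mutation: spawn several children from a single parent.
    candidates = [mutate(parent) for _ in range(num_candidates)]
    # Thompson sampling: draw one sample from each predictive distribution
    # and pick the candidate whose sampled error is lowest.
    best_arch, best_sample = None, float("inf")
    for arch in candidates:
        mu, sigma = predictor(arch)
        sample = rng.gauss(mu, sigma)
        if sample < best_sample:
            best_arch, best_sample = arch, sample
    return best_arch
```

High-uncertainty candidates occasionally draw optimistic samples and win (exploration), while low-mean candidates win on average (exploitation), which is exactly the trade-off described above.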
2. Graph-Based Neural Predictor Architecture
The surrogate model in NPENAS-BO is a graph-based uncertainty estimation network:
- Input encoding: Each network architecture is represented as a directed acyclic graph (DAG), with nodes denoting layers (including “isolated” nodes for disconnected subgraphs), and the adjacency matrix encoding layer connections.
- Embedding: A spatial variant of the Graph Isomorphism Network (GIN) processes the architecture’s DAG via multiple GIN layers with CELU activations, batch normalization, and global mean pooling.
- Output structure: Embedded features are passed through separate fully connected branches to estimate μ(a) and σ(a).
- Training: The predictor is trained by maximum likelihood estimation, minimizing the negative log-likelihood over observed architectures and their validation/test performances:

  L(θ) = −∑_{aᵢ ∈ D} log N(yᵢ; μ_θ(aᵢ), σ_θ²(aᵢ)) = ∑_{aᵢ ∈ D} [ log σ_θ(aᵢ) + (yᵢ − μ_θ(aᵢ))² / (2σ_θ²(aᵢ)) ] + const,

  where yᵢ is the measured performance of architecture aᵢ and D is the set of labeled architectures.
- Acquisition: At search time, Thompson sampling is performed by drawing a single sample for each candidate to rank them.
This architecture enables both fine-grained uncertainty modeling for BO-style acquisition and architectural diversity via data-driven graph representations.
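A stripped-down sketch of the pieces above, in NumPy for clarity: the GIN update here uses a single linear map with ReLU in place of the paper's full MLP with CELU and batch normalization, and the shapes and parameter names (`W`, `w_mu`, `w_sigma`) are illustrative assumptions, not the published configuration.

```python
import numpy as np

def gin_layer(h, adj, W, eps=0.0):
    """One GIN message-passing step (simplified: a single linear map plus
    ReLU stands in for the full MLP with CELU and batch norm).

    h:   (num_nodes, d) node features
    adj: (num_nodes, num_nodes) adjacency matrix of the architecture DAG
    """
    agg = (1.0 + eps) * h + adj @ h          # combine self and neighbor features
    return np.maximum(agg @ W, 0.0)          # linear transform + ReLU

def predict(h, adj, params):
    """Two-head readout producing the predictive mean and std of the error."""
    z = gin_layer(h, adj, params["W"])
    g = z.mean(axis=0)                        # global mean pooling over nodes
    mu = float(g @ params["w_mu"])
    sigma = float(np.log1p(np.exp(g @ params["w_sigma"])))  # softplus keeps sigma > 0
    return mu, sigma

def gaussian_nll(y, mu, sigma):
    """Per-example negative log-likelihood of y under N(mu, sigma^2),
    the training objective from the bullet above."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)
```

Minimizing `gaussian_nll` over the labeled set trains both heads jointly: the mean head is pulled toward observed errors, while the variance head learns to widen where predictions are unreliable.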
3. Random Architecture Sampling Strategy
To address bias and redundancy in existing random sampling schemes, NPENAS-BO introduces a uniform search space sampling protocol:
- Problem in previous samplers: Pruning of adjacency matrices generates multiple architectures with identical metrics, and common protocols over-sample architectures from a restricted subspace (“path index bias”).
- Proposed method: The complete architecture space is enumerated and indexed in a dictionary, using a unique hash for each graph encoding. Sampling keys uniformly ensures unbiased and diverse sampling.
- Advantages: The resulting path distribution matches the true underlying search space (verified via KL-divergence analysis). This improves training diversity for the neural predictor, which in turn enhances search performance across the full architecture space.
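The dictionary-based protocol can be sketched as follows. The encoding here (adjacency tuple plus operation list) and the helper names are hypothetical; a production sampler would hash an isomorphism-invariant encoding so that all duplicate representations of one network collapse to a single key.

```python
import hashlib
import random

def canonical_hash(adjacency, ops):
    """Hypothetical graph-encoding hash over an adjacency tuple-of-tuples
    and an operation list. A real sampler should hash an isomorphism-
    invariant canonical form instead of the raw encoding."""
    key = repr((tuple(map(tuple, adjacency)), tuple(ops)))
    return hashlib.sha256(key.encode()).hexdigest()

def build_space_index(architectures):
    """Enumerate the space once, keeping one representative per hash so
    duplicate encodings of the same network collapse to one entry."""
    index = {}
    for adjacency, ops in architectures:
        index.setdefault(canonical_hash(adjacency, ops), (adjacency, ops))
    return index

def sample_uniform(index, n, rng=None):
    """Sample n distinct architectures uniformly over the deduplicated keys."""
    rng = rng or random.Random()
    keys = rng.sample(sorted(index), n)
    return [index[k] for k in keys]
```

Because sampling is uniform over deduplicated keys rather than over raw encodings, no architecture is over-represented by having many equivalent encodings, which is the source of the bias described above.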
4. Empirical Evaluation and Metrics
NPENAS-BO achieves competitive or superior results relative to leading NAS methods, notably BANANAS and ORACLE baselines, across multiple datasets. Key outcomes:
| Benchmark | Queries | Mean Test Error (%) | Baseline Comparison | GPU Days (if reported) |
|---|---|---|---|---|
| NASBench-101 | 150 | ≈ 5.9 | BANANAS: 5.91 | Not reported |
| NASBench-201 | 100 | ≈ 8.93 | ORACLE: 8.92 | Not reported |
| DARTS (macro) | – | Best: 2.52; Avg: 2.64 | BANANAS | ≈4.7× faster than BANANAS |
The metrics considered are mean test/validation error, search budget (number of architecture queries), and computational resource utilization (GPU days). These results indicate that NPENAS-BO maintains or advances state-of-the-art accuracy while reducing the computational cost of NAS by a significant margin.
5. Mathematical Formalism
The optimization problem is formalized as:

a* = argmin_{a ∈ A} f(a),

where A is the architecture search space and f(a) is the validation or test error of architecture a.
Within the surrogate model, each architecture's performance is modeled as f(a) ~ N(μ_θ(a), σ_θ²(a)). The neural predictor parameters θ are optimized via MLE:

θ* = argmax_θ ∑_{aᵢ ∈ D} log N(yᵢ; μ_θ(aᵢ), σ_θ²(aᵢ)),

where the likelihood is the Gaussian output of the network (mean and variance produced by the two output branches).
Thompson sampling selects the next architecture to evaluate by drawing one sample f̃(a) ~ N(μ_θ(a), σ_θ²(a)) per candidate a and choosing the architecture with minimal f̃(a).
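For reference, the formalism of this section can be written compactly in LaTeX (symbols as defined above: A the search space, f the error, D the labeled set):

```latex
a^{*} \;=\; \arg\min_{a \in \mathcal{A}} f(a),
\qquad
f(a) \,\sim\, \mathcal{N}\!\bigl(\mu_{\theta}(a),\, \sigma_{\theta}^{2}(a)\bigr),
\qquad
\theta^{*} \;=\; \arg\max_{\theta} \sum_{a_i \in \mathcal{D}} \log \mathcal{N}\!\bigl(y_i;\, \mu_{\theta}(a_i),\, \sigma_{\theta}^{2}(a_i)\bigr)
```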
6. Implications and Applications
The NPENAS-BO framework has broad implications for scalable NAS and beyond:
- Efficient NAS: The reduction in full network training/evaluation requirements translates directly to lower search costs and wall-clock time in practical NAS pipelines.
- Generalizability: Although validated on image classification tasks, such as CIFAR-10, the approach is extensible to semantic segmentation, object detection, and sequence-based tasks like neural machine translation.
- Principled exploration: Explicit uncertainty modeling in the context of architectural graphs enables a theoretically grounded trade-off between exploration and exploitation in vast discrete spaces.
- Integration: The neural predictor’s GNN-based architecture is amenable to end-to-end training and does not impose the matrix inversion or kernel selection burdens typical of GP surrogates.
NPENAS-BO serves as a template for future neural predictor-guided optimization in NAS and potentially for broader discrete or combinatorial optimization scenarios.
7. Broader Research Context and Future Directions
NPENAS-BO represents an advance at the interface of evolutionary search and Bayesian optimization using deep graph-based surrogates. Its design addresses gaps in prior methods, particularly in uncertainty estimation, sampling bias, and computational tractability over large spaces. A plausible implication is the emergence of hybrid neural-surrogate BO paradigms, extending these principles to more complex architectures and tasks, possibly incorporating transfer learning and multi-task search.
Future research can expand upon:
- Extension to variable-length, macro/micro-level architecture hierarchies.
- Transferability of predictors across different NAS search spaces.
- Further reduction of search costs by integrating zero-cost proxies or meta-learned surrogates.
- Enhancements to the neural predictor (deeper or attention-based GNNs) and acquisition strategy (beyond Thompson sampling).
NPENAS-BO provides a scalable, flexible framework for NAS that is both empirically strong and conceptually extensible.