
Neural Architecture Generator Optimization

Updated 25 February 2026
  • Neural Architecture Generator Optimization (NAGO) is a framework that reconceptualizes neural architecture search by directly generating network designs and hyperparameters under multiple constraints.
  • It employs diverse optimization strategies, including evolutionary algorithms, generative models, and diffusion processes to balance metrics such as accuracy, latency, and efficiency.
  • NAGO integrates surrogate predictors and multi-objective guidance to drastically reduce search time while achieving superior performance across varied benchmarks.

Neural Architecture Generator Optimization (NAGO) refers to a broad class of methodologies that reframe neural architecture search (NAS) as the direct synthesis or conditional generation of optimal networks, often together with their hyperparameters, for a given set of tasks, constraints, or multi-objective criteria. NAGO systems encode architectures as either flexible graph-structured objects or sequences, employ diverse generator parameterizations (e.g., evolutionary algorithms, learned controllers, generative models, diffusion processes), and typically integrate surrogate or predictor networks for task- and constraint-aware evaluation. This paradigm supports amortized or one-shot generation of architectures optimizing accuracy, efficiency, cost, and generalization across diverse search spaces and datasets.

1. Search Spaces and Architecture Representations

NAGO frameworks operate over highly expressive search spaces, extending beyond chain- or layer-wise topologies. A central representation is the directed acyclic graph (DAG), in which:

  • Node set $V = \{v_1, \dots, v_m\}$ encodes computation units (layers), each parameterized by a triple $\lambda_i = (\text{comb}_i, \text{op}_i, \text{act}_i)$: aggregation scheme ($\in$ {“Add”, “Multiply”, “Concat”}), functional operator (spanning CNN, RNN, MLP, or self-attention layer types), and activation function.
  • Edge set $E \subseteq \{(v_i, v_j) \mid i < j\}$ defines data flow. Storage as an upper-triangular adjacency matrix ensures acyclicity.
  • Complete architecture encoding: $(G, \Lambda)$ with $G \in \mathcal{A}$ (the set of admissible DAGs) and $\Lambda$ the collection of nodewise operator assignments and hyperparameters (Keisler et al., 2023).
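A minimal Python sketch of this encoding, assuming an illustrative operator vocabulary; the `Arch` container and helper names are invented for this example, not taken from any cited paper:

```python
import random
from dataclasses import dataclass

COMBINERS = ["Add", "Multiply", "Concat"]
OPERATORS = ["conv3x3", "gru", "mlp", "self_attention"]   # illustrative operator vocabulary
ACTIVATIONS = ["relu", "tanh", "gelu"]

@dataclass
class Arch:
    """An architecture (G, Lambda): DAG topology plus node-wise hyperparameters."""
    adj: list    # m x m upper-triangular 0/1 matrix; adj[i][j] = 1 means edge v_i -> v_j (i < j)
    nodes: list  # per-node triples (comb_i, op_i, act_i)

def sample_arch(m: int, edge_prob: float = 0.5) -> Arch:
    # Restricting edges to i < j makes the adjacency matrix upper-triangular,
    # which guarantees acyclicity by construction.
    adj = [[1 if i < j and random.random() < edge_prob else 0 for j in range(m)]
           for i in range(m)]
    nodes = [(random.choice(COMBINERS), random.choice(OPERATORS), random.choice(ACTIVATIONS))
             for _ in range(m)]
    return Arch(adj, nodes)

def is_acyclic(adj) -> bool:
    # Upper-triangularity implies no cycles: every edge goes strictly forward.
    return all(adj[i][j] == 0 for i in range(len(adj)) for j in range(i + 1))
```

Storing only the upper triangle also halves the encoding length that a generator or mutation operator has to handle.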

Specialized generators may output continuous relaxations (e.g., soft assignment logits for operations per block (Huang et al., 2021)) or autoregressive sequences for cell-based designs (Guo et al., 2022, Guo et al., 2021). Some methods further embed full architectures jointly with weights into continuous spaces via encoder-decoder autoencoders (Huang et al., 2024).

2. Generator Optimization Algorithms

NAGO encompasses a spectrum of generator-optimization strategies, including:

Evolutionary Search and Graph Operators

Classical asynchronous evolutionary algorithms operate directly on the DAG+hyperparameter encoding, applying:

  • Graph-editing operators (add node, delete node, add edge, delete edge, change operator)
  • Node-wise hyperparameter mutation (random categorical redrawing, local integer perturbation, or real-valued noise)
  • Crossover via connected subgraph exchange

Selection is typically via tournament on validation loss, with replacement of worst individuals, and can jointly optimize for regularized metrics such as mean absolute scaled error (MASE) plus computational cost (Keisler et al., 2023).
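The steady-state loop above can be sketched as follows; the dict-based architecture encoding, the reduced operator vocabulary, and the `fitness` callable (e.g., validation loss plus a cost penalty) are illustrative stand-ins for the full DAG+hyperparameter representation:

```python
import copy
import random

def mutate(arch, rng):
    """Apply one randomly chosen graph-edit or hyperparameter mutation (illustrative subset)."""
    child = copy.deepcopy(arch)          # arch = {"adj": upper-triangular matrix, "ops": list}
    m = len(child["adj"])
    move = rng.choice(["flip_edge", "change_op"])
    if move == "flip_edge":
        i = rng.randrange(m - 1)
        j = rng.randrange(i + 1, m)      # only i < j, so acyclicity is preserved
        child["adj"][i][j] ^= 1          # add the edge if absent, delete it if present
    else:
        k = rng.randrange(m)
        child["ops"][k] = rng.choice(["conv", "mlp", "attention"])  # categorical redraw
    return child

def tournament_step(population, fitness, rng, k=3):
    """One steady-state step: tournament-select a parent, mutate, replace the worst."""
    contestants = rng.sample(population, k)
    parent = min(contestants, key=fitness)          # lower (penalized) loss is better
    child = mutate(parent, rng)
    worst = max(population, key=fitness)
    population[population.index(worst)] = child     # replacement of the worst individual
    return population
```

An asynchronous variant simply runs many such steps concurrently, each worker evaluating its child independently.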

Generative Model-Based Approaches

Generative NAGO systems parameterize architecture distributions as generators $G_\theta(c)$, conditioned on a task or constraint $c$.

Generator optimization is performed using gradient ascent (policy gradient, adversarial RL, or joint surrogate loss), evolutionary strategies (see Section 4), or Bayesian optimization over low-dimensional generator hyperparameters (Ru et al., 2020).
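A toy REINFORCE-style sketch of the policy-gradient route, assuming a linear policy over a handful of candidate operations conditioned on a scalar budget $c$; the parameterization and all names are illustrative, not taken from the cited works:

```python
import math
import random

def softmax(logits):
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_op(theta, c, rng):
    """Sample one operation index from the budget-conditioned policy pi_theta(.|c)."""
    logits = [w * c + b for w, b in theta]         # linear conditioning on the budget c
    probs = softmax(logits)
    r, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if r <= acc:
            return k, probs
    return len(probs) - 1, probs

def reinforce_step(theta, c, reward_fn, rng, lr=0.1):
    """One policy-gradient ascent step: grad log pi(k|c) * reward."""
    k, probs = sample_op(theta, c, rng)
    reward = reward_fn(k, c)                       # e.g. predicted accuracy minus budget penalty
    for j, (w, b) in enumerate(theta):
        indicator = 1.0 if j == k else 0.0
        grad_logit = indicator - probs[j]          # d log pi(k|c) / d logit_j for a softmax policy
        theta[j] = (w + lr * reward * grad_logit * c,
                    b + lr * reward * grad_logit)
    return theta, reward
```

Real systems replace the linear policy with an LSTM controller or GNN decoder and the reward with surrogate-predicted metrics, but the update rule has this shape.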

Diffusion Processes and Evolution-Guided Denoising

Recent NAGO work leverages diffusion models to invert noisy graph encodings into valid architectures:

  • Score-based graph diffusions train noise-predictor networks for denoising; conditional sampling implements predictor gradient guidance to satisfy constraints (e.g., accuracy, latency) (An et al., 2023, Lomurno et al., 2024).
  • Evolution-meets-diffusion: fitness-driven evolutionary algorithms replace learned denoisers, combining elitism, diversity preservation, and roulette selection on predicted utility at each reverse diffusion step, with denoising guided by weighted averaging on the high-fitness population (Zhou et al., 24 Apr 2025).
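One reverse step of the evolution-guided scheme can be sketched as follows, treating continuous architecture encodings as plain vectors; the fixed halving pull toward the fitness-weighted elite mean is a simplification of the cited weighted-averaging denoiser, and all names are illustrative:

```python
import random

def guided_denoise_step(population, fitness_fn, noise_scale, rng, elite_frac=0.5):
    """One illustrative evolution-guided reverse-diffusion step on continuous encodings.

    Instead of a learned denoiser, every candidate is pulled toward the
    fitness-weighted mean of an elite subset, then re-noised at the current scale.
    """
    ranked = sorted(population, key=fitness_fn, reverse=True)   # elitism: best first
    elite = ranked[:max(1, int(elite_frac * len(ranked)))]
    raw = [fitness_fn(x) for x in elite]
    lo = min(raw)
    weights = [r - lo + 1e-9 for r in raw]          # shift so all weights are positive
    total = sum(weights)
    dim = len(population[0])
    # The weighted average of the high-fitness population acts as the denoising target.
    target = [sum(w * x[d] for w, x in zip(weights, elite)) / total
              for d in range(dim)]
    new_pop = []
    for x in population:
        pulled = [0.5 * (xi + ti) for xi, ti in zip(x, target)]  # move halfway to the target
        new_pop.append([v + rng.gauss(0.0, noise_scale) for v in pulled])
    return new_pop
```

Running this with a decaying `noise_scale` mimics the reverse diffusion schedule: early steps explore, late steps concentrate around high-fitness encodings.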

3. Surrogate Predictors, Constraints, and Multiobjective Guidance

Nearly all NAGO methods employ learned surrogate models for task- or constraint-conditioned evaluation of proposed architectures:

  • Performance predictors (e.g., multi-target MLPs, Set-Transformer/GNN hybrids, or regression heads conditioned on task embeddings) are trained to approximate accuracy, parameter count, MACs, latency, or generalization on noisy and denoised architectures (Lomurno et al., 2024, An et al., 2023, Zhou et al., 24 Apr 2025).
  • Pareto guidance: Rankings based on dominance in multi-objective space (accuracy $\uparrow$, cost $\downarrow$, latency $\downarrow$) define reward signals or loss terms (Guo et al., 2022, Guo et al., 2021).
  • Conditional generation: Generator outputs are optionally filtered post hoc using predictor-based Pareto front identification, or generator losses are regularized by budget violation or soft penalty terms, e.g., $L_{\text{val}} + \lambda L_C$ (Huang et al., 2021).
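Predictor-based Pareto front identification reduces to non-dominated filtering; a minimal sketch over predicted (accuracy, cost, latency) triples, where accuracy is maximized and the other two minimized:

```python
def dominates(a, b):
    """True if candidate a Pareto-dominates b.

    Each candidate is (accuracy, cost, latency): a must be no worse in every
    objective and strictly better in at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Return the non-dominated subset (the predicted Pareto front)."""
    return [c for i, c in enumerate(candidates)
            if not any(dominates(o, c) for j, o in enumerate(candidates) if j != i)]
```

In the generative setting, `candidates` are surrogate predictions for sampled architectures, and only front members are retained for training or deployment.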

Predictor models ingest low-shot task or dataset embeddings (e.g., via Vision Transformers over small support sets), enabling transferable and amortized search across tasks (Lomurno et al., 2024).

4. Metaheuristics, Conditioning, and Sample Efficiency

NAGO admits a diversity of metaheuristics:

| Method/Variant | Conditioning Input | Main Generator Type |
|---|---|---|
| Steady-state evolutionary EA (Keisler et al., 2023) | None / dataset | Direct graph + HP encoding |
| Conditional LSTM policy (Guo et al., 2021, Guo et al., 2022) | Budget vector/embedding | LSTM controller |
| Generator + surrogate joint training (Huang et al., 2021, Lukasik et al., 2022) | Constraint scalar + seed | ConvNet, GNN-decoder |
| Diffusion process (An et al., 2023, Zhou et al., 24 Apr 2025, Lomurno et al., 2024) | Task, accuracy, multi-objective | Score-based/fitness denoising |
| Bayesian optimization over generator (Ru et al., 2020) | — | Hierarchical generator |
| Simultaneous param/arch gradient (Huang et al., 2024) | Dataset | Embedding autoencoder |

Conditioning on constraints (e.g., FLOPs, latency, number of parameters) is enforced by generator design (input budget tokens), surrogate reward penalties, or many-objective predictor-guided diffusion and post-generation Pareto filtering.
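The soft-penalty route can be sketched as a budget-violation term added to the validation loss; the function name and the dict-based interface here are illustrative, not from any cited paper:

```python
def budget_penalized_loss(val_loss, resource_usage, budgets, lam=0.1):
    """Soft-penalty conditioning (illustrative): L_val + lambda * total budget violation.

    resource_usage and budgets map resource names (e.g. "flops", "latency")
    to measured values and their allowed budgets; only overshoot is penalized.
    """
    violation = sum(max(0.0, resource_usage[k] - budgets[k]) for k in budgets)
    return val_loss + lam * violation
```

Hard conditioning (budget tokens fed to the generator) and post-generation Pareto filtering replace this penalty when constraint satisfaction must be guaranteed rather than encouraged.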

Sample efficiency is achieved by amortized training (one generator for all constraints), low-dimensional hyperparameterization, or meta-learning across tasks, with experimental results demonstrating order-of-magnitude reductions in search cost and inference latency versus repeated search baselines (Huang et al., 2021, An et al., 2023, Zhou et al., 24 Apr 2025, Guo et al., 2022).

5. Empirical Results and Benchmarks

NAGO methods are validated on extensive benchmarks, including the 27-dataset Monash time-series archive (Keisler et al., 2023), NAS-Bench-101/201/301, MobileNetV3 subnetwork search (Guo et al., 2021, Lomurno et al., 2024), and hardware-conditional settings.

Key findings:

  • The first 1–2 hours of search typically suffice to find high-quality architectures in large multi-task archives; generator-based NAGO outperforms random and simulated-annealing baselines (Keisler et al., 2023).
  • Order-of-magnitude reductions in wall-clock time for generating architectures at arbitrary budgets; e.g., 5 GPU-hours for all $N$ constraints with SGNAS (vs. $24N$ GPU-hours for GreedyNAS) (Huang et al., 2021).
  • Superiority in top-1 accuracy and/or Pareto front coverage across diverse latency, parameter, and MAC constraints on mobile, CPU, and GPU platforms (Guo et al., 2021, Guo et al., 2022, Lomurno et al., 2024).
  • Evolutionary-diffusion NAGO achieves up to $100\times$ speedup in inference, with state-of-the-art test accuracy and strictly dominant Pareto fronts on multi-objective evaluations (Zhou et al., 24 Apr 2025, Lomurno et al., 2024).
  • Meta-learning/transfer approaches further lower search time to near zero when prior knowledge is available and provide mechanisms for early stopping during candidate evaluation (Wistuba et al., 2019).

6. Limitations, Extensions, and Open Challenges

Several limitations and ongoing research directions are characterized:

  • Predictor bias: Suboptimal or miscalibrated performance predictors can skew generator optimization, particularly in many-objective or transfer settings (Lomurno et al., 2024, Zhou et al., 24 Apr 2025).
  • Scalability: Expansion to very deep architectures or extremely large search spaces challenges continuous encoding methods (e.g., autoencoder embedding capacity) (Huang et al., 2024, Lukasik et al., 2022).
  • Conditional validity: Generation validity rates may decrease with highly irregular macro-architectures, requiring stricter decoding or post-filtering (Lomurno et al., 2024).
  • Objectives: Most current frameworks focus on accuracy, latency, memory, and MACs; further generalization to other objectives such as energy, robustness, and fairness is plausible (Lomurno et al., 2024).
  • Local optimization traps: Greedy growth–pruning schemes can get trapped in suboptimal motifs; hybridization with global search strategies is an open area (Khashin et al., 2021).
  • Hardware-awareness: Latency prediction remains challenging, especially for diverse deployment targets, motivating richer hardware-specific modeling (Lomurno et al., 2024).

NAGO synthesis now constitutes a foundational paradigm in neural architecture search, enabling meta-learned, task-adaptive, and constraint-aware conditional neural network generation with strong empirical results and extensibility across domains (Keisler et al., 2023, Huang et al., 2021, Lomurno et al., 2024, Zhou et al., 24 Apr 2025).
