
Architectural Simplification in NAS

Updated 6 February 2026
  • Architectural simplification is a strategy to reduce neural network redundancy by iteratively pruning, factorizing, and compacting search spaces for efficient design.
  • Neural Architecture Search automates network design through algorithmic exploration, employing techniques like zero-shot pruning and gradient-based optimization.
  • Hierarchical and progressive reduction methods accelerate search efficiency, lower computational demands, and maintain high accuracy in model development.

Architectural simplification and neural architecture search (NAS) are intimately linked threads in the evolution of deep learning systems. Architectural simplification aims to reduce the size, redundancy, or computational cost of neural networks while preserving—or ideally enhancing—predictive accuracy. Neural architecture search formalizes the automated discovery of network topologies via algorithmic or probabilistic exploration of vast design spaces. Recent research has sharply focused on efficient search space design, iterative pruning and shrinkage, topological factorization for gradient-based search, and multi-level hierarchical frameworks, all contributing to a principle-guided reduction of search and model complexity without sacrificing accuracy or diversity.

1. Formulating Architectural Simplification in NAS

Architectural simplification within NAS is motivated by the challenge of traversing exponentially large design spaces containing functionally redundant or weakly performing architectures. The objective is to produce a simplified search space—or a more compact final model—that has much lower cardinality than the original, while concentrating the density of high-performing configurations and preserving architectural diversity.

Formally, for a large NAS search space $\mathcal{S}$, one seeks a subspace $\mathcal{S}' \subset \mathcal{S}$ such that $|\mathcal{S}'| \ll |\mathcal{S}|$ and the hit rate $P_{\mathcal{S}'} = |\mathcal{S}' \cap \mathcal{S}_+| / |\mathcal{S}'|$ is maximized, where $\mathcal{S}_+$ is the (unknown) set of high-accuracy architectures. This is operationalized in recent work as an optimization over subspaces, with objectives such as maximizing $\mathbb{E}_{n \sim \mathrm{Uniform}(\mathcal{S}')}[\mathrm{Acc}(n)]$ for a search budget $K$ (Gopal et al., 2023).
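These two quantities are easy to make concrete. The following minimal Python sketch evaluates a subspace's hit rate and its expected sampled accuracy on a toy integer-coded space; the helper names (`subspace_hit_rate`, `expected_accuracy`) and the accuracy function are illustrative, not taken from any cited paper.

```python
import random

def subspace_hit_rate(subspace, is_high_acc):
    """P_{S'} = |S' ∩ S_+| / |S'|: density of high-accuracy architectures."""
    return sum(1 for arch in subspace if is_high_acc(arch)) / len(subspace)

def expected_accuracy(subspace, accuracy, budget, seed=0):
    """Monte Carlo estimate of E_{n ~ Uniform(S')}[Acc(n)] under search budget K."""
    rng = random.Random(seed)
    return sum(accuracy(rng.choice(subspace)) for _ in range(budget)) / budget

# Toy space: integer-coded architectures; every 7th one is "high-accuracy".
space = list(range(1000))
acc = lambda a: 0.9 if a % 7 == 0 else 0.5
shrunk = [a for a in space if a % 7 == 0]
print(subspace_hit_rate(space, lambda a: acc(a) > 0.8))   # → 0.143
print(subspace_hit_rate(shrunk, lambda a: acc(a) > 0.8))  # → 1.0
```

A good shrinkage procedure raises the hit rate from 0.143 to 1.0 here, which is exactly why uniform sampling from the shrunk space yields a higher expected accuracy under the same budget.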

Simplification targets both the search process and the resulting architectures. On the search side, this involves designing compact, interpretable, or hierarchically structured spaces (Ru et al., 2020, Siddiqui et al., 5 Feb 2025, Tripathi et al., 20 Aug 2025); pruning or masking weak candidates via zero-shot or iterative local analysis (Gopal et al., 2023, Rumiantsev et al., 19 May 2025); and continuous reduction during differentiable search (Noy et al., 2019, Zhao et al., 2024). On the architecture side, simplification encompasses function-preserving network morphisms (Elsken et al., 2017), adaptive operation downgrades (Guo et al., 2019), or direct complexity–error constrained search (Khashin et al., 2021).

2. Locality, Topology, and Pruning: Modern Algorithms

Recent methods exploit empirical observations about the structure–performance landscape of NAS spaces. LISSNAS (Gopal et al., 2023) quantifies "locality"—the tendency for structurally similar networks (measured by edit distance in the DAG cell space) to possess similar accuracy—using autocorrelation and average absolute accuracy difference (AAD) statistics. This informs an iterative shrinkage procedure: select promising seeds via proxy predictors, expand local pockets of neighbors, and repeatedly contract the space, automatically preserving multiple diverse clusters of high-yield architectures.
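The AAD statistic can be illustrated on a toy bit-string cell space where neighbors differ by one flipped design bit. The encoding, accuracy function, and helper below are illustrative sketches of the idea, not LISSNAS's actual implementation.

```python
import itertools

def neighbors(arch):
    """All encodings at edit distance 1 (one flipped design bit)."""
    return [arch[:i] + (1 - arch[i],) + arch[i + 1:] for i in range(len(arch))]

def average_accuracy_difference(space_acc):
    """AAD: mean |Acc(a) - Acc(b)| over all edit-distance-1 pairs; a low value
    indicates strong locality in the structure-performance landscape."""
    diffs = [abs(acc - space_acc[nb])
             for arch, acc in space_acc.items()
             for nb in neighbors(arch) if nb in space_acc]
    return sum(diffs) / len(diffs)

# Toy space: 6-bit cell encodings whose accuracy is the fraction of set bits,
# so structurally close architectures score similarly (locality holds).
space = {a: sum(a) / 6 for a in itertools.product((0, 1), repeat=6)}
print(round(average_accuracy_difference(space), 4))  # → 0.1667
```

In this contrived space every one-bit edit changes accuracy by exactly 1/6, so AAD is low and local expansion around a good seed is a sound search move.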

Zero-shot pruning (e.g., based on the NNGP kernel norm (Rumiantsev et al., 19 May 2025)) is another powerful simplification lever. By rapidly ranking architectures using analytic or statistical surrogates—without any weight updates—one can eliminate large volumes of noncompetitive candidates before invoking high-cost one-shot NAS methods. This reduces memory and search time by up to 81% in the DARTS cell space, with negligible loss (and sometimes a marginal gain) in final test accuracy.
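The mechanics of proxy-based pre-filtering are simple to sketch. Here a fixed score table stands in for a real training-free proxy such as an NNGP-kernel statistic; the function name and the toy candidates are assumptions for illustration.

```python
def zero_shot_prune(candidates, proxy_score, keep_fraction=0.2):
    """Rank candidates by a training-free proxy score and keep only the top
    fraction, so expensive one-shot NAS runs on a much smaller space."""
    ranked = sorted(candidates, key=proxy_score, reverse=True)
    return ranked[:max(1, int(len(ranked) * keep_fraction))]

# Stand-in proxy scores for five candidate cells (higher is better).
scores = {"a": 0.9, "b": 0.1, "c": 0.7, "d": 0.3, "e": 0.5}
print(zero_shot_prune(list(scores), scores.get, keep_fraction=0.4))  # → ['a', 'c']
```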

TopoNAS (Zhao et al., 2024) addresses the specific challenge of topological redundancy in one-shot, gradient-based spaces. Many operations in DARTS-like setups cannot be linearly merged due to nonaligned nonlinear wrappers, impeding straightforward kernel reparameterization. TopoNAS applies recursive module-sharing transformations—partial and floating module sharing—factoring out isomorphic subgraphs in the operation-level DAG, and compresses many parallel paths into shared computational units. Kernel normalization is then enforced to break scaling invariances, restoring meaningful gradients to architectural parameters. This yields 15–20% memory savings, double-digit reductions in search time, and stable/increased test accuracy.
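The linear-merge identity underlying this kind of factorization is worth seeing explicitly. The sketch below verifies, on small dense kernels, that weighted parallel linear paths collapse into a single kernel; it illustrates only the identity TopoNAS-style sharing exploits, not the paper's recursive module-sharing algorithm.

```python
def apply_linear(kernel, x):
    """y = K x, with the kernel given as a list of rows."""
    return [sum(k * xi for k, xi in zip(row, x)) for row in kernel]

def merge_parallel_linear(kernels, alphas):
    """Weighted parallel linear paths collapse into one kernel:
    sum_i alpha_i (K_i x) = (sum_i alpha_i K_i) x."""
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [[sum(a * k[r][c] for a, k in zip(alphas, kernels)) for c in range(cols)]
            for r in range(rows)]

k1 = [[1.0, 2.0], [3.0, 4.0]]
k2 = [[0.0, 1.0], [1.0, 0.0]]
x, alphas = [1.0, -1.0], [0.3, 0.7]
merged = merge_parallel_linear([k1, k2], alphas)
two_path = [0.3 * u + 0.7 * v
            for u, v in zip(apply_linear(k1, x), apply_linear(k2, x))]
print(all(abs(m - t) < 1e-9
          for m, t in zip(apply_linear(merged, x), two_path)))  # → True
```

A nonlinearity wrapped around either path breaks this identity, which is precisely why nonaligned nonlinear wrappers block naive reparameterization and motivate the module-sharing transformations.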

3. Hierarchical and Macro–Micro Space Redesign

Global simplification is realized by hierarchical structuring or macro–micro decomposition of the search space. In neural architecture generator optimization (Ru et al., 2020), the authors introduce a generator-based paradigm: a small-dimensional vector Θ\Theta parametrizes a hierarchy of random graphs (Watts–Strogatz and Erdős–Rényi) across stage, cell, and operation levels, yielding a distribution over families of architectures rather than single instances. This shifts optimization from high-dimensional, brittle search over networks to continuous optimization in a more interpretable and tractable parameter space, enabling direct application of Bayesian multi-fidelity and multi-objective optimization methods. This generator-based simplification not only reduces search times but allows transfer across tasks and systematic exploration of macro-structural motifs.
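A stripped-down sketch of the generator idea, assuming an Erdős–Rényi-style sampler at a single level (the paper hierarchically composes Watts–Strogatz and Erdős–Rényi graphs across stage, cell, and operation levels):

```python
import random

def erdos_renyi_cell(theta, seed=0):
    """Sample a random DAG 'cell' from a two-dimensional generator parameter
    theta = (num_nodes, edge_prob); edges only point forward, so the graph
    is guaranteed acyclic."""
    n, p = theta
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

# Theta describes a whole *family* of architectures; the search optimizes
# Theta in a low-dimensional continuous space, not individual graphs.
cell = erdos_renyi_cell((6, 0.5))
print(len(cell), "forward edges sampled from the generator")
```

Because the search variable is the two-dimensional `theta` rather than a discrete graph, Bayesian optimization applies directly, which is the crux of the simplification.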

Efficient global NAS (Siddiqui et al., 5 Feb 2025) explicitly decouples macro-structure (depth and width) from micro-choices (per-layer operation type and kernel size). Instead of exhaustively searching all $O((|O| \times |K|)^D)$ network configurations, the approach incrementally grows and prunes depth/width, then performs parameter-balanced micro-modifications (e.g., switching convolution type, varying kernel size) while holding model size constant. Dynamic, architecture-aware training budgets compensate for the variable convergence rates of different models. This methodology achieves 2–4$\times$ lower search time than prior global methods and produces highly competitive models (e.g., 3.17% error at 2.49M parameters on CIFAR-10) (Siddiqui et al., 5 Feb 2025).
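One way to picture a parameter-balanced micro-modification is to rescale width when the kernel changes. The helpers below are hypothetical; the paper's exact balancing scheme may differ.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def balanced_width(c_out, k_old, k_new):
    """When switching kernel size, rescale output width so the layer's
    parameter count stays approximately constant."""
    return max(1, round(c_out * k_old ** 2 / k_new ** 2))

# Switching a 3x3 conv (64 -> 128 channels) to 5x5 at roughly equal size:
new_out = balanced_width(128, 3, 5)
print(conv_params(64, 128, 3), conv_params(64, new_out, 5))  # → 73728 73600
```

Holding size constant in this way isolates the effect of the operation choice from the effect of capacity, which is what makes the micro-comparisons fair.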

Hierarchical hybrid NAS with adaptive mutation (HHNAS-AM) (Tripathi et al., 20 Aug 2025) formalizes a two-level structure: a macro layer of domain-informed architecture templates (e.g., different RoBERTa-plus-CNN branches for text) and a micro layer of template-specific hyperparameters. By pruning the macro layer using domain knowledge and employing Q-learning–driven adaptive mutation probabilities in the micro layer, redundancy is reduced and exploration is focused on impactful design variables. This design converges to high-performing solutions with roughly half the macro search cost of flat alternatives and delivers up to 8% improvement in target accuracy.
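The adaptive-mutation idea can be sketched as tabular Q-learning over mutation operators: operators whose past mutations improved fitness get selected more often. The operator names and reward model below are toy assumptions, not HHNAS-AM's actual state/action design.

```python
import random

def q_adapted_mutation(operators, reward_fn, steps=100, alpha=0.3, eps=0.2, seed=1):
    """Epsilon-greedy tabular Q-learning over mutation operators: operators
    whose mutations historically improved fitness accumulate higher Q-values
    and are selected more often in later search iterations."""
    rng = random.Random(seed)
    q = {op: 0.0 for op in operators}
    for _ in range(steps):
        op = rng.choice(operators) if rng.random() < eps else max(q, key=q.get)
        r = reward_fn(op, rng)        # observed fitness improvement
        q[op] += alpha * (r - q[op])  # stateless one-step Q update
    return q

# Toy rewards: mutating the kernel tends to help more than the activation.
rewards = {"kernel": 0.8, "activation": 0.2}
q = q_adapted_mutation(list(rewards),
                       lambda op, rng: rewards[op] + rng.gauss(0, 0.05))
print(max(q, key=q.get))  # the consistently helpful operator should dominate
```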

4. Progressive and Direct Architectural Reduction Techniques

Iterative and local architectural simplification is widely adopted as a direct strategy. Greedy pruning and growing, as in (Khashin et al., 2021), alternate between connection removal (preserving loss up to a small $\epsilon$ margin) and neuron insertion, unconstrained by layer structure. Complexity $C(w)$ is directly minimized under an error constraint, yielding empirical Pareto frontiers between error and size and discovering highly compact, nonlayered architectures. The networks produced are often strictly smaller than any grid-searched layered design of equal accuracy.
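The removal step can be sketched in a few lines: try dropping each connection and keep the removal only when the loss stays within the $\epsilon$ margin. The toy loss function below is an assumption for illustration.

```python
def greedy_prune(connections, loss_fn, epsilon):
    """Try removing each connection; keep a removal only if the loss stays
    within epsilon of the full-network baseline."""
    active = set(connections)
    base = loss_fn(active)
    for c in sorted(connections):
        trial = active - {c}
        if trial and loss_fn(trial) <= base + epsilon:
            active = trial
    return active

# Toy loss: only connections 'a' and 'b' matter; 'c' and 'd' are redundant.
needed = {"a", "b"}
loss = lambda conns: 0.1 + 0.5 * len(needed - conns)
print(sorted(greedy_prune({"a", "b", "c", "d"}, loss, epsilon=0.01)))  # → ['a', 'b']
```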

Grow-and-prune pipelines (e.g., STEERAGE (Hassantabar et al., 2019)) couple global NAS via cheap boosted decision-tree surrogates and quasi-Monte Carlo sampling with local refinement: new features (hidden connections, neurons, or filters) are grown via gradient saliency, uninformative weights are pruned by magnitude or sensitivity, and neuron structure is simplified by partial linearization. This two-step process can deliver up to 8.6$\times$ parameter reduction over standard LeNet-5 with improved accuracy on MNIST (0.66% error vs. 0.80%), and significant error/cost gains over ResNet-18/101 on CIFAR-10, outperforming conventional deepening/widening approaches (Hassantabar et al., 2019).
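The magnitude-pruning half of such a pipeline reduces to keeping the largest-magnitude weights and zeroing the rest, as in this minimal sketch:

```python
def magnitude_prune(weights, keep_ratio):
    """Zero out the smallest-magnitude weights, keeping only the top
    fraction -- the pruning half of a grow-and-prune pipeline."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

print(magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7], keep_ratio=0.6))
# → [0.9, 0.0, 0.4, 0.0, -0.7]
```

In a real pipeline this alternates with a growth phase driven by gradient saliency, so capacity is added where it helps and removed where it does not.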

The Neural Architecture Transformer (NAT) (Guo et al., 2019) applies a Markov decision process to modify edge-level operations (e.g., replacing convolutions by skips or nulls) under a fixed budget, using RL policy learning to maximize accuracy without increasing computational cost. On both hand-crafted and NAS-derived architectures, NAT optimally replaces redundant operations, yielding architectures with up to 20% fewer multiply-adds and higher accuracy.

5. Differentiable and Annealed Continuous Pruning

Differentiable NAS methods often suffer from a "relaxation gap"—architectures may appear promising in the continuous softmax-relaxed setting but degrade when discretized. ASAP (Noy et al., 2019) addresses this by interleaving annealed temperature schedules (shrinking softmax temperatures over time) with progressive pruning: as the temperature decays, the softmax probabilities concentrate, and low-probability operations are pruned as soon as they fall below a threshold (e.g., $p_o < 0.4$). This ensures a monotonic decline in computational cost and complexity during search, minimizes the gap between continuous/discrete performance, avoids abrupt graph perturbations, and results in final networks that match or surpass the accuracy of more computationally expensive competitors.
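The anneal-and-prune dynamic can be demonstrated on a single mixed-operation edge with fixed architecture logits. This is a schematic sketch, assuming a made-up temperature schedule and a lower threshold than the paper's example (the toy edge has only three operations, so a 0.4 cutoff would prune too aggressively at the start).

```python
import math

def softmax(logits, temperature):
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def anneal_and_prune(ops, logits, temperatures, threshold=0.4):
    """As the softmax temperature decays, probabilities concentrate; any
    operation whose probability drops below the threshold is pruned."""
    alive = list(zip(ops, logits))
    for t in temperatures:
        probs = softmax([l for _, l in alive], t)
        survivors = [pair for pair, p in zip(alive, probs) if p >= threshold]
        if survivors:  # guard: never prune every remaining operation at once
            alive = survivors
    return [op for op, _ in alive]

# Toy mixed-op edge: 'none' is pruned at t=1.0, 'skip' at t=0.25.
print(anneal_and_prune(["conv3x3", "skip", "none"], [2.0, 1.0, -1.0],
                       temperatures=[4.0, 1.0, 0.25], threshold=0.2))
# → ['conv3x3']
```

Each pruning step removes cost monotonically, and because the surviving distribution is already sharply peaked by the time of discretization, the continuous and discrete architectures agree.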

Auto-DeepLab (Liu et al., 2019) leverages a two-level (cell and network) hierarchical search space, each relaxed via softmax weighting and optimized in a bi-level manner, capturing both micro (block operator, input) and macro (resolution path) design. The joint continuous relaxation and efficient decoding via Viterbi extraction enable the design of highly competitive segmentation models in 3 GPU-days, a dramatic cost reduction versus hand-designed architectures or single-level NAS.
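The path-extraction step is standard Viterbi decoding over resolution states. The sketch below is a generic decoder on toy transition scores, not Auto-DeepLab's exact formulation.

```python
def viterbi_path(trans_scores):
    """Viterbi decoding of the best resolution path: trans_scores[t][i][j] is
    the (log-)score of moving from resolution state i at layer t to state j."""
    n = len(trans_scores[0])
    best = [0.0] * n
    back = []
    for layer in trans_scores:
        new_best, ptr = [], []
        for j in range(n):
            scores = [best[i] + layer[i][j] for i in range(n)]
            i_star = max(range(n), key=lambda i: scores[i])
            new_best.append(scores[i_star])
            ptr.append(i_star)
        best, back = new_best, back + [ptr]
    # Backtrack from the best final state to recover the full path.
    state = max(range(n), key=lambda j: best[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Two resolution states, three transitions; scores favor staying in state 0.
scores = [[[2, 0], [0, 1]]] * 3
print(viterbi_path(scores))  # → [0, 0, 0, 0]
```

Dynamic programming makes the decode exact and cheap (linear in depth), so the continuous relaxation can be converted to a discrete macro-path without any additional search.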

6. Preserving Architectural Diversity and Search Utility

A major concern in simplification is loss of diversity, which can limit the expressive potential and robustness of the discovered solution set. Methods such as LISSNAS (Gopal et al., 2023) explicitly retain multiple pockets of high-performing architectures during iterative shrinkage by extracting structurally disjoint seed clusters and expanding localized neighbors. Evaluation criteria include Kendall’s Tau for predictor–ground-truth performance ranking, architectural diversity computed as maximum pairwise cosine distance in feature-encoded architecture vectors, and coverage of FLOPs/parameter histograms across the shrunk and full space. Empirically, LISSNAS preserves nearly all the original diversity (e.g., $D_{\mathrm{cos}} = 255.6$ vs. $260$ on NAS-Bench-101) while achieving top accuracies and up to $10^{11}\times$ search-space reduction.
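The diversity metric itself is straightforward to compute; here is a minimal sketch on toy architecture encodings (the vectors are assumptions, not real feature encodings):

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def max_pairwise_diversity(encodings):
    """Diversity of a shrunk space: maximum pairwise cosine distance between
    feature-encoded architecture vectors."""
    return max(cosine_distance(u, v)
               for i, u in enumerate(encodings)
               for v in encodings[i + 1:])

# Three toy encodings; the orthogonal pair determines the diversity score.
print(max_pairwise_diversity([(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]))  # → 1.0
```

Comparing this score between the shrunk and full spaces quantifies how much structural variety the shrinkage procedure discarded.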

In GNN spaces, the SNAG framework (Zhao et al., 2020) imposes a minimal but fully expressive subspace—limited to node and layer aggregators plus skip flags—thereby reducing unnecessary branching without biasing toward a specific model class. Despite this, the search is able to rediscover or outperform GraphNAS/Auto-GNN and even human-designed architectures, suggesting that simplification centered on the most crucial axes of variation retains search efficacy and sample efficiency.

7. Comparative Performance and Limitations

Architectural simplification, as realized in recent literature, yields substantial empirical gains in both accuracy–complexity trade-off and search efficiency. On image classification, methods such as LISSNAS (Gopal et al., 2023) achieve SOTA Top-1 ImageNet accuracy under the mobile constraint (77.6%), with strong architectural diversity and size compression; TopoNAS (Zhao et al., 2024) and zero-shot pruning (Rumiantsev et al., 19 May 2025) offer double-digit speedups or memory reductions without loss of performance; global–local pipelines like STEERAGE (Hassantabar et al., 2019) and efficient global NAS (Siddiqui et al., 5 Feb 2025) outperform deeper/wider hand-designed baselines at a fraction of the parameter cost. Annealing/pruning methods surpass traditional DARTS while converging in 0.2 GPU-days (Noy et al., 2019).

There remain notable limitations: many methods rely on the existence of strong surrogates or predictors, and the optimal criteria and parameters for pruning, seed/neighbor counts, or annealing schedules are context-specific and must be tuned. Simplifications based on edit distance or random walks presume local correlation between structure and accuracy, which may not hold in radically unstructured spaces. In frameworks that decouple macro/micro search, transferability across domains is an active question. Finally, purely local or greedy methods (grow/prune, hill climb) can fail to escape local optima, motivating ongoing work in global–local hybrids and probabilistic exploration.


In summary, architectural simplification in NAS is now supported by a portfolio of theoretically principled and empirically validated techniques—including locality-based iterative shrinkage, analytic proxy-based pruning, topological factorization, hierarchical decomposition, adaptive mutation, and anneal–prune dynamics. Collectively, these reduce search costs and model size, preserve or enhance accuracy and diversity, and provide reproducible, quantitative guidelines for large-scale, resource-constrained, or domain-specific NAS deployments (Gopal et al., 2023, Guo et al., 2019, Ru et al., 2020, Khashin et al., 2021, Rumiantsev et al., 19 May 2025, Zhao et al., 2024, Siddiqui et al., 5 Feb 2025, Hassantabar et al., 2019, Tripathi et al., 20 Aug 2025, Noy et al., 2019, Liu et al., 2019, Zhao et al., 2020, Elsken et al., 2017, Kim et al., 2022).
