
Pre-NAS: Efficient Neural Architecture Search

Updated 12 September 2025
  • Pre-NAS is a strategy that uses prior knowledge, performance predictors, and transfer learning to greatly reduce the computational burden of neural architecture search.
  • It integrates methods like predictor-assisted search, differential prediction, ensemble models, and zero-shot screening to minimize full training evaluations.
  • Empirical benchmarks on datasets such as CIFAR-10 and ImageNet confirm that pre-NAS methods achieve competitive accuracy with orders-of-magnitude fewer GPU days.

Pre-NAS refers to a class of methods and strategies in neural architecture search (NAS) that leverage prior information, performance prediction, pre-filtering, architecture encoding, and transfer methodologies to improve the efficiency, sample efficiency, and utility of the NAS process. Unlike classical NAS, which often treats every candidate architecture as an isolated entity in a vast space that must be evaluated (partially or fully trained) de novo, pre-NAS approaches use predictors, priors, meta-learning, transfer, or search space restriction to dramatically reduce computational burden while often achieving comparable or superior results on standard benchmarks.

1. Frameworks and Methodologies

Contemporary pre-NAS algorithms are unified by their focus on mitigating the prohibitive cost associated with direct architecture evaluation in NAS. Key methodological directions include:

  • Predictor-Assisted Search: Methods such as PRE-NAS (Peng et al., 2022) use machine learning predictors (e.g., random forests, neural networks) to estimate architecture performance, greatly reducing the number of full trainings required. Predictors are iteratively updated with representative samples, using strategies such as percentile representative selection and multi-mutation evolutionary operations to maximize coverage of the search space. Predictor quality is critical, and predictor architectures or feature representations (matrix encodings, difference encodings, GCN-based features, etc.) must be carefully designed to capture the salient aspects of candidate networks (a toy surrogate sketch follows this list).
  • Difference-Based Prediction: Delta-NAS (Sridhar et al., 21 Nov 2024) introduces a differential approach where the predictor is trained not on absolute performance, but on the difference in performance between pairs of similar architectures. If $F(\cdot)$ denotes an encoding, the predictor operates over $\Delta F = F(A) - F(B)$ to estimate the performance delta, yielding linear (rather than exponential) scaling of computational cost with respect to the search space size (see the delta variant in the sketch below).
  • Ensemble and Bayesian Predictors: Techniques such as GP-NAS-ensemble (Chen et al., 2023) employ Gaussian processes with advanced kernels, label transformations, and ensemble stacking to improve sample efficiency and robustness of performance prediction under small-sample regimes. Ensemble schemes incorporate various encoding strategies (e.g., one-hot, two-hot), kernel weighting, and model diversity to achieve robust estimation.
  • Zero-Shot and Zero-Cost Screening: PreNAS (Wang et al., 2023) showcases an approach where a zero-cost predictor (e.g., SNIP, normalized SNIP) is used to pre-select a high-quality, small subspace of architectures. One-shot training is then performed only on this preferred subset, alleviating weight coupling and gradient conflict issues typically present in weight-sharing supernets.
  • Transfer and Meta-Learning: Transfer frameworks (such as XferNAS (Wistuba, 2019)) decompose performance predictors into universal and task-specific components via a residual structure, enabling the reuse of architectural knowledge across tasks. Meta-learning methods (see (Vo-Ho et al., 2022)) target few-shot learning and multi-task adaptation by jointly optimizing both architecture and network weights in episodic settings.
  • LLM Priors: GPT-NAS (Yu et al., 2023) integrates a generative pre-trained LLM (GPT) into evolutionary NAS, allowing the sequence prediction model—trained on large corpora of architecture strings—to propose promising architectural components. GPT fills in or modifies network blocks within the evolutionary process, leveraging statistical patterns found in state-of-the-art neural architectures.
  • Binary and Hardware-Aware NAS: Predictive and pre-NAS principles are extended to hardware-aware and quantized settings, as in NASB (Zhu et al., 2020) and BRP-NAS (Dudziak et al., 2020). These systems use GCN predictors to estimate latency on device-specific search spaces (as in LatBench) and optimize binary CNNs by balancing performance and computational footprint via sample-efficient predictor-based strategies.
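To make the first two directions concrete, here is a minimal sketch, assuming scikit-learn and synthetic stand-in data (the encodings, accuracies, and hyperparameters are illustrative, not those of PRE-NAS or Delta-NAS): a random-forest surrogate is fit on a small set of fully trained architectures and used to rank untrained candidates, and a Delta-style variant regresses on encoding differences to predict pairwise accuracy gaps.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical setup: 200 architectures encoded as fixed-length
# operation vectors, with measured accuracies available only for a
# small trained subset (the expensive step a predictor amortizes).
enc = rng.integers(0, 5, size=(200, 20)).astype(float)  # architecture encodings
acc = rng.uniform(0.85, 0.95, size=200)                 # stand-in accuracies

train_idx = rng.choice(200, size=40, replace=False)     # few full trainings

# Predictor-assisted search: fit a surrogate on (encoding, accuracy)
# pairs, then rank all untrained candidates by predicted accuracy.
surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(enc[train_idx], acc[train_idx])
ranked = np.argsort(-surrogate.predict(enc))            # best-first candidate order

# Difference-based (Delta-style) variant: regress on encoding
# differences to predict the accuracy *gap* between pairs.
pairs = rng.choice(train_idx, size=(300, 2))
delta_X = enc[pairs[:, 0]] - enc[pairs[:, 1]]
delta_y = acc[pairs[:, 0]] - acc[pairs[:, 1]]
delta_model = RandomForestRegressor(n_estimators=200, random_state=0)
delta_model.fit(delta_X, delta_y)

# Estimate whether candidate A improves on a trained reference B.
a, b = ranked[0], train_idx[0]
print(delta_model.predict((enc[a] - enc[b]).reshape(1, -1)))
```

In practice the surrogate is refreshed as newly trained architectures arrive, and delta pairs are restricted to similar architectures so the difference regression stays in a locally smooth regime.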

2. Core Principles and Algorithmic Innovations

The following principles are central to pre-NAS methodology:

Predictor Training and Sample Efficiency: High-fidelity prediction with limited samples is enabled by advanced training set selection strategies (covering percentile bins to maximize diversity), surrogate label transformations (to impose Gaussian-like distribution for regression tasks), and modeling relative rather than absolute performance.
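As a sketch of the percentile-bin idea (a toy implementation, not PRE-NAS's exact selection rule), one can stratify candidates by a cheap score and keep one representative per bin so the predictor's training set spans the full quality range:

```python
import numpy as np

def percentile_representatives(scores: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Pick one candidate index per percentile bin of `scores`, so the
    predictor's training set covers the quality range rather than
    clustering around the incumbent best."""
    edges = np.percentile(scores, np.linspace(0, 100, n_bins + 1))
    reps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = np.where((scores >= lo) & (scores <= hi))[0]
        if in_bin.size:
            # Representative: the bin member closest to the bin median.
            offset = np.argmin(np.abs(scores[in_bin] - np.median(scores[in_bin])))
            reps.append(in_bin[offset])
    return np.unique(np.array(reps))

scores = np.random.default_rng(1).normal(size=500)  # stand-in cheap scores
print(percentile_representatives(scores, n_bins=10))
```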

Surrogate Optimization: With differentiable predictors (as in PredNAS (Yuan et al., 2022)), gradient ascent can be performed directly in the architecture encoding space. Combined with projection operations onto the feasible discrete set, this results in efficient direct optimization:

$$a^{(t+1)} = P_\Omega\!\left(a^{(t)} + \eta\left[\frac{\partial m(a^{(t)})}{\partial a^{(t)}} - \alpha\,\frac{\partial\,\mathrm{aux}(a^{(t)})}{\partial a^{(t)}}\right]\right)$$

where $P_\Omega$ denotes projection onto the feasible discrete set $\Omega$, $m$ the differentiable performance predictor, $\mathrm{aux}$ an auxiliary objective (e.g., a resource cost), $\eta$ the step size, and $\alpha$ a trade-off coefficient.
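A minimal PyTorch sketch of this update, assuming toy differentiable stand-ins for the predictor $m$ and auxiliary cost $\mathrm{aux}$, and a simple round-and-clamp projection in place of PredNAS's search-space-specific $P_\Omega$:

```python
import torch

def optimize_encoding(a0, m, aux, eta=0.1, alpha=0.5, steps=50):
    """Gradient ascent on m(a) - alpha * aux(a) over a continuous
    relaxation of the encoding, with a final projection onto a toy
    discrete grid {0, ..., 4}^d standing in for P_Omega."""
    a = a0.clone().requires_grad_(True)
    for _ in range(steps):
        obj = m(a) - alpha * aux(a)          # surrogate objective to maximize
        grad, = torch.autograd.grad(obj, a)
        with torch.no_grad():
            a += eta * grad                  # ascent step in encoding space
    return torch.clamp(a.detach().round(), 0, 4)

# Toy stand-ins (assumptions, not PredNAS's actual models).
m = lambda a: -((a - 2.3) ** 2).sum()        # differentiable "predictor"
aux = lambda a: (a ** 2).sum()               # auxiliary cost (e.g., latency proxy)
print(optimize_encoding(torch.zeros(8), m, aux))
```

The sketch projects once after ascent for simplicity; the formula applies $P_\Omega$ at every step.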

Weight Inheritance and Sharing: High-fidelity weight inheritance (as in PRE-NAS (Peng et al., 2022)) is used to carry over trained weights from parent to offspring only if they are topologically homogeneous, avoiding well-known biases associated with one-shot supernet sharing across heterogeneous subnets.
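A minimal sketch of shape-matched inheritance, assuming PyTorch modules (PRE-NAS's topological-homogeneity test is stricter than the name-and-shape check used here):

```python
import torch

def inherit_weights(parent: torch.nn.Module, child: torch.nn.Module) -> int:
    """Copy parent weights into the child only for parameters whose
    names and shapes match exactly; everything else keeps its fresh
    initialization. Returns the number of tensors inherited."""
    parent_sd, child_sd = parent.state_dict(), child.state_dict()
    inherited = 0
    for name, tensor in child_sd.items():
        if name in parent_sd and parent_sd[name].shape == tensor.shape:
            child_sd[name] = parent_sd[name].clone()
            inherited += 1
    child.load_state_dict(child_sd)
    return inherited

# Toy usage: parent and offspring share a compatible layout.
parent = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
child = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
print(inherit_weights(parent, child))  # 4 tensors: two weights, two biases
```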

Search Space Reduction and Pruning: Methods such as PreNAS (Wang et al., 2023) and NAS-BERT (Xu et al., 2021) aggressively reduce search space cardinality via zero-cost proxies, progressive shrinking, or blockwise pruning. This facilitates tractable exploration, efficient specialization for targeted hardware, and instant adaptability to new constraints post training.
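For illustration, a SNIP-style zero-cost score can be computed from a single minibatch at initialization; this is a generic PyTorch sketch, not PreNAS's normalized variant or its transformer search space:

```python
import torch
import torch.nn.functional as F

def snip_score(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """SNIP-style saliency: sum of |weight * gradient| over all
    parameters, from one minibatch at initialization (no training)."""
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return sum((p * p.grad).abs().sum().item()
               for p in model.parameters() if p.grad is not None)

# Toy usage: score two candidate widths and keep the higher-scoring one.
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
cands = [torch.nn.Sequential(torch.nn.Linear(64, w), torch.nn.ReLU(), torch.nn.Linear(w, 10))
         for w in (32, 128)]
print([snip_score(m, x, y) for m in cands])
```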

3. Empirical Performance and Benchmark Results

Pre-NAS methodologies have established strong empirical benchmarks:

  • PRE-NAS (Peng et al., 2022): Achieves 2.40% test error on CIFAR-10 and 24% test error on ImageNet in the DARTS search space, with search times reduced to 0.6 GPU days and no sacrifice in final accuracy compared to costlier methods.
  • PredNAS (Yuan et al., 2022): Reports near state-of-the-art results (e.g., on CIFAR-10/100/ImageNet-16-120 with fewer than 100 evaluations), and in large-scale spaces (e.g., AnyNet, MSCOCO) matches or outperforms handcrafted and randomly sampled models, sometimes with only a few hundred candidates sampled.
  • Delta-NAS (Sridhar et al., 21 Nov 2024): Demonstrates significantly better performance and higher sample efficiency, achieving leading results on common NAS benchmarks, with computational complexity reduced from exponential to linear in the number of architecture transitions evaluated.
  • NASB (Zhu et al., 2020): For binary CNNs, gains up to 4% Top-1 accuracy on ImageNet over previous binary models, highlighting the effectiveness of pre-NAS style architectural specialization for constrained settings.
  • GP-NAS-ensemble (Chen et al., 2023), Predict NAS Multi-Task (Zhang, 2023): Achieve top positions in predictive NAS competitions (CVPR 2022) by combining task-specific modeling, advanced encoding, and stacking ensemble learning, with performance-prediction Kendall's τ rank-correlation scores exceeding 0.80.

These results underscore that pre-NAS predictors, search strategies, or pre-filtering can identify competitive architectures with orders-of-magnitude fewer full evaluations.

4. Applications and Impact

Pre-NAS frameworks are highly versatile, applicable in diverse contexts:

  • Transfer Learning: Reused predictors, pre-trained on multiple tasks (XferNAS (Wistuba, 2019)), reduce cold-start times for novel domains.
  • On-Device and Resource-Constrained Scenarios: Efficient sample use, as in BRP-NAS (Dudziak et al., 2020) or NPAS (Li et al., 2020), enables deployment where hardware or energy constraints are paramount.
  • Zero-Shot and Search-Free Specialization: PreNAS (Wang et al., 2023) exemplifies instant specialization post training under multiple constraints, enabling rapid adaptation to downstream targets.
  • Security and Intellectual Property Risks: Model reconstruction via side-channel attacks (e.g., Hong et al., 2020) shows that pre-evaluation and predictive analysis are also of interest from a privacy and security standpoint.
  • Meta-Learning and Few-Shot Learning: Pre-NAS approaches, particularly when coupled with meta-learning (Vo-Ho et al., 2022), provide the means to bootstrap NAS processes for new targets or low-data regimes.

5. Limitations, Challenges, and Future Directions

Despite their advantages, pre-NAS techniques have notable limitations:

  • Predictor Fidelity and Generalizability: Surrogate models (ensemble, neural, or Gaussian process-based) may suffer when the search space contains high-order non-linearities or when architecture encoding fails to capture crucial design nuances.
  • Assumptions of Smoothness and Locality: Differential encoding as in Delta-NAS may falter if architectural changes precipitate abrupt (non-smooth) performance drops.
  • Transferability and Robustness: The benefit from priors or source-task training plateaus as more target data accumulate (XferNAS (Wistuba, 2019)), and robustness to search space shift remains an open question.
  • Encoding Design: The effectiveness of ensemble and predictor frameworks hinges on appropriate, information-rich feature encodings and kernel choices (GP-NAS-ensemble (Chen et al., 2023)).
  • Scaling and Complexity: While predictors offer computational savings, integrating them into evolutionary or gradient-based pipelines (as in PredNAS (Yuan et al., 2022)) requires careful management of projection operations, updating rules, and constraint satisfaction.

Future directions highlighted in the literature include hybrid predictive approaches combining absolute and differential encodings, extension to broader and more dynamic search spaces, more sophisticated or domain-specific zero-cost selectors, and deeper integration of meta-learning for multi-task and few-shot workflows.

6. Conclusion

Pre-NAS is now a central paradigm in neural architecture search, connecting prior-based reasoning, surrogate modeling, transfer learning, differential analysis, and predictive filtering. Methods such as PRE-NAS (Peng et al., 2022), Delta-NAS (Sridhar et al., 21 Nov 2024), PredNAS (Yuan et al., 2022), and PreNAS (Wang et al., 2023) collectively demonstrate that intelligently leveraging prior information, prediction, or search space filtering can cut computational cost by orders of magnitude. This has catalyzed advances in automated model design for resource-constrained devices, efficient benchmarking, transfer learning, and real-world deployment—including in security-sensitive and multi-task scenarios. Future developments are likely to focus on further increasing robustness, generality, and scalability across increasingly diverse architecture and deployment settings.
