Iterative Hardware-Aware NAS
- Iterative hardware-aware NAS comprises methods that iteratively refine neural architectures via surrogate modeling and evolutionary strategies under strict hardware constraints like latency, energy, and memory.
- It employs iterative search workflows that alternate between candidate generation and hardware-in-the-loop evaluation, reducing search cost by up to 8× relative to decoupled approaches.
- It integrates surrogate models, reinforcement learning, and LLM-driven proposals to balance multi-objective trade-offs and achieve Pareto-optimal solutions on various platforms.
Iterative hardware-aware neural architecture search (HW-NAS) comprises a set of neuroevolutionary, surrogate-driven, and hardware-in-the-loop optimization methodologies that iteratively refine neural network architectures for task performance under explicit hardware constraints such as latency, energy, memory, and processing-element utilization. These frameworks address the multi-objective challenge of maximizing accuracy or task score while respecting resource ceilings or Pareto-optimizing trade-offs on real-world deployment hardware. Contemporary approaches span diverse modalities, including vision, language, graph, and spike-based computation, with state-of-the-art results on edge devices, FPGAs, CPUs, GPUs, and novel accelerators. This article surveys foundational principles, algorithmic mechanisms, predictor and cost-modeling strategies, and recent advances in iterative, hardware-aware NAS.
1. Problem Formulation and Multi-Objective Optimization
Hardware-aware NAS seeks architectures that optimize predictive performance while obeying device-specific constraints. The general multi-objective formulation is
$$\max_{a \in \mathcal{A}} \ \mathrm{Acc}(a) \quad \text{s.t.} \quad \mathrm{Lat}(a, h) \le L_{\max}, \quad \mathrm{E}(a, h) \le E_{\max}, \quad \mathrm{Mem}(a, h) \le M_{\max},$$
where $a$ is an architecture from search space $\mathcal{A}$, $h$ denotes the hardware configuration, and $\mathrm{Acc}$, $\mathrm{Lat}$, $\mathrm{E}$, and $\mathrm{Mem}$ denote accuracy, latency, energy, and memory usage. Solutions are sought on the Pareto front
$$\mathcal{P} = \{\, a \in \mathcal{A} : \nexists\, a' \in \mathcal{A} \text{ such that } F(a') \text{ dominates } F(a) \,\},$$
with $F(a)$ the vector of objectives (Cummings et al., 2022, Tran et al., 23 Dec 2025, Bouzidi et al., 20 Feb 2024). The constraint specification can be either hard (strict feasibility) or soft (penalty terms), and is often driven by the primary deployment bottleneck (e.g., a millisecond-scale latency ceiling for mobile deployment).
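As a concrete illustration of this formulation, the following minimal Python sketch checks hard-constraint feasibility and Pareto dominance between two candidates; the objective fields and budget parameters are illustrative assumptions, not taken from any cited framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """Measured or predicted objectives for one architecture (illustrative fields)."""
    accuracy: float    # higher is better
    latency_ms: float  # lower is better
    energy_mj: float   # lower is better
    memory_mb: float   # lower is better

def is_feasible(c: Candidate, lat_max: float, energy_max: float, mem_max: float) -> bool:
    """Hard-constraint check: reject any candidate that exceeds a resource ceiling."""
    return (c.latency_ms <= lat_max
            and c.energy_mj <= energy_max
            and c.memory_mb <= mem_max)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` Pareto-dominates `b`: at least as good in every objective and
    strictly better in at least one (accuracy maximized, costs minimized)."""
    objs_a = (a.accuracy, -a.latency_ms, -a.energy_mj, -a.memory_mb)
    objs_b = (b.accuracy, -b.latency_ms, -b.energy_mj, -b.memory_mb)
    return (all(x >= y for x, y in zip(objs_a, objs_b))
            and any(x > y for x, y in zip(objs_a, objs_b)))
```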
2. Iterative Search Workflows
A central pattern across high-performing frameworks is an iterative loop that alternates between candidate generation and evaluation—incorporating hardware proxies and true measurements in the loop. Typical workflow structure:
- Population Initialization: Sample a population of candidate architectures (random or diversity-driven).
- Surrogate Model Bootstrapping: Measure key objectives (accuracy, latency, energy) exactly on a small subset; fit lightweight predictors/surrogates $f_{\mathrm{acc}}$, $f_{\mathrm{lat}}$, $f_{\mathrm{energy}}$ (Cummings et al., 2022, Tran et al., 23 Dec 2025).
- Evolutionary/Gradient-Based Search: Use evolutionary algorithms (NSGA-II, RL, or LLM-driven operators) to propose new candidates, scored using surrogate-predicted objectives.
- True Evaluation and Model Update: Select the most promising or uncertain candidates, measure them on real hardware, expand the training set, and update surrogates.
- Pareto Front Maintenance: Use non-dominated sorting and diversity criteria (crowding distance, hypervolume) to maintain a well-spread solution set.
Representative algorithmic pseudocode for such a loop is presented in (Cummings et al., 2022), (Bouzidi et al., 20 Feb 2024), (Tran et al., 23 Dec 2025), and (Robben et al., 11 Dec 2025). This interleaving of cheap surrogate-guided search with expensive hardware-in-the-loop validation achieves a 2–8× reduction in search cost versus traditional decoupled approaches.
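A minimal, self-contained sketch of such a loop is given below. It uses a toy (depth, width, kernel) encoding and a synthetic stand-in for the hardware measurement step; real frameworks would replace `measure_on_hardware` with on-device profiling and `fit_surrogate` with MLP/XGBoost/GNN predictors, so this is an illustration of the loop structure rather than any cited implementation.

```python
import random

def sample_random():
    """Toy architecture encoding: (depth, width multiplier, kernel size)."""
    return (random.choice([2, 3, 4]),
            random.choice([0.5, 0.75, 1.0]),
            random.choice([3, 5, 7]))

def measure_on_hardware(arch):
    """Stand-in for the expensive hardware-in-the-loop step (deployment + profiling).
    A synthetic model is used here so the sketch runs end to end."""
    depth, width, kernel = arch
    latency = 1.7 * depth * width * kernel                        # pretend milliseconds
    accuracy = 0.6 + 0.05 * depth + 0.1 * width - 0.001 * kernel  # pretend task accuracy
    return {"accuracy": accuracy, "latency": latency}

def fit_surrogate(measured):
    """Trivial surrogate: scale a cheap complexity proxy to measured latencies.
    Real frameworks fit learned predictors here."""
    ratios = [m["latency"] / (a[0] * a[1] * a[2]) for a, m in measured.items()]
    scale = sum(ratios) / len(ratios)
    return lambda a: scale * a[0] * a[1] * a[2]

def mutate(arch):
    """Resample one gene of the encoding at random."""
    genes = list(arch)
    idx = random.randrange(3)
    genes[idx] = random.choice(([2, 3, 4], [0.5, 0.75, 1.0], [3, 5, 7])[idx])
    return tuple(genes)

def search(lat_max=12.0, rounds=5, init=8, proposals=32, validate_k=4):
    # Bootstrap: measure a small random population exactly.
    measured = {a: measure_on_hardware(a) for a in {sample_random() for _ in range(init)}}
    for _ in range(rounds):
        surrogate = fit_surrogate(measured)
        # Propose mutants and keep only those the surrogate predicts to be feasible.
        parents = list(measured)
        cands = [mutate(random.choice(parents)) for _ in range(proposals)]
        feasible = [a for a in cands if surrogate(a) <= lat_max and a not in measured]
        # Validate a handful of promising candidates on "real hardware" and refit next round.
        for a in feasible[:validate_k]:
            measured[a] = measure_on_hardware(a)
    ok = [(a, m) for a, m in measured.items() if m["latency"] <= lat_max]
    return sorted(ok, key=lambda x: -x[1]["accuracy"])

if __name__ == "__main__":
    for arch, objs in search()[:3]:
        print(arch, objs)
```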
3. Surrogate Modeling and Hardware Prediction
Surrogates, trained iteratively during search, provide rapid predictions of task accuracy, latency, energy, or other objectives:
- Learned predictors: Shallow MLPs, XGBoost regressors, or GNN-based models that predict the latency of GNN architectures (Zhou et al., 2023, Cummings et al., 2022, Bouzidi et al., 20 Feb 2024, Robben et al., 11 Dec 2025).
- Zero-cost proxies: Linear representation similarity (RMI), SNIP, SynFlow computed on a batch or via ablation (Sinha et al., 2023, Zhu et al., 1 Oct 2025).
- Analytical models: For accelerators, analytic latency/area/energy models are fitted from a handful of real measurements; FLASH achieves microsecond-scale prediction per candidate across its entire search space (Li et al., 2021).
- Uncertainty ensembles: Ensembles of surrogates with uncertainty gating for on-demand validation (Cummings et al., 2022).
- Hardware profiling or lookup: For common devices (Jetson, FPGA), hardware cost is measured or looked up from precomputed tables (e.g., HW-NAS-Bench (Zhu et al., 1 Oct 2025, Sinha et al., 2023)).
These surrogates reduce search time by orders of magnitude while retaining ranking fidelity (typical RMSE below 2–4% and rank correlation τ > 0.8).
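A minimal sketch of fitting such a surrogate with scikit-learn (used here as a stand-in for the predictor families above; the feature encoding and synthetic latencies are illustrative assumptions) looks like this:

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative architecture encoding: (depth, width multiplier, kernel size, input resolution).
def encode(arch):
    return np.array(arch, dtype=float)

# A few dozen (architecture, latency) pairs, standing in for on-device measurements.
measured = [((d, w, k, r), 0.8 * d * w * k * k * (r / 32.0))
            for d in (2, 3, 4) for w in (0.5, 1.0) for k in (3, 5) for r in (32, 64)]

X = np.stack([encode(a) for a, _ in measured])
y = np.array([lat for _, lat in measured])

# Lightweight tree-based surrogate used in place of profiling during search.
surrogate = GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(X, y)

# Ranking fidelity (Kendall tau) matters more than absolute error for guiding selection;
# in practice it is evaluated on held-out measurements rather than the training set.
tau, _ = kendalltau(y, surrogate.predict(X))
print(f"Kendall tau: {tau:.3f}")
```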
4. Evolutionary, RL, and LLM-Driven Optimization
Population-based search algorithms dominate iterative HW-NAS:
- Evolutionary Algorithms (EA): NSGA-II with domain-adapted mutation/crossover operators, often guided by on-line feature importance (as in SONATA’s adaptive operator selection with tree-based surrogates and RL policy (Bouzidi et al., 20 Feb 2024)). EA steps may alternate between backbone and head, or function-then-operation (HGNAS, TrashDet) (Tran et al., 23 Dec 2025, Zhou et al., 2023).
- Reinforcement Learning (RL): Sample architecture and/or hardware parameters via a controller (LSTM/transformer-based policy) with policy-gradient and scalarized reward (accuracy–hardware trade-off) (Lu et al., 2019, Akhauri et al., 2021).
- LLM-Driven Evolution: LLMs propose, rationalize, and diversify candidate architectures, guided by evolving knowledge bases and constraint niching (PEL-NAS) (Zhu et al., 1 Oct 2025).
- Specialized Optimization: Analytical global optimization (e.g., hierarchical SHGO in FLASH), integer programming for accelerator mapping, or greedy DP for tile-mapping (Li et al., 2021, Lu et al., 2019).
Recent frameworks integrate on-the-fly operator adaptation, knowledge-driven mutation, and constraint-aware variation for rapid convergence under hard deployment constraints.
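As a generic, hedged stand-in for such learned operator-selection policies (not the exact mechanism of SONATA or any other cited framework), an epsilon-greedy adaptive operator selector over a toy encoding might look as follows:

```python
import random

# Variation operators over an illustrative (depth, width, kernel) encoding.
def mutate_depth(arch):
    _, w, k = arch
    return (random.choice([2, 3, 4]), w, k)

def mutate_width(arch):
    d, _, k = arch
    return (d, random.choice([0.5, 0.75, 1.0]), k)

def mutate_kernel(arch):
    d, w, _ = arch
    return (d, w, random.choice([3, 5, 7]))

OPERATORS = [mutate_depth, mutate_width, mutate_kernel]

def pick_operator(rewards, eps=0.2):
    """Epsilon-greedy adaptive operator selection: usually exploit the operator with
    the best average observed improvement, occasionally explore a random one."""
    if random.random() < eps or all(len(r) == 0 for r in rewards.values()):
        return random.choice(OPERATORS)
    return max(OPERATORS, key=lambda op: sum(rewards[op]) / max(len(rewards[op]), 1))

def record(rewards, op, parent_fitness, child_fitness):
    """Credit an operator with the (non-negative) fitness improvement it produced."""
    rewards[op].append(max(child_fitness - parent_fitness, 0.0))

# Usage: rewards = {op: [] for op in OPERATORS}; op = pick_operator(rewards); child = op(parent)
```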
5. Hardware-Aware Cost Modeling and Constraint Enforcement
Hardware-aware NAS depends on accurate, efficient, and device-specific modeling of resource consumption:
- Latency and energy models: Device-specific, using either regression, look-up from HW-NAS-Bench, or cycle-accurate simulation (e.g., RHNAS (Akhauri et al., 2021)).
- Area and memory: For FPGAs and in-memory computing, area/BRAM/LUT is estimated analytically or via synthesis tools (hls4ml, FINN) (Weitz et al., 9 Jan 2025, Ji et al., 4 Mar 2024).
- Multi-objective constraint handling: Hard-pruning (rejecting candidates exceeding budget during mutation or selection) (Robben et al., 11 Dec 2025, Ji et al., 4 Mar 2024), or soft penalty with linear (or customized) penalty terms in the scalarized reward (Sinha et al., 2023).
Constraint handling ensures that discovered architectures are not only optimal in solution space but are also directly deployable, achieving significant reductions in energy, latency, parameter count, or other metrics across applications.
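The two constraint-handling styles above can be sketched in a few lines; the weighting parameters are illustrative assumptions, not values from the cited works:

```python
def scalarized_reward(accuracy, latency_ms, lat_budget_ms, alpha=1.0, beta=0.1):
    """Soft constraint handling: scalarized reward with a linear penalty on
    latency-budget violation (alpha/beta are illustrative trade-off weights)."""
    violation = max(latency_ms - lat_budget_ms, 0.0)
    return alpha * accuracy - beta * violation

def hard_prune(candidates, predicted_latency, lat_budget_ms):
    """Hard constraint handling: drop candidates whose predicted latency
    exceeds the budget before they are ever trained or deployed."""
    return [a for a in candidates if predicted_latency(a) <= lat_budget_ms]
```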
6. Search Space Engineering and Diversity Maintenance
Search spaces are constructed modularly—with elastic parameters controlling depth, width, kernel size, operation type, quantization, or even hardware design parameters (Ji et al., 4 Mar 2024, Zhu et al., 1 Oct 2025, Akhauri et al., 2021). Key space diversification strategies include:
- Partitioned optimization: Alternating between search in backbone/head, function/operation, or complexity-based niches (to avoid mode collapse) (Tran et al., 23 Dec 2025, Zhu et al., 1 Oct 2025, Zhou et al., 2023).
- Knowledge-directed co-evolution: Coupling adaptive knowledge bases with operator selection or LLM-guided proposal (Zhu et al., 1 Oct 2025, Bouzidi et al., 20 Feb 2024).
- Diversity promotion: NSGA-II's crowding distance, hypervolume maximization, decomposition-based evolutionary optimization (MOEA/D), or clustering-assisted pruning of unpromising search subspaces (PopDB, (Sarah et al., 2022)).
Such strategies are critical for ensuring wide Pareto front coverage, specialized device adaptation, and resilience against search space entrapment or premature convergence.
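As one concrete ingredient of such diversity maintenance, NSGA-II's crowding distance can be computed per non-dominated front as in the sketch below (a textbook-style implementation, not code from the cited frameworks):

```python
def crowding_distance(objectives):
    """NSGA-II crowding distance for one non-dominated front.
    `objectives` is a list of equal-length objective vectors, one per candidate."""
    n = len(objectives)
    if n == 0:
        return []
    n_obj = len(objectives[0])
    distance = [0.0] * n
    for j in range(n_obj):
        order = sorted(range(n), key=lambda i: objectives[i][j])
        lo, hi = objectives[order[0]][j], objectives[order[-1]][j]
        distance[order[0]] = distance[order[-1]] = float("inf")  # always keep boundary points
        if hi == lo:
            continue
        for rank in range(1, n - 1):
            prev_v = objectives[order[rank - 1]][j]
            next_v = objectives[order[rank + 1]][j]
            distance[order[rank]] += (next_v - prev_v) / (hi - lo)
    return distance

# Candidates with larger crowding distance are preferred when truncating a front,
# which spreads the surviving population across the Pareto front.
```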
7. Empirical Performance and Practical Recommendations
Iterative hardware-aware NAS frameworks have achieved state-of-the-art hardware-aware trade-offs:
- Speedups of 60–135% over prior DNN designs on FPGA (HAO, TrashDet (Dong et al., 2021, Tran et al., 23 Dec 2025)).
- Pareto-dominance rates up to 93.6% over vanilla EAs (SONATA (Bouzidi et al., 20 Feb 2024)).
- Search cost reduced from days to minutes (FLASH, PEL-NAS (Li et al., 2021, Zhu et al., 1 Oct 2025)).
- Up to 10.6× latency speedup and 88.2% peak-memory reduction with negligible test accuracy loss for GNNs and vision nets on edge devices (Zhou et al., 2023, Robben et al., 11 Dec 2025).
- Large reduction in carbon emissions and total search FLOPs (Sinha et al., 2023).
Best practice guidelines include: bootstrapping predictors on diverse hardware-specific samples, uncertainty-based true-evaluation triggers, diversity-preserving selection (NSGA-II or MOEA/D), and modular search space partitioning for high-dimensional or multi-modal tasks—facilitating robust, efficient, and scalable hardware-aware NAS.
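One of these practices, uncertainty-based triggering of true hardware evaluation, can be sketched with an ensemble of scikit-learn-style regressors (a minimal illustration, not the gating rule of any specific cited framework):

```python
import numpy as np

def select_for_true_evaluation(candidates, encode, ensemble, top_k=8):
    """Send to real hardware only the candidates on which an ensemble of cheap
    surrogates disagrees the most (highest predictive uncertainty).
    `ensemble` is any list of fitted regressors exposing `.predict`."""
    X = np.stack([encode(c) for c in candidates])
    preds = np.stack([model.predict(X) for model in ensemble])  # (n_models, n_candidates)
    uncertainty = preds.std(axis=0)                             # ensemble disagreement
    order = np.argsort(-uncertainty)                            # most uncertain first
    return [candidates[int(i)] for i in order[:top_k]]
```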
Key References:
- SONATA: (Bouzidi et al., 20 Feb 2024)
- Hardware-Aware NAS Across Modalities: (Cummings et al., 2022)
- AEBNAS: (Robben et al., 11 Dec 2025)
- PEL-NAS: (Zhu et al., 1 Oct 2025)
- NASH: (Ji et al., 4 Mar 2024)
- Fast Physics Codesign: (Weitz et al., 9 Jan 2025)
- HGNAS: (Zhou et al., 2023)
- FLASH: (Li et al., 2021)
- Joint Architecture-Quantization-HW: (Lu et al., 2019)
- TrashDet: (Tran et al., 23 Dec 2025)
- HW-EvRSNAS: (Sinha et al., 2023)
- ConcurrentNAS: (Sarah et al., 2022)
- Co-Exploration: (Jiang et al., 2019)
- RHNAS: (Akhauri et al., 2021)
- NeuroNAS for SNN/IMC: (Putra et al., 30 Jun 2024)