
Iterative Hardware-Aware NAS

Updated 26 December 2025
  • Iterative hardware-aware NAS comprises methods that iteratively refine neural architectures via surrogate modeling and evolutionary strategies under strict hardware constraints like latency, energy, and memory.
  • It employs iterative search workflows that alternate between candidate generation and hardware-in-the-loop evaluation, achieving 2–8× reductions in search cost over decoupled approaches.
  • It integrates surrogate models, reinforcement learning, and LLM-driven proposals to balance multi-objective trade-offs and achieve Pareto-optimal solutions on various platforms.

Iterative hardware-aware neural architecture search (HW-NAS) comprises a set of neuroevolutionary, surrogate-driven, and hardware-in-the-loop optimization methodologies that iteratively refine neural network architectures for task performance under explicit hardware constraints such as latency, energy, memory, and processing element utilization. These frameworks aim to solve the multi-objective challenge of maximizing accuracy or task score while respecting resource ceilings or Pareto-optimizing trade-offs on real-world deployment hardware. Contemporary approaches span diverse modalities—including vision, language, graph, and spike-based computation—with state-of-the-art results on edge devices, FPGAs, CPUs, GPUs, and novel accelerators. This article surveys foundational principles, algorithmic mechanisms, predictor and cost modeling strategies, and recent advances in iterative, hardware-aware NAS.

1. Problem Formulation and Multi-Objective Optimization

Hardware-aware NAS seeks architectures that optimize predictive performance while obeying device-specific constraints. The general multi-objective formulation is

$$\min_{a\in\mathcal S} \big[\, f_\mathrm{acc}(a),\ f_\mathrm{lat}(a, h),\ f_\mathrm{energy}(a, h),\ \ldots \,\big] \quad \text{subject to} \quad f_\mathrm{lat}(a, h)\le \mathrm{Lat}_{\max},\ f_\mathrm{mem}(a, h)\le \mathrm{Mem}_{\max},\ \ldots$$

where $a$ is an architecture from the search space $\mathcal S$, $h$ denotes the hardware configuration, and $f_\mathrm{acc}$, $f_\mathrm{lat}$, $f_\mathrm{energy}$, $f_\mathrm{mem}$, etc. denote accuracy, latency, energy, and memory usage. Solutions are sought on the Pareto front

$$\mathcal{P}^* = \{\, a \in \mathcal S \mid \nexists\, b \in \mathcal S:\ F_i(b)\le F_i(a)\ \forall i \ \wedge\ F_j(b) < F_j(a)\ \text{for some } j \,\}$$

with $F(a, h)$ the vector of objectives (Cummings et al., 2022, Tran et al., 23 Dec 2025, Bouzidi et al., 20 Feb 2024). The constraint specification can be either hard (strict feasibility) or soft (penalty terms), and is often driven by the primary deployment bottleneck (e.g., $f_\mathrm{lat}(a, h) \le 8$ ms for mobile).
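
The dominance relation and hard-constraint filtering above translate directly into code. A minimal Python sketch, assuming each candidate carries a minimized objective vector plus measured latency and memory; the candidate schema and the 8 ms / 64 MB budgets are illustrative assumptions, not from the cited works:

```python
from typing import Dict, List, Sequence

def dominates(f_a: Sequence[float], f_b: Sequence[float]) -> bool:
    """True if objective vector f_a Pareto-dominates f_b (all objectives minimized)."""
    return all(x <= y for x, y in zip(f_a, f_b)) and any(x < y for x, y in zip(f_a, f_b))

def pareto_front(candidates: List[Dict], lat_max_ms: float = 8.0,
                 mem_max_mb: float = 64.0) -> List[Dict]:
    """Return feasible, non-dominated candidates.

    Each candidate: {"objectives": [err, lat_ms, ...], "lat_ms": float, "mem_mb": float}.
    Hard constraints (Lat_max, Mem_max) are applied before the dominance test.
    """
    feasible = [c for c in candidates
                if c["lat_ms"] <= lat_max_ms and c["mem_mb"] <= mem_max_mb]
    return [c for c in feasible
            if not any(dominates(o["objectives"], c["objectives"])
                       for o in feasible if o is not c)]

# Example: one dominated candidate, one constraint violation.
cands = [
    {"objectives": [0.25, 6.0], "lat_ms": 6.0, "mem_mb": 40.0},
    {"objectives": [0.30, 7.0], "lat_ms": 7.0, "mem_mb": 50.0},  # dominated
    {"objectives": [0.20, 9.0], "lat_ms": 9.0, "mem_mb": 30.0},  # violates Lat_max
]
print(pareto_front(cands))  # -> only the first candidate survives
```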

2. Iterative Search Workflows

A central pattern across high-performing frameworks is an iterative loop that alternates between candidate generation and evaluation—incorporating hardware proxies and true measurements in the loop. Typical workflow structure:

  1. Population Initialization: Sample a population of candidate architectures (random or diversity-driven).
  2. Surrogate Model Bootstrapping: Measure key objectives (accuracy, latency, energy) exactly on a small subset; fit lightweight predictors/surrogates $p_\mathrm{acc}$, $p_\mathrm{lat}$, $p_\mathrm{en}$ (Cummings et al., 2022, Tran et al., 23 Dec 2025).
  3. Evolutionary/RL/LLM-Driven Search: Use evolutionary algorithms (e.g., NSGA-II), RL controllers, or LLM-driven operators to propose new candidates, scored using surrogate-predicted objectives.
  4. True Evaluation and Model Update: Select the most promising or uncertain candidates, measure them on real hardware, expand the training set, and update surrogates.
  5. Pareto Front Maintenance: Use non-dominated sorting and diversity criteria (crowding distance, hypervolume) to maintain a well-spread solution set.

Representative algorithmic pseudocode for such a loop is presented in (Cummings et al., 2022, Bouzidi et al., 20 Feb 2024, Tran et al., 23 Dec 2025, Robben et al., 11 Dec 2025). This interleaving of cheap surrogate-guided search with expensive hardware-in-the-loop validation achieves a 2–8× reduction in search cost versus traditional decoupled approaches.
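
As a schematic illustration (not the pseudocode from the cited papers), the following self-contained Python sketch wires the five steps together; the toy tuple encoding, analytic "measurement", and 1-nearest-neighbor surrogate are stand-ins for real training, on-device profiling, and predictor components:

```python
import random

# Toy encoding: an architecture is a tuple of layer widths.
WIDTHS = [16, 32, 64, 128]

def sample_arch(rng, depth=6):
    return tuple(rng.choice(WIDTHS) for _ in range(depth))

def train_and_measure(arch):
    """Stand-in for expensive evaluation: returns (error, latency_ms).
    In practice this trains/scores the model and profiles it on-device."""
    error = 1.0 / (1.0 + sum(arch) / 100.0)        # bigger nets: lower error
    latency = sum(w * w for w in arch) / 10_000.0  # bigger nets: slower
    return error, latency

def fit_surrogate(archive):
    """1-nearest-neighbor surrogate over measured architectures (steps 2/4)."""
    def predict(arch):
        nearest = min(archive, key=lambda a: sum((x - y) ** 2 for x, y in zip(a, arch)))
        return archive[nearest]
    return predict

def mutate(arch, rng):
    i = rng.randrange(len(arch))
    return arch[:i] + (rng.choice(WIDTHS),) + arch[i + 1:]

def dominates(f, g):
    return all(x <= y for x, y in zip(f, g)) and f != g

def iterative_hw_nas(n_init=24, n_iters=5, k_true=6, seed=0):
    rng = random.Random(seed)
    population = [sample_arch(rng) for _ in range(n_init)]                       # step 1
    archive = {a: train_and_measure(a) for a in rng.sample(population, k_true)}  # step 2
    for _ in range(n_iters):
        predict = fit_surrogate(archive)
        offspring = [mutate(a, rng) for a in population]                         # step 3
        # Naive scalarization of surrogate-predicted (error, latency) for the sketch.
        promising = sorted(offspring, key=lambda a: sum(predict(a)))[:k_true]
        for a in promising:                                                      # step 4
            archive[a] = train_and_measure(a)
        pool = list(archive.items())                                             # step 5
        population = [a for a, f in pool
                      if not any(dominates(g, f) for _, g in pool)]
    return [(a, archive[a]) for a in population]

if __name__ == "__main__":
    for arch, (err, lat) in iterative_hw_nas():
        print(arch, f"error={err:.3f} latency={lat:.2f} ms")
```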

3. Surrogate Modeling and Hardware Prediction

Surrogates, trained iteratively during search, provide rapid predictions of task accuracy, latency, energy, and other objectives.

These surrogates reduce search time by orders of magnitude while retaining ranking fidelity (typical RMSE below 2–4%, rank correlation $\tau > 0.8$).
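
As an illustration of this predictor-fitting step, the sketch below trains a random-forest latency surrogate on a synthetic architecture encoding and checks the rank-correlation criterion quoted above; the encoding, data, and 60-sample bootstrap set are placeholders, not from any cited framework:

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 architectures, each encoded as 30 discrete choices,
# with a latency that depends nonlinearly on the encoding.
X = rng.integers(0, 4, size=(200, 30)).astype(float)
latency_ms = X @ rng.uniform(0.1, 1.0, 30) + 0.5 * X[:, 0] * X[:, 1]

# Fit on a small "bootstrapping" subset of true measurements, predict the rest.
train, test = slice(0, 60), slice(60, None)
surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X[train], latency_ms[train])
pred = surrogate.predict(X[test])

tau, _ = kendalltau(latency_ms[test], pred)  # ranking fidelity of the surrogate
rmse = float(np.sqrt(np.mean((pred - latency_ms[test]) ** 2)))
print(f"Kendall tau = {tau:.2f}, RMSE = {rmse:.2f} ms")
```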

4. Evolutionary, RL, and LLM-Driven Optimization

Population-based search algorithms dominate iterative HW-NAS:

  • Evolutionary Algorithms (EA): NSGA-II with domain-adapted mutation/crossover operators, often guided by online feature importance (as in SONATA’s adaptive operator selection with tree-based surrogates and an RL policy (Bouzidi et al., 20 Feb 2024)). EA steps may alternate between backbone and head, or function-then-operation (HGNAS, TrashDet) (Tran et al., 23 Dec 2025, Zhou et al., 2023).
  • Reinforcement Learning (RL): Sample architecture and/or hardware parameters via a controller (LSTM/transformer-based policy) with policy-gradient and scalarized reward (accuracy–hardware trade-off) (Lu et al., 2019, Akhauri et al., 2021).
  • LLM-Driven Evolution: LLMs propose, rationalize, and diversify candidate architectures, guided by evolving knowledge bases and constraint niching (PEL-NAS) (Zhu et al., 1 Oct 2025).
  • Specialized Optimization: Analytical global optimization (e.g., hierarchical SHGO in FLASH), integer programming for accelerator mapping, or greedy DP for tile-mapping (Li et al., 2021, Lu et al., 2019).

Recent frameworks integrate on-the-fly operator adaptation, knowledge-driven mutation, and constraint-aware variation for rapid convergence under hard deployment constraints.
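
For the RL branch, a common scalarized-reward form (popularized by MnasNet-style controllers, and used here purely as an illustration) folds latency into the reward multiplicatively; the target latency and exponent `w` below are tunable assumptions:

```python
def scalarized_reward(accuracy: float, latency_ms: float,
                      target_ms: float = 8.0, w: float = -0.07) -> float:
    """Accuracy-latency trade-off reward for a policy-gradient controller.

    Soft-constraint form reward = acc * (lat / target)^w: architectures slower
    than the target are penalized smoothly rather than rejected outright, so
    the controller still receives a useful learning signal near the boundary.
    """
    return accuracy * (latency_ms / target_ms) ** w

# Example: a 74%-accurate model at 10 ms vs. a 72%-accurate model at 6 ms.
print(scalarized_reward(0.74, 10.0))  # penalized for exceeding the 8 ms target
print(scalarized_reward(0.72, 6.0))   # rewarded for headroom under the target
```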

5. Hardware-Aware Cost Modeling and Constraint Enforcement

Hardware-aware NAS depends on accurate, efficient, and device-specific modeling of resource consumption.

Constraint handling ensures that discovered architectures are not only Pareto-optimal in objective space but also directly deployable, achieving significant reductions in energy, latency, parameter count, and other metrics across applications. One common modeling and enforcement pattern is sketched below.
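
A widely used device-specific cost model is a per-operator latency lookup table summed over the network, enforced either as a hard filter or as a soft penalty. A minimal sketch with illustrative numbers; real tables are profiled per operator configuration on the target device:

```python
# Illustrative per-(op, width) latency LUT for one target device, in ms per call.
LATENCY_LUT_MS = {
    ("conv3x3", 32): 0.42, ("conv3x3", 64): 0.95,
    ("conv5x5", 32): 0.88, ("conv5x5", 64): 1.90,
    ("mbconv", 32): 0.30,  ("mbconv", 64): 0.61,
}

def predicted_latency_ms(arch):
    """Sum LUT entries over the architecture's (op, width) layers."""
    return sum(LATENCY_LUT_MS[layer] for layer in arch)

def penalized_score(accuracy, arch, lat_max_ms=8.0, penalty=0.1):
    """Soft constraint: subtract a penalty proportional to the latency overshoot.
    A hard constraint would instead discard any arch with latency > lat_max_ms."""
    overshoot = max(0.0, predicted_latency_ms(arch) - lat_max_ms)
    return accuracy - penalty * overshoot

arch = [("conv3x3", 64), ("mbconv", 64), ("conv5x5", 32)]
print(predicted_latency_ms(arch), penalized_score(0.75, arch))
```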

6. Search Space Engineering and Diversity Maintenance

Search spaces are constructed modularly, with elastic parameters controlling depth, width, kernel size, operation type, quantization, or even hardware design parameters (Ji et al., 4 Mar 2024, Zhu et al., 1 Oct 2025, Akhauri et al., 2021), and are paired with explicit space diversification strategies.

Such strategies are critical for ensuring wide Pareto front coverage, specialized device adaptation, and resilience against search space entrapment or premature convergence.
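
A minimal sketch of such a modular, elastic space: each block independently exposes depth, width, kernel, and quantization choices, and duplicate rejection at initialization is one simple diversity-maintenance device. All ranges below are illustrative assumptions:

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockConfig:
    depth: int   # number of layers in the block
    width: int   # channel count
    kernel: int  # spatial kernel size
    bits: int    # weight quantization bit-width

# Elastic per-block choices; real spaces may also expose hardware parameters
# (PE array size, tiling) alongside these network parameters.
CHOICES = {"depth": [2, 3, 4], "width": [32, 64, 96, 128],
           "kernel": [3, 5, 7], "bits": [4, 8]}

def sample_network(rng: random.Random, n_blocks: int = 5):
    return tuple(BlockConfig(*(rng.choice(CHOICES[k])
                               for k in ("depth", "width", "kernel", "bits")))
                 for _ in range(n_blocks))

rng = random.Random(0)
seen, population = set(), []
while len(population) < 20:      # diversity-preserving rejection sampling:
    net = sample_network(rng)    # resample exact duplicates so the initial
    if net not in seen:          # population covers more of the space
        seen.add(net)
        population.append(net)
```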

7. Empirical Performance and Practical Recommendations

Iterative hardware-aware NAS frameworks have achieved state-of-the-art accuracy–efficiency trade-offs across tasks, target devices, and deployment constraints.

Best practice guidelines include: bootstrapping predictors on diverse hardware-specific samples, uncertainty-based true-evaluation triggers, diversity-preserving selection (NSGA-II or MOEA/D), and modular search space partitioning for high-dimensional or multi-modal tasks—facilitating robust, efficient, and scalable hardware-aware NAS.
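
The uncertainty-based true-evaluation trigger mentioned above can be realized with a small surrogate ensemble: candidates on which ensemble members disagree most are routed to real hardware measurement. A sketch under that assumption, with synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_for_true_eval(models, X_candidates, k=8):
    """Pick the k candidates with the highest ensemble disagreement (std of
    per-model predictions), i.e., where the surrogate is least trustworthy."""
    preds = np.stack([m.predict(X_candidates) for m in models])  # (n_models, n_cand)
    uncertainty = preds.std(axis=0)
    return np.argsort(-uncertainty)[:k]

# Bootstrap ensemble fit on the measured archive (synthetic stand-ins here).
rng = np.random.default_rng(0)
X_meas = rng.random((40, 12))
y_meas = X_meas.sum(axis=1)
models = []
for seed in range(5):
    idx = rng.integers(0, len(X_meas), len(X_meas))  # bootstrap resample
    m = RandomForestRegressor(n_estimators=50, random_state=seed)
    models.append(m.fit(X_meas[idx], y_meas[idx]))

X_cand = rng.random((100, 12))
print(select_for_true_eval(models, X_cand))  # indices to measure on hardware
```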

