Hardware-Efficient Ansatz (HEA)

Updated 9 November 2025

HEA is a parameterized quantum circuit architecture that exploits native hardware gate sets and connectivity for efficient variational state preparation.
Recent physics-constrained designs like XYZ2F address key issues by ensuring universality, systematic improvability, and size-consistency while minimizing circuit depth.
Optimized HEAs balance expressibility, trainability, and hardware adaptivity through careful parameter initialization, local interactions, and tailored gate sequences.

A hardware-efficient ansatz (HEA) is a parameterized quantum circuit architecture engineered to exploit the native gate set and topology of a target quantum hardware platform, enabling variational state preparation with minimal circuit depth and overhead. In contrast to chemically or physically motivated ansätze such as unitary coupled cluster with singles and doubles (UCCSD), hardware-efficient ansätze are constructed from alternating layers of easily implemented single-qubit rotations and entangling gates dictated by device connectivity. Despite their hardware compatibility and resource efficiency, conventional HEAs have historically been chosen by heuristic or adaptive rules, raising concerns about trainability, lack of universality, limited expressibility, and poor finite-size scaling. Recent research has provided both rigorous theoretical frameworks and a suite of enhanced hardware-efficient ansätze designed to address these open issues.

1. Definition and Fundamental Properties

HEA circuits generally take the form: $|\Psi(\boldsymbol{\theta})\rangle = U_L(\boldsymbol{\theta}_L)\cdots U_2(\boldsymbol{\theta}_2)U_1(\boldsymbol{\theta}_1)|\Phi_0\rangle,$ with each $U_\ell(\boldsymbol{\theta}_\ell)$ being a hardware-native block composed of single- and two-qubit gates (e.g., $R_y$, $R_z$, CNOT, CZ) that respect the hardware's connectivity graph. The number of variational parameters scales as $\sim N \times L$ for $N$ qubits and $L$ layers, though variants employing more general rotations or block structures can increase this count.
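The layered form above can be sketched with a small state-vector simulation. This is a minimal illustration, assuming one particular block choice (an $R_y$ rotation on every qubit followed by a CNOT chain on a linear topology) and the reference state $|0\cdots0\rangle$; the helper names `apply_1q`, `apply_cnot`, and `hea_state` are hypothetical and not taken from any cited work.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the y axis, exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to `qubit` of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)  # put the updated axis back in place
    return psi.reshape(-1)

def apply_cnot(state, control, target, n):
    """Apply CNOT by flipping the target axis on the control = 1 slice."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    # The control axis is dropped by the integer index, so later axes shift by one.
    t_axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)

def hea_state(thetas, n):
    """Return U_L ... U_1 |0...0> with U_l = (CNOT chain) * (Ry layer)."""
    thetas = np.asarray(thetas, dtype=float)  # shape (L, n): one angle per qubit per layer
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for layer in thetas:
        for q in range(n):
            state = apply_1q(state, ry(layer[q]), q, n)
        for q in range(n - 1):  # nearest-neighbor entanglers only
            state = apply_cnot(state, q, q + 1, n)
    return state

n_qubits, n_layers = 4, 3
rng = np.random.default_rng(0)
params = rng.uniform(-np.pi, np.pi, size=(n_layers, n_qubits))
psi = hea_state(params, n_qubits)
print(params.size, np.linalg.norm(psi))  # N * L parameters, normalized state
```

In this sketch the parameter array has shape $(L, N)$, which makes the $\sim N \times L$ parameter count explicit.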

Previous HEA designs (e.g., “Ry linear,” “RyRz full,” ASWAP, cascade, Hamiltonian-variational) mainly emphasized operational convenience and empirical performance but lacked guarantees for three crucial constraints:

Universality: the ability to approximate any state arbitrarily well with sufficient depth;
Systematic improvability: monotonic improvement in accuracy by adding layers;
Size-consistency: fidelity to tensor-product separability across noninteracting subsystems.

Absent these constraints, heuristic HEAs may exhibit non-monotonic convergence, loss of previously optimized accuracy upon further circuit expansion, and severe scalability bottlenecks, especially when extending to more than $\sim$ 10 qubits (Xiao et al., 2023). Furthermore, deep random circuits constructed from these primitives are susceptible to "barren plateaus," where trainable gradients vanish exponentially with system size (Nakaji et al., 2020, Leone et al., 2022).

2. Physics-Constrained HEA: XYZ2F Construction

Xiao et al. (2023) introduced a "physics-constrained" HEA (notably XYZ2F) that is universal, systematically improvable, and size-consistent while using only linear (nearest-neighbor) connectivity. The design principles and structure are:

Single-qubit layer: $U_1(\alpha, \beta) = R_x(\alpha)R_y(\beta)$, acting individually on all qubits.
Two-qubit layer: $U_2(\theta, \phi) = [I\otimes R_y(\phi/2)]\,U_\text{fSim}(\theta, \phi)\,[I\otimes R_y(-\phi/2)]$, with

$U_\text{fSim}(\theta, \phi) = \mathrm{diag}(1, \cos\theta, \cos\theta, e^{-i\phi}) + i \sin\theta (|01\rangle\langle10| + |10\rangle\langle01|),$

enabling the realization of $I$, CNOT, and iSWAP via parameter selection.

Layer construction: Each “XYZ2F” layer is a staircase of single-qubit and two-qubit blocks spanning the chain, with $2(N-1)$ fSim gates and $3N$ single-qubit rotations.
Size-consistency and noninteracting limit: An additional central $R_z$-rotation layer ensures that product states and separable subsystem states are preserved at $L=1$ and at all higher layer counts.
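The parameter choices that reduce $U_2$ to $I$, CNOT, and iSWAP can be checked numerically. The sketch below builds $U_\text{fSim}$ exactly as written above (including its $+i\sin\theta$ sign convention) in the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, and assumes $R_y(a) = e^{-i a Y/2}$; the function names are illustrative only.

```python
import numpy as np

def ry(angle):
    """R_y(angle) = exp(-i * angle * Y / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def fsim(theta, phi):
    """fSim gate in the convention of the text: +i sin(theta) off-diagonals."""
    u = np.diag([1, np.cos(theta), np.cos(theta), np.exp(-1j * phi)]).astype(complex)
    u[1, 2] = u[2, 1] = 1j * np.sin(theta)  # |01><10| and |10><01| terms
    return u

def u2(theta, phi):
    """U_2 = [I (x) Ry(phi/2)] fSim(theta, phi) [I (x) Ry(-phi/2)]."""
    eye = np.eye(2)
    return np.kron(eye, ry(phi / 2)) @ fsim(theta, phi) @ np.kron(eye, ry(-phi / 2))

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
ISWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]], dtype=complex)

is_identity = np.allclose(u2(0.0, 0.0), np.eye(4))    # theta = 0, phi = 0
is_cnot = np.allclose(u2(0.0, np.pi), CNOT)           # theta = 0, phi = pi
is_iswap = np.allclose(u2(np.pi / 2, 0.0), ISWAP)     # theta = pi/2, phi = 0
print(is_identity, is_cnot, is_iswap)
```

At $\theta = 0$, $\phi = \pi$ the fSim gate is a CZ, and the surrounding $R_y(\pm\pi/2)$ rotations on the second qubit convert it into a CNOT controlled on the first qubit.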

The unitary per layer can be written as

$U_\ell = \left[\bigotimes_{k=0}^{N-1} R_x(\alpha_{\ell,k}) R_y(\beta_{\ell,k})\right] \left[\bigotimes_{k=0}^{N-2} U_2(\theta_{\ell,k}, \phi_{\ell,k})\right] [\otimes R_z (\cdots) ] \left[\bigotimes_{k=0}^{N-1} R_x(\alpha'_{\ell,k}) R_y(\beta'_{\ell,k})\right].$

A minimal assignment of parameters and nearest-neighbor connectivity suffices for universality: for any $\epsilon > 0$ and state $|\Psi_\text{target}\rangle$, some $L$ and parameters exist with $\|\Psi(\boldsymbol{\theta})-\Psi_\text{target}\| < \epsilon$. Systematic improvability is enforced via the ability to set the final layer to the identity, so that the variational subspaces are nested, $V_A^L \subseteq V_A^{L+1}$, and size-consistency is maintained for all $L$ by construction.
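The nesting argument can be illustrated with a simplified layer. This is a sketch under stated assumptions, not the full XYZ2F layer: it keeps a single $R_x R_y$ block per qubit and one chain of $U_2$ gates, omits the $R_z$ layer whose arguments are elided above, and uses the hypothetical name `simplified_layer`. Setting every parameter of an appended layer to zero yields the identity, so the $(L+1)$-layer circuit reproduces the $L$-layer state.

```python
import numpy as np
from functools import reduce

def rx(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def u2(theta, phi):
    """Two-qubit block U_2(theta, phi) in the convention of the text."""
    fsim = np.diag([1, np.cos(theta), np.cos(theta), np.exp(-1j * phi)]).astype(complex)
    fsim[1, 2] = fsim[2, 1] = 1j * np.sin(theta)
    eye = np.eye(2)
    return np.kron(eye, ry(phi / 2)) @ fsim @ np.kron(eye, ry(-phi / 2))

def simplified_layer(alpha, beta, theta, phi):
    """Rx*Ry on every qubit, then U_2 on neighbors (k, k+1) along the chain."""
    n = len(alpha)
    layer = reduce(np.kron, [rx(a) @ ry(b) for a, b in zip(alpha, beta)])
    for k in range(n - 1):
        mats = [np.eye(2)] * k + [u2(theta[k], phi[k])] + [np.eye(2)] * (n - k - 2)
        layer = reduce(np.kron, mats) @ layer
    return layer

n = 3
rng = np.random.default_rng(1)
# Stand-in for an already optimized layer (random values, purely illustrative).
trained = simplified_layer(*(rng.uniform(-1, 1, size=s) for s in (n, n, n - 1, n - 1)))
# Appended layer with all parameters set to zero.
extra = simplified_layer(np.zeros(n), np.zeros(n), np.zeros(n - 1), np.zeros(n - 1))

psi0 = np.zeros(2 ** n, dtype=complex)
psi0[0] = 1.0
psi_L = trained @ psi0
psi_L_plus_1 = extra @ trained @ psi0
print(np.allclose(extra, np.eye(2 ** n)), np.allclose(psi_L, psi_L_plus_1))
```

An ansatz whose entangling gates are fixed (for example, unparameterized CNOTs) has no such zero-parameter identity layer, which is one way heuristic designs can lose previously optimized accuracy when a layer is added.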

Resource scaling of typical ansätze, given as totals for $N$ qubits and $L$ layers:

Ansatz	# Parameters	2Q Gate Count	Circuit Depth
Ry linear	$N(L+1)$	$(N-1)L$	$N+3L-2$
ASWAP	$2(N-1)L$	$(N-1)L$	$2L$
XYZ2F	$(5N-2)L$	$2(N-1)L$	$(4N+3)L$

Only XYZ2F satisfies all three theoretical constraints (Xiao et al., 2023).
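For quick comparisons, the formulas in the table can be evaluated directly. The helper below simply transcribes the three rows; the function name `resources` and the returned dictionary layout are illustrative choices.

```python
def resources(ansatz, n, layers):
    """Evaluate the table's formulas for n qubits and the given number of layers."""
    formulas = {
        "Ry linear": (n * (layers + 1), (n - 1) * layers, n + 3 * layers - 2),
        "ASWAP": (2 * (n - 1) * layers, (n - 1) * layers, 2 * layers),
        "XYZ2F": ((5 * n - 2) * layers, 2 * (n - 1) * layers, (4 * n + 3) * layers),
    }
    params, two_qubit_gates, depth = formulas[ansatz]
    return {"parameters": params, "two_qubit_gates": two_qubit_gates, "depth": depth}

for name in ("Ry linear", "ASWAP", "XYZ2F"):
    print(name, resources(name, n=12, layers=2))
```

For $N = 12$ and $L = 2$ this gives 116 parameters, 44 two-qubit gates, and depth 102 for XYZ2F, against 36 parameters, 22 two-qubit gates, and depth 16 for Ry linear, which makes the depth cost of the physics-constrained design concrete.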

3. Expressibility, Trainability, and Initialization

Expressibility is evaluated via frame potential and fidelity-based metrics, e.g.,

$F^{(t)}(C) = \int_{\theta,\phi} |\langle\psi_\phi|\psi_\theta\rangle|^{2t} \, d\theta\,d\phi,$

with smaller $F^{(t)}$ indicating higher expressibility (closer to the Haar value) (Nakaji et al., 2020). Shallow alternating-layered ansätze (ALT) with block size $m = O(\log n)$ and $\ell = O(\operatorname{poly}\log n)$ layers achieve near-Haar expressibility while avoiding barren plateaus; the variance of gradients scales as $O(2^{-m\ell})$ rather than $O(2^{-n})$ for global HEA, preserving trainability when $m$ is logarithmic in $n$.
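The frame potential can be estimated by Monte Carlo sampling of parameter pairs and compared with the Haar value $t!\,(d-1)!/(t+d-1)!$ for states of dimension $d$. The sketch below assumes parameters drawn uniformly from $[0, 2\pi)$ and uses a deliberately tiny example, a single qubit prepared by $R_y(\theta)|0\rangle$; the function names are hypothetical.

```python
import numpy as np
from math import factorial

def haar_frame_potential(dim, t):
    """Haar-random state value: t! (d-1)! / (t+d-1)!."""
    return factorial(t) * factorial(dim - 1) / factorial(t + dim - 1)

def estimate_frame_potential(state_fn, n_params, t, samples=20000, seed=0):
    """Monte Carlo estimate of the average of |<psi_phi|psi_theta>|^(2t)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, size=(samples, n_params))
    phi = rng.uniform(0, 2 * np.pi, size=(samples, n_params))
    overlaps = [abs(np.vdot(state_fn(p), state_fn(q))) ** (2 * t)
                for p, q in zip(phi, theta)]
    return float(np.mean(overlaps))

def ry_state(params):
    """Ry(theta)|0> for a single qubit: real amplitudes only."""
    return np.array([np.cos(params[0] / 2), np.sin(params[0] / 2)], dtype=complex)

estimate = estimate_frame_potential(ry_state, n_params=1, t=2)
haar = haar_frame_potential(dim=2, t=2)
print(estimate, haar)
```

For this one-parameter circuit the exact $t = 2$ value is $3/8$, above the Haar value $1/3$: a lone $R_y$ rotation only reaches real-amplitude states and is therefore less expressive than a Haar-random single-qubit state.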

Initialization protocols have a critical impact on scalable training. By setting all HEA parameters in the first block within $O(1/(pN))$ (“small-parameter regime”) or initializing X-rotation angles in a regime below the many-body localized (MBL) transition, the circuit avoids exponential gradient decay at any depth, with $|\partial_{i,j} C| = \Omega(1)$ (Park et al., 7 Mar 2024). Random initialization without such constraints leads to barren plateaus unless the circuit is shallow and the observable is local.

Area-law entangled data, as opposed to typical volume-law states, support efficient training with shallow circuits: gradients do not vanish and the loss function values anti-concentrate (Leone et al., 2022). In machine learning and VQE, starting from product or weakly entangled input states is therefore favorable for HEA optimization.

4. Resource Scaling, Hardware Adaptivity, and Variants

HEAs are tailored for maximal hardware compatibility:

Gate locality: All two-qubit gates are nearest-neighbor or exploit hardware-native interactions.
Parameter scaling: Parameter count scales linearly with both $N$ and $L$ for standard designs; in convolutional or symmetry-adapted variants (e.g., SnCQA (Zheng et al., 2022)), the use of permutation or lattice symmetries can reduce the number of independent parameters significantly.
Depth vs width: Introducing ancilla qubits (Hardware-Adaptable Ansatz, HAA (Zeng et al., 2023)) enables a trade-off between circuit depth ($L$) and width ($n$). Employing $n$ ancillas and cross-coupling all system qubits to the ancillas in each layer establishes entanglement rapidly, allowing shallow-depth realization of highly expressive states with practical two-qubit gate counts even for systems with more than 20 atoms.
Entanglement connection architecture: Restricting inter-block entanglement to a minimal number of cross-partition gates, as in Single Entanglement Connection Architecture (SECA (Zhang et al., 2023)), retains near-optimal expressibility while improving trainability and minimizing distributed quantum computation overhead.
Evolutionary growth: Algorithmic discovery of hardware-efficient variational circuits, e.g., by evolutionary programming as in EVQE (Rattew et al., 2019), can further suppress circuit depth and two-qubit gate count for specific Hamiltonians, leveraging device constraints and hardware-aware fitness penalties.
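The gate-locality constraint can be made concrete by deriving the entangling sublayers from a device coupling map. The sketch below is one simple way to do this, assuming the coupling map is given as a list of qubit pairs; it packs edges greedily into sublayers whose gates act on disjoint qubits and can therefore run in parallel. The function name `schedule_entanglers` is hypothetical, and the greedy packing is not guaranteed to use the minimum number of sublayers.

```python
def schedule_entanglers(edges):
    """Greedily pack coupling-map edges into sublayers of disjoint qubit pairs."""
    sublayers = []
    for a, b in edges:
        for layer in sublayers:
            # An edge fits if neither of its qubits is already used in this sublayer.
            if all(a not in pair and b not in pair for pair in layer):
                layer.append((a, b))
                break
        else:
            sublayers.append([(a, b)])
    return sublayers

linear_chain = [(q, q + 1) for q in range(4)]   # 5 qubits, nearest-neighbor
star = [(0, 1), (0, 2), (0, 3)]                 # hub qubit 0 coupled to three others

chain_schedule = schedule_entanglers(linear_chain)
star_schedule = schedule_entanglers(star)
print(chain_schedule)  # [[(0, 1), (2, 3)], [(1, 2), (3, 4)]]
print(star_schedule)   # [[(0, 1)], [(0, 2)], [(0, 3)]]
```

The linear chain needs only two entangling sublayers regardless of length, while a star-shaped map serializes every gate through the hub, which is why the same logical ansatz can have very different depths on different topologies.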

Hardware-specific ansätze, such as HEA-TI for trapped ion systems (Zhuang et al., 3 Jul 2024), exploit all-to-all native spin-spin interactions and global entanglers, reducing or eliminating the necessity for individually addressed two-qubit gates. Resource scaling is then $O(NL)$ for both single-qubit gates and global pulses, with circuit depths of $O(L)$ and practical gate counts for $N$ up to 12–20.

5. Universality and Complexity-Theoretic Status

Recent formal analysis has established that classes of hardware-efficient ansätze (e.g., Ry-Rz-CZ, Ry-CNOT on a linear chain) are strictly/computationally universal, i.e., any polynomial-size quantum circuit can be exactly compiled into a polynomial-depth HEA of that class with at most $n+O(1)$ (for Ry-Rz-CZ) or $n+4$ (for Ry-CNOT) qubits (Iwakiri et al., 5 Nov 2025). Specifically, the following hold:

All single-qubit Clifford+$T$ gates ($H$, $T$) map to a depth-0 Ry-Rz single-qubit layer.
Nearest-neighbor CZ and CNOT gates can be synthesized in $O(N)$ or $O(N^2)$ depth using only the HEA's primitive gates and SWAPs.
Simulating arbitrary polynomial-depth HEA circuits is BQP-complete, confirming that they fully capture quantum computational power in the circuit model.
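The single-qubit part of this compilation can be verified directly: $H$ and $T$ are each equal, up to a global phase, to a product of $R_y$ and $R_z$ rotations. The sketch below uses one such choice of angles, $H \propto R_y(\pi/2)R_z(\pi)$ and $T \propto R_z(\pi/4)$, with the conventions $R_y(a) = e^{-iaY/2}$ and $R_z(a) = e^{-iaZ/2}$; the specific decomposition used in the cited work may differ.

```python
import numpy as np

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(a):
    return np.diag([np.exp(-0.5j * a), np.exp(0.5j * a)])

def equal_up_to_phase(u, v):
    """True if u = exp(i * gamma) * v for some real gamma."""
    idx = np.unravel_index(np.argmax(np.abs(v)), v.shape)
    phase = u[idx] / v[idx]
    return bool(np.isclose(abs(phase), 1.0) and np.allclose(u, phase * v))

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])

h_ok = equal_up_to_phase(ry(np.pi / 2) @ rz(np.pi), H)   # Rz applied first, then Ry
t_ok = equal_up_to_phase(rz(np.pi / 4), T)
print(h_ok, t_ok)
```

Global phases are unobservable, so these identities suffice for compiling Clifford+$T$ circuits into Ry-Rz layers.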

Universality does not guarantee practical trainability: expressible but deep random HEA circuits face barren plateaus, and meaningful acceleration is only anticipated in the Goldilocks regime (intermediate depth, area-law data) (Leone et al., 2022).

6. Performance Benchmarks and Empirical Results

XYZ2F demonstrates strict size-consistency, monotonic energy convergence with layer number, and scalability to 12+ qubit spin and fermionic chains, outperforming heuristic designs and achieving chemical accuracy with substantially fewer layers. For example:

In the 1D Heisenberg chain, XYZ2F achieves 1 mH accuracy with parameter and two-qubit gate counts scaling as $N^{1.98}$ and $N^{2.17}$, respectively.
On minimal-basis LiH, H$_2$O, and N$_2$ (12 qubits), XYZ2F outpaces Ry-based or ASWAP ansätze, retaining accuracy as $N$ grows.
HAA achieves similar chemical accuracy for a 21-atom cycloreversion reaction with $<250$ two-qubit gates, compared to $>400$ for HEA and thousands for UCCSD.
SECA suppresses barren plateaus and improves convergence compared to full-entanglement counterparts, with less than 20% loss in entanglement metrics and significant reduction in variational overhead (Zhang et al., 2023).
In trapped-ion platforms, HEA-TI enables efficient cluster-state and molecular ground-state preparation with $N\times L$ scaling, matching or exceeding UCCSD accuracy with simpler operation primitives (Zhuang et al., 3 Jul 2024).

7. Implementation Challenges and Practical Recommendations

While hardware-efficient ansätze enable practical deployment on available NISQ hardware, implementation tradeoffs remain:

Large circuit depth per layer (as in XYZ2F and other physics-constrained ansätze) increases decoherence and noise sensitivity, especially on superconducting devices with limited coherence times.
Shallow circuits or the use of block-local or symmetry-protected architectures can control circuit depth, maintain trainability, and mitigate barren plateaus, but may reduce expressiveness unless block sizes grow slowly with $N$.
Device-specific or evolutionary designs (e.g., EVQE) can optimize gate placement, respecting connectivity and error rates, but require additional classical overhead and careful fitness tuning.
Smart parameter initialization is essential to preserve gradient magnitudes and avoid barren plateaus in deep circuits; initialization in the perturbative or MBL regimes has been rigorously justified (Park et al., 7 Mar 2024).
For distributed quantum computation, limiting cross-partition entanglers minimizes gate-cutting overhead, enabling scaling to larger problem sizes with minimal classical simulation cost (Zhang et al., 2023).

Conclusion

The hardware-efficient ansatz and its modern, physics-constrained and adaptive extensions provide a scalable, resource-optimized framework for variational quantum algorithms on NISQ hardware. By enforcing universality, systematic improvability, and size-consistency, and through careful architectural and initialization choices, HEA variants achieve state-of-the-art performance on both synthetic and real quantum tasks, while mapping efficiently onto physical device topologies. Implementation must balance expressibility, parameter count, circuit depth, and optimization robustness, with specific strategies tailored to application domain, device architecture, and problem data entanglement structure. Theoretical results confirm that the HEA encapsulates the full power of quantum computation, but practical deployment demands careful constraint of depth, parameterization, and initialization to preserve trainability and hardware feasibility (Xiao et al., 2023, Iwakiri et al., 5 Nov 2025, Leone et al., 2022, Park et al., 7 Mar 2024).