Resource-Aware Quantum Data Loading

Updated 4 July 2026

Resource-aware quantum data loading is a framework for designing state-preparation methods that explicitly optimize scarce quantum resources such as qubits, two-qubit gates, circuit depth, and ancillas using data structure and controlled approximation.
Key methodologies include tensor-network compression, adaptive circuit synthesis, and QRAM protocols that reduce gate counts and depth, effectively mitigating the exponential costs of generic amplitude encoding.
These techniques enable significant improvements in state fidelity and resource efficiency, making them critical for scalable quantum algorithms and practical implementations on current hardware.

Resource-aware quantum data loading is the design of quantum state-preparation and data-access procedures that explicitly optimize scarce quantum resources (qubits, two-qubit gates, circuit depth, ancillas, shots, and allowable approximation error) rather than assuming efficient exact preparation of arbitrary states. In the recent literature, the subject is framed as a response to the “input problem”: a generic $n$-qubit amplitude-encoded state may require $\mathcal{O}(2^n)$ gates and $\mathcal{O}(2^n/n)$ circuit depth, and $\Theta(n)$ depth is reachable only if $\mathcal{O}(2^n)$ ancillas are allowed, so the loading stage can dominate the total cost of a quantum algorithm (Zoufal et al., 2019, Li et al., 2023). Resource-aware methods address this bottleneck by exploiting data structure, controlled approximation, memory architecture, or alternative native resources such as shots (Iaconis et al., 2023, Kyriacou et al., 7 Apr 2026, Zhang et al., 2 Feb 2026).

1. Problem formulation and resource criteria

A common target is amplitude encoding: given a classical vector $v$ of length $M=2^n$, normalized to unit length, prepare

$|v\rangle = \sum_{i=0}^{M-1} v_i |i\rangle,$

or, for a probability distribution $p(x)$, prepare

$|\Psi^\star\rangle = \sum_x \sqrt{p(x)}\,|x\rangle.$
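As a purely classical illustration of the two encodings above, the following sketch builds the amplitude vectors that a loader would have to prepare. The function names are hypothetical, and the explicit normalization step is an assumption made here so that the output is a valid quantum state.

```python
import numpy as np

def amplitude_encode(v):
    """Return the unit-norm amplitude vector for a data vector of length 2^n."""
    v = np.asarray(v, dtype=complex)
    if v.size == 0 or (v.size & (v.size - 1)) != 0:
        raise ValueError("length must be a power of two")
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return v / norm

def distribution_encode(p):
    """Return amplitudes sqrt(p(x)) for a discrete probability distribution."""
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or p.sum() == 0:
        raise ValueError("p must be non-negative and not all zero")
    return np.sqrt(p / p.sum())

state = amplitude_encode([3, 0, 4, 0])
n_qubits = int(np.log2(state.size))          # 4 amplitudes fit on 2 qubits
born_probs = np.abs(distribution_encode([0.1, 0.2, 0.3, 0.4])) ** 2
```

The qubit count grows only logarithmically with the vector length; the difficulty discussed in this article is the circuit that produces `state`, not its storage.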

These encodings are compact in qubit count, but generic exact preparation is expensive enough to erase the advantage of downstream routines such as Quantum Amplitude Estimation, HHL, or qPCA (Jumade et al., 2023, Zoufal et al., 2019).

Within this literature, “resource-aware” means that loading is optimized under explicit constraints. Different papers treat different quantities as the main bottleneck: two-qubit gate count $G$, depth, ancilla count, QRAM size, shot budget, trainability, or an error budget split between approximation and precision (Zhang et al., 2 Feb 2026, Chen et al., 2023, Alonso-Linaje et al., 4 Dec 2025).

Paradigm	Explicit resource knob	Representative statement
Tensor-network loaders	bond dimension $\chi$, circuit depth	exact MPS preparation uses $\mathcal{O}(n\chi^2)$ gates (Iaconis et al., 2023)
Adaptive circuit learning	operator-pool size, gradient thresholds	the ansatz grows by appending top-gradient operators (Li et al., 2023)
QRAM protocols	QRAM levels $n$, word length $k$, ancillas	a parallel protocol achieves $\mathcal{O}(n+k)$ time (Chen et al., 2023)
Shot-based encoding	total shots, state pool	data are encoded with zero encoding gates (Kyriacou et al., 7 Apr 2026)

A recurring distinction is between exact and approximate loading. Exact methods preserve the full target state or operator but often inherit exponential worst-case costs. Approximate quantum loaders permit an infidelity or norm error and then optimize within that budget; several papers treat this approximation not as a defect but as the central design variable (Marin-Sanchez et al., 2021, Zhang et al., 2 Feb 2026, Alonso-Linaje et al., 4 Dec 2025).

2. Compression by tensor networks and entanglement structure

Tensor-network methods are among the clearest examples of resource-aware loading because they make compression explicit. For a grayscale image, the MPS-based image loader prepares a dense encoding on

[equation missing in source: qubit count as a function of image size]

qubits, interprets the image as a 2-leg ladder tensor network, truncates it to bond dimension $\chi$, and converts the resulting MPS to a quantum circuit (Iaconis et al., 2023). The bond dimension is the resource knob: the paper states that an MPS with bond dimension $\chi$ on $n$ qubits can be exactly prepared with

$\mathcal{O}(n\chi^2)$

gates, so both qubit count and gate count scale logarithmically in image resolution for fixed $\chi$. The approximation quality is measured by the infidelity

[equation missing in source: definition of the infidelity]

and empirically follows

[equation missing in source: empirical scaling of the infidelity]

When the high-bond-dimension MPS is itself approximated by stacked circuit layers, the paper finds

[equation missing in source]

At fixed bond dimension or fixed circuit depth, the infidelity saturates as image size increases, which the paper treats as evidence for resolution-independent compression (Iaconis et al., 2023).
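The role of bond dimension as a resource knob can be illustrated with a small classical experiment. The sketch below is not the paper's image pipeline: it is a generic left-to-right SVD truncation of a state vector into an MPS, applied to a smooth one-dimensional test signal chosen here for illustration, with infidelity taken as one minus the squared overlap (an assumed convention).

```python
import numpy as np

def mps_truncate(state, chi):
    """Sweep left to right with SVDs, keeping at most chi singular values per bond."""
    n = int(np.log2(state.size))
    tensors = []
    rest = np.asarray(state, dtype=complex).reshape(1, -1)
    for _ in range(n - 1):
        left_dim = rest.shape[0]
        mat = rest.reshape(left_dim * 2, -1)        # split off one qubit index
        u, s, vh = np.linalg.svd(mat, full_matrices=False)
        keep = min(chi, len(s))
        tensors.append(u[:, :keep].reshape(left_dim, 2, keep))
        rest = s[:keep, None] * vh[:keep, :]        # carry the remainder rightwards
    tensors.append(rest.reshape(rest.shape[0], 2, 1))
    return tensors

def mps_to_state(tensors):
    """Contract the MPS back into a dense vector and renormalize it."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    vec = out.reshape(-1)
    return vec / np.linalg.norm(vec)

def infidelity(target, approx):
    """One minus the squared overlap of two normalized states."""
    return 1.0 - abs(np.vdot(target, approx)) ** 2

# Smooth 8-qubit test signal (a sampled Gaussian), normalized to a state.
x = np.linspace(-3, 3, 2 ** 8)
target = np.exp(-x ** 2 / 2)
target = target / np.linalg.norm(target)

errors = {chi: infidelity(target, mps_to_state(mps_truncate(target, chi)))
          for chi in (1, 2, 4, 16)}
```

For 8 qubits a bond dimension of 16 is already exact, so `errors[16]` vanishes up to rounding. The smaller entries show how much fidelity is traded away at each reduced bond dimension.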

AMLET extends the tensor-network viewpoint from images to broader scientific data. It compiles amplitude-encoding circuits by building a multi-layer tensor network, delaying truncation via index splitting, and assembling the circuit “on the fly” (Jumade et al., 2023). The method is depth-tunable: increasing the number of layers improves fidelity, while the classical preprocessing cost obeys an upper bound for bounded depth [bound missing in source] because the SVDs act on matrices whose sizes shrink geometrically. Numerically, many real-world datasets required much shorter depth than generic worst-case loading: AMLET often achieved [fidelity value missing in source] up to 16 qubits across finance, images, fluids, and proteins; images were the easiest class to load, with virtually all visual features already present after four layers; and AMLET reduced depth at 16 qubits relative to a simpler layer-by-layer method [reduction range missing in source] (Jumade et al., 2023).

AQER generalizes this structural perspective by making entanglement reduction itself the objective. It reformulates approximate quantum loaders as minimizing

[equation missing in source: loading objective]

over circuit architectures and parameters, and proves that, in the small-entanglement regime, the achievable infidelity scales linearly with the total single-qubit Rényi-2 entanglement entropy of the state under consideration [symbols missing in source] (Zhang et al., 2 Feb 2026). AQER therefore builds the circuit by iteratively appending two-qubit blocks that minimize this entanglement proxy, then adds an explicit product-state approximation and a final variational refinement. The paper reports experiments on classical image and language datasets and on quantum many-body states up to 50 qubits, with AQER consistently outperforming MPS, HEC, and AQCE in both accuracy and gate efficiency (Zhang et al., 2 Feb 2026).
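The entanglement proxy named above can be computed directly for small states. The following is a minimal sketch, assuming the Rényi-2 entropy of qubit $q$ is $-\log_2 \mathrm{Tr}\,\rho_q^2$ for the single-qubit reduced state $\rho_q$. The base of the logarithm and the function names are choices made here, not taken from the paper.

```python
import numpy as np

def single_qubit_renyi2(state, qubit, n):
    """Renyi-2 entropy (base 2) of one qubit's reduced density matrix."""
    psi = np.asarray(state, dtype=complex).reshape([2] * n)
    psi = np.moveaxis(psi, qubit, 0).reshape(2, -1)   # rows: chosen qubit
    rho = psi @ psi.conj().T                          # trace out all other qubits
    purity = np.real(np.trace(rho @ rho))
    return -np.log2(purity)

def total_renyi2(state):
    """Sum of single-qubit Renyi-2 entropies over all qubits."""
    n = int(np.log2(len(state)))
    return sum(single_qubit_renyi2(state, q, n) for q in range(n))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
product = np.kron([0.6, 0.8], [1, 0])
bell_entropy = total_renyi2(bell)        # one bit per qubit
product_entropy = total_renyi2(product)  # zero for any product state
```

A product state has zero total entropy, which is why driving this quantity down brings the state closer to something a layer of single-qubit gates can prepare.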

3. Adaptive, generative, and function-specific synthesis

A separate line of work treats loading as an adaptive circuit-synthesis problem. ACLBM casts amplitude embedding as learning a Born machine whose output statistics match a target distribution by minimizing KL divergence (Li et al., 2023). Rather than fixing a hardware-efficient layered ansatz, it starts from an equal superposition implemented with one single-qubit gate on each qubit, evaluates a self-defined operator pool, appends a fixed number of operators with the largest gradient magnitudes, and reoptimizes all parameters. The operator pool includes arbitrary qubit pairs and real unitaries such as

[equation missing in source: example pool operator and its parameter range]

The stopping criteria are gradient-based, and the learning rate is adjusted as

[equation missing in source: learning-rate schedule]

Empirically, ACLBM used far fewer parameters than fixed baselines on 10-qubit generic distributions, remained robust on BAS datasets up to [size missing in source], and was the only method in the paper to produce visually recognizable outputs on real images flattened into 16-qubit amplitude vectors, that is, $2^{16}$ amplitudes (Li et al., 2023).
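The two ingredients of the adaptive loop described above, a KL objective and gradient-ranked operator selection, can be sketched independently of any quantum simulator. The function names, the threshold argument, and the sample gradient values below are hypothetical; the paper's actual pool, thresholds, and schedule are not reproduced.

```python
import numpy as np

def kl_divergence(target, model, eps=1e-12):
    """KL(target || model) for two discrete distributions on the same support."""
    target = np.asarray(target, dtype=float)
    model = np.asarray(model, dtype=float)
    mask = target > 0                       # terms with target = 0 contribute nothing
    ratio = target[mask] / np.maximum(model[mask], eps)
    return float(np.sum(target[mask] * np.log(ratio)))

def select_operators(gradients, k, threshold):
    """Indices of up to k pool operators whose |gradient| exceeds the threshold.

    An empty result is read here as the signal to stop growing the ansatz.
    """
    g = np.abs(np.asarray(gradients, dtype=float))
    order = np.argsort(-g, kind="stable")[:k]
    return [int(i) for i in order if g[i] > threshold]

pool_gradients = [0.10, -0.50, 0.03, 0.40]  # made-up values for illustration
chosen = select_operators(pool_gradients, k=2, threshold=0.05)
stop = select_operators(pool_gradients, k=2, threshold=0.60)
loss = kl_divergence([0.5, 0.5], [0.25, 0.75])
```

Ranking by gradient magnitude means the circuit only grows where the objective is most sensitive, which is how such schemes avoid paying for a full fixed-depth ansatz.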

The qGAN approach addresses a related problem when the target distribution is given implicitly by classical samples. Its generator is a parameterized quantum circuit with layers of single-qubit $R_Y$ rotations and $CZ$ entangling blocks, producing

$|g_\theta\rangle = \sum_{j=0}^{2^n-1} \sqrt{p_\theta^j}\,|j\rangle$

while a classical discriminator is trained adversarially against measured samples (Zoufal et al., 2019). The paper frames the result as approximate loading with $\mathcal{O}(\mathrm{poly}(n))$ gates instead of $\mathcal{O}(2^n)$ for exact state preparation. It reports successful simulation studies on log-normal, triangular, and bimodal distributions, as well as hardware experiments on IBM Q Boeblingen in which the trained distribution still approximated the target well and the Kolmogorov–Smirnov fit was accepted (Zoufal et al., 2019).

Function-specific loaders demonstrate that approximation can sometimes remove the dependence on the total qubit count altogether. The approximate Grover–Rudolph method clusters nearly equal rotation angles for smooth target functions and proves that the number of two-qubit gates can be bounded by

[equation missing in source: two-qubit gate bound]

with the parameter that sets this bound asymptotically independent of the qubit count under a smoothness condition on the target function (Marin-Sanchez et al., 2021). For a normal distribution example, the paper reports a concrete value of this parameter, the resulting two-qubit gate count compared with the exact construction, and the fidelity that is retained [numerical values missing in source]. A second contribution in the same paper is a variational ansatz tailored to zeros, singularities, and local structure, with a quasi-optimized number of hyperparameters and Grover–Rudolph initialization to improve convergence (Marin-Sanchez et al., 2021).
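The intuition behind angle clustering can be checked numerically. The sketch below computes the standard Grover–Rudolph rotation angles, $\theta = \arccos\sqrt{p_{\text{left}}/p_{\text{block}}}$ for each block at each level, for a sampled Gaussian chosen here as a test case. It does not implement the paper's clustering rule or its bound; it only shows that for a smooth density the fine-level angles concentrate near $\pi/4$.

```python
import numpy as np

def grover_rudolph_angles(p):
    """Return one array of rotation angles per level of the binary splitting."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    n = int(np.log2(p.size))
    levels = []
    for m in range(n):
        blocks = p.reshape(2 ** m, -1)              # 2^m blocks at level m
        half = blocks.shape[1] // 2
        left = blocks[:, :half].sum(axis=1)         # mass in the left half of each block
        total = blocks.sum(axis=1)
        ratio = np.divide(left, total, out=np.full_like(total, 0.5), where=total > 0)
        ratio = np.clip(ratio, 0.0, 1.0)            # guard against rounding above 1
        levels.append(np.arccos(np.sqrt(ratio)))
    return levels

x = np.linspace(-3, 3, 2 ** 8)
levels = grover_rudolph_angles(np.exp(-x ** 2 / 2))
spreads = [float(a.max() - a.min()) for a in levels]
fine_deviation = float(np.max(np.abs(levels[-1] - np.pi / 4)))
```

When all angles at a level are nearly equal, the multi-controlled rotations at that level can be replaced by far fewer uncontrolled or weakly controlled ones, which is the source of the gate savings described above.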

4. QRAM, QROM, and architecture-aware access

Memory-centric approaches shift the problem from circuit synthesis to coherent data access. In circuit-based QRAM, the basic transformation is often written as

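$\sum_i \alpha_i\,|i\rangle|0\rangle \;\mapsto\; \sum_i \alpha_i\,|i\rangle|x_i\rangle,$

where $|i\rangle$ is the address register, $\alpha_i$ are the address amplitudes, and $x_i$ is the data word stored at address $i$. The resource-aware question is then how to realize this access without assuming an infinitely large or noiseless memory (Chen et al., 2023).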

One early issue is post-selection. FF-QRAM can load continuous data, but its cost [expression missing in source] includes a repetition factor that depends on the success probability of the post-selection step (Veras et al., 2020). The deterministic alternative A-PQM removes post-selection entirely while still loading continuous amplitudes, prepares

[equation missing in source: state prepared by A-PQM]

uses fewer qubits than the alternative it is compared against [qubit counts missing in source], and has total complexity

[equation missing in source: total complexity]

The paper’s central claim is that eliminating post-selection removes the potentially exponential overhead hidden in the repetition factor (Veras et al., 2020).

When QRAM size is fixed, the main challenge becomes how to load larger words and larger datasets without increasing the number of QRAM levels. The limited-sized QRAM protocol does this through pipelined parallelism: with word length $k$ and QRAM depth $n$, it reduces time complexity from [baseline missing in source] to

$\mathcal{O}(n+k)$

and improves fidelity bounds to

[equation missing in source: fidelity bounds]

for qutrit-based and qubit-based schemes respectively (Chen et al., 2023). The same paper extends the method to datasets larger than $2^n$ items, giving a hybrid-parallel protocol with its own time-complexity and error-scaling expressions [expressions missing in source] (Chen et al., 2023).

Architecture-aware state preparation appears again in the BBQRAM framework. By embedding a segment tree of squared norms into Bucket Brigade QRAM memory cells and retrieving sibling-node data with pipelined routing, the method prepares

[equation missing in source: target state]

with stated qubit and time costs [expressions missing in source] and constant ancillary qubits under a fixed-precision assumption (Berti et al., 17 Oct 2025).
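The classical data structure behind this approach, a binary tree whose nodes hold the squared norms of their subtrees, can be sketched as follows. The function names are hypothetical and nothing here models the Bucket Brigade routing itself. The sketch only shows that the ratios between a node and its parent, which determine the rotation at each step, multiply out to the target probability of each basis state. It assumes every visited parent has non-zero weight.

```python
import numpy as np

def norm_tree(v):
    """Tree of squared norms: tree[0] holds the leaves, tree[-1] the root."""
    level = np.abs(np.asarray(v, dtype=complex)) ** 2
    tree = [level]
    while level.size > 1:
        level = level.reshape(-1, 2).sum(axis=1)    # each parent sums its two children
        tree.append(level)
    return tree

def path_probability(tree, index):
    """Multiply child/parent ratios from the root down to leaf `index`."""
    prob = 1.0
    depth = len(tree) - 1
    for level in range(depth, 0, -1):
        child = index >> (level - 1)                # node on the path, one level down
        parent = child >> 1
        prob *= tree[level - 1][child] / tree[level][parent]
    return prob

v = np.array([1.0, 2.0, 3.0, 4.0])
tree = norm_tree(v)
root_weight = float(tree[-1][0])                    # squared norm of the whole vector
p_index_2 = path_probability(tree, 2)               # |v_2|^2 / ||v||^2
```

Each step down the tree needs only a node and its sibling, which is why sibling retrieval is the access pattern that matters for this style of state preparation.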

A more recent QRAM architecture moves much of the complexity offline. The fast and error-correctable QRAM proposal precompiles a resource state, then executes the online query using only Clifford operations, Bell measurements, and single-qubit Pauli measurements (Cesa et al., 24 Mar 2025). Its query depth is

[equation missing in source: query depth]

the size of the physical QRAM zone is given explicitly [qubit count missing in source], and the paper argues that the online operation is naturally compatible with fault-tolerant codes because no online non-Clifford synthesis is required (Cesa et al., 24 Mar 2025).

QROM-based work extends this memory perspective to fault-tolerant cost models. The mass-production approach shows that many parallel copies of the same data-loading oracle can be implemented with polynomially reduced total gate count in realistic cost models, while giving no asymptotic benefit if only non-Clifford gates are counted (Huggins et al., 30 May 2025). In the cited quantum-chemistry application, the scaling of a dominant parallelized QROM component improves [before and after scalings missing in source] (Huggins et al., 30 May 2025).

5. Alternative resource tradeoffs beyond coherent depth

Not all resource-aware loaders treat coherent gate depth as the primary currency. The divide-and-conquer algorithm explicitly exchanges time for space: instead of insisting on a pure amplitude-encoded state on $\log_2 N$ qubits for an $N$-dimensional input $x$, it prepares

$\sum_{i=0}^{N-1} x_i\,|i\rangle|\psi_i\rangle$

with entangled ancillas, where $|\psi_i\rangle$ denotes the ancilla state attached to basis state $|i\rangle$, achieving circuit depth

$\mathcal{O}(\log_2^2 N)$

at the cost of

$\mathcal{O}(N)$

qubits (Araujo et al., 2020). This is an exponential reduction in depth relative to standard $\mathcal{O}(N)$-depth exact loading, but it leaves ancillary information entangled with the data register. The paper presents a proof of concept on ibmq_rome and reports, for one test instance, the depth and width of its circuit alongside the Möttönen-style depth and width [numerical values missing in source] (Araujo et al., 2020).

Shot-Based Quantum Encoding treats shots themselves as the encoding medium. Instead of preparing a single data-dependent pure state, it maps an input to a classical probability vector, allocates a total shot budget across a pool of fixed states according to

[equation missing in source: shot-allocation rule]

and thereby realizes the mixed state

[equation missing in source: resulting mixed state]

The quantum layer is linear in the classical probabilities,

[equation missing in source: linear dependence on the probabilities]

so a classical nonlinearity is used between layers (Kyriacou et al., 7 Apr 2026). The key resource-aware claim is that the model uses zero encoding gates and keeps variational depth independent of the input. On Semeion, the paper reports a test accuracy and a relative error reduction versus amplitude encoding; on Fashion-MNIST, it reports a test accuracy and an absolute improvement over amplitude encoding [numerical values missing in source] (Kyriacou et al., 7 Apr 2026).
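The idea of spending shots instead of encoding gates can be illustrated with a small sketch. Everything specific here is an assumption: the largest-remainder rounding rule, the three-state pool, and the function names are chosen for illustration and are not the paper's allocation formula. The part that follows from the description above is the structure: a probability vector weights a fixed pool of states, and any expectation value of the resulting mixture is linear in those weights.

```python
import numpy as np

def allocate_shots(p, total_shots):
    """Split an integer shot budget in proportion to p (largest-remainder rounding)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    raw = p * total_shots
    shots = np.floor(raw).astype(int)
    leftover = int(total_shots - shots.sum())
    order = np.argsort(-(raw - shots), kind="stable")
    shots[order[:leftover]] += 1                    # hand out the remaining shots
    return shots

def mixed_state(p, pool):
    """Density matrix sum_i p_i |psi_i><psi_i| for a fixed pool of pure states."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return sum(w * np.outer(psi, np.conj(psi)) for w, psi in zip(p, pool))

# Hypothetical pool: |0>, |1>, |+>. No data-dependent gate is ever applied.
pool = [np.array([1, 0], dtype=complex),
        np.array([0, 1], dtype=complex),
        np.array([1, 1], dtype=complex) / np.sqrt(2)]
p = [0.5, 0.25, 0.25]
shots = allocate_shots(p, 8)
rho = mixed_state(p, pool)
pauli_z = np.diag([1.0, -1.0])
expect_z = float(np.real(np.trace(rho @ pauli_z)))  # 0.5*1 + 0.25*(-1) + 0.25*0
```

Because `expect_z` is a weighted sum of fixed per-state expectations, stacking such layers without a classical nonlinearity in between would collapse to a single linear map.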

Probability-distribution loading via Feynman propagators replaces generic state preparation by structured Hamiltonian evolution. For one-dimensional Hamiltonians of the form

$H = \frac{p^2}{2m} + V(x)$

the paper studies potentials whose propagators are known in analytically closed form and uses them to load normal, Laplace, and Maxwell-Boltzmann distributions (Alhajjar et al., 2023). The Laplace case is especially direct: for the attractive Dirac delta potential $V(x) = -\alpha\,\delta(x)$, the bound-state density satisfies

$|\psi(x)|^2 = \kappa\,e^{-2\kappa|x|}, \qquad \kappa = \frac{m\alpha}{\hbar^2},$

which the paper identifies with a Laplace PDF; in standard notation this is the Laplace density centered at the origin with scale $1/(2\kappa)$. To manage initial-state cost, the same work introduces ladder states; a monotone ladder can be prepared with a gate count and depth that depend on its number of levels [gate count, depth, and overall cost expressions missing in source] (Alhajjar et al., 2023).
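The Laplace identification can be checked numerically from the textbook bound state of an attractive delta potential, $\psi(x) = \sqrt{\kappa}\,e^{-\kappa|x|}$. The value of $\kappa$ and the grid below are arbitrary choices for illustration, and the parametrization in terms of $\kappa$ is standard notation rather than necessarily the paper's.

```python
import numpy as np

def bound_state_density(x, kappa):
    """|psi(x)|^2 for psi(x) = sqrt(kappa) * exp(-kappa * |x|)."""
    return kappa * np.exp(-2.0 * kappa * np.abs(x))

def laplace_pdf(x, loc, scale):
    """Laplace density (1 / (2 b)) * exp(-|x - loc| / b) with scale b."""
    return np.exp(-np.abs(x - loc) / scale) / (2.0 * scale)

kappa = 1.5
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]

density = bound_state_density(x, kappa)
reference = laplace_pdf(x, loc=0.0, scale=1.0 / (2.0 * kappa))
total_mass = float(np.sum(density) * dx)   # Riemann-sum check of normalization
```

The two curves agree pointwise, so the strength of the potential acts directly as the width parameter of the loaded distribution.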

6. Compilation, experiments, and interpretive issues

As the method space has broadened, automated selection has become a resource-aware problem in its own right. The compilation framework for data loading formalizes this by taking an input vector, a tolerance, and a split of the error budget between approximation and precision

[equation missing in source: error split]

then searching across multiplexer-based loaders, QROM constructions, sparse encodings, MPS, Fourier Series Loader, and Walsh transform-based diagonal operators to minimize estimated resources (Alonso-Linaje et al., 4 Dec 2025). The framework uses PennyLane resource estimation and supports both state preparation and diagonal encoding. In a computational fluid dynamics workflow it selects MPS state preparation, Walsh transform-based diagonal encoding, and Walsh-based measurement, leading to resource reductions of over four orders of magnitude compared to previous approaches (Alonso-Linaje et al., 4 Dec 2025).

Hardware demonstrations have become varied enough to reveal the practical character of the tradeoffs. The MPS image loader was executed on 8 qubits of IonQ Aria using a depth-3 MPS circuit; the depth-3 circuits used far fewer CNOT gates than a naive encoding [image size, shot count, and CNOT counts missing in source], and the authors identify this as the first large-instance full amplitude encoding of an image in a quantum state on trapped-ion hardware (Iaconis et al., 2023). The qGAN distribution loader was tested on IBM Q Boeblingen, where the trained distribution still approximated the target well under hardware noise (Zoufal et al., 2019). The divide-and-conquer loader was compiled and run on ibmq_rome, validating the low-depth, high-width tradeoff on present-day hardware (Araujo et al., 2020).

Several misconceptions recur in the literature. One is that worst-case lower bounds for arbitrary state preparation imply that practical data loading is uniformly intractable. The numerical studies on images, finance, fluids, and proteins argue otherwise: many real-world classical datasets are far easier to load than worst-case theory predicts, often because their structure can be captured by low bond dimension, short-depth tensor networks, or sparse functional descriptions (Jumade et al., 2023). Another is that logarithmic qubit scaling alone resolves the input problem. The recent literature shows that this is incomplete: even when the data register is only logarithmically many qubits wide, the decisive bottleneck may lie in two-qubit gates, ancillas, QRAM routing, post-selection overhead, gradient cost, or shot budget (Veras et al., 2020, Cesa et al., 24 Mar 2025, Kyriacou et al., 7 Apr 2026).

A plausible implication is that resource-aware loading is no longer a single technique but a design layer spanning compression, synthesis, memory architecture, and compilation. The common pattern is not the elimination of loading cost, but its relocation into a controlled tradeoff: bond dimension versus fidelity, operator-pool size versus gradient cost, QRAM depth versus word length, width versus circuit depth, or encoding gates versus shot budget. In that sense, resource-aware quantum data loading has become the study of which physical or algorithmic resource is cheapest to spend for a given data class and target error.