
Quantum Volume (QV): Benchmarking NISQ Devices

Updated 23 April 2026
  • Quantum Volume (QV) is a holistic metric that quantifies NISQ device capabilities by measuring the largest size of random, square circuits with heavy-output probabilities exceeding a 2/3 threshold.
  • The methodology involves executing random circuits with equal width and depth and aggregating heavy-output probabilities across many trials to evaluate device performance.
  • Optimizations in compiler techniques and error mitigation can significantly boost effective QV, making it a key performance indicator for both hardware and software advancements.

Quantum Volume (QV) is a system-level, single-number metric quantifying the balanced interplay of width, depth, fidelity, connectivity, and compilation quality for a noisy intermediate-scale quantum (NISQ) device. QV captures the largest size of a random, square quantum circuit (equal width and depth) that a device can execute such that the measured heavy-output probability (HOP) exceeds a fixed threshold, typically 2/3. Critically, QV is designed to be architecture-agnostic and holistic, folding hardware, compiler, and device variability into a single operational benchmark (Pelofske et al., 2022).

1. Formal Definition and Benchmarking Protocol

Formally, a device’s quantum volume is

$\mathrm{QV} = 2^{d}$

where $d$ is the largest integer for which the device can reliably run random $m$-qubit, depth-$m$ circuits (with $m = d$) and achieve mean heavy-output probability (HOP) above threshold, with high statistical confidence. The "heavy set" for each random circuit $U$ is defined as

$H_U = \{x : p_U(x) > p_{\mathrm{median}}\}$

where $p_U(x) = |\langle x|U|0\rangle|^2$ and $p_{\mathrm{median}}$ is the median of the ideal outcome probabilities. The heavy-output probability for each random instance is the fraction of measurement outcomes falling in the heavy set. The QV pass criterion requires that over $k$ circuits (commonly $k \geq 100$), the mean HOP exceeds $2/3$, with the two-sigma confidence lower bound also exceeding $2/3$ (Pelofske et al., 2022, Wack et al., 2021).
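
The heavy-set construction and the per-circuit HOP estimate can be sketched directly (a minimal NumPy illustration; the function names are ours, not from any standard library):

```python
import numpy as np

def heavy_set(ideal_probs):
    """Outcomes (as integers) whose ideal probability exceeds the median."""
    return set(np.flatnonzero(ideal_probs > np.median(ideal_probs)))

def heavy_output_probability(ideal_probs, measured_outcomes):
    """Fraction of measured outcomes that fall in the heavy set."""
    heavy = heavy_set(ideal_probs)
    return sum(1 for x in measured_outcomes if x in heavy) / len(measured_outcomes)

# Sampling an ideal Porter-Thomas-like distribution concentrates measurements
# on the heavy set: HOP approaches (1 + ln 2)/2 ~ 0.85, while uniform
# sampling would give only ~0.5.
rng = np.random.default_rng(0)
probs = rng.exponential(size=2**10)     # exponential weights ~ Porter-Thomas
probs /= probs.sum()
samples = rng.choice(probs.size, size=20_000, p=probs)
print(heavy_output_probability(probs, samples))   # close to 0.85
```

In the real protocol, `ideal_probs` comes from classically simulating each random circuit, which is exactly what limits the test's scalability.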

Circuit construction proceeds layer-wise: every layer applies a random permutation of all $m$ qubits followed by disjoint two-qubit SU(4) random unitaries on the resulting pairs. If $m$ is odd, one qubit idles in each layer. The device "passes" at width $m$ if the statistical criteria above are satisfied (Cross et al., 2018, Mori et al., 2022).
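
For small $m$ the construction can be written out explicitly as a dense matrix (an illustrative NumPy sketch, exponential in $m$ and intended only for intuition; the helper names are ours):

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))            # phase fix for the Haar measure

def permutation_unitary(perm):
    """Permutation of qubit wires as a 2^m x 2^m 0/1 matrix."""
    m = len(perm)
    mat = np.zeros((2**m, 2**m))
    for x in range(2**m):
        bits = [(x >> (m - 1 - i)) & 1 for i in range(m)]
        y = 0
        for i in range(m):
            y = (y << 1) | bits[perm[i]]  # wire i reads old wire perm[i]
        mat[y, x] = 1.0
    return mat

def qv_model_circuit(m, rng):
    """Unitary of one width-m, depth-m QV model circuit."""
    u = np.eye(2**m, dtype=complex)
    for _ in range(m):                    # depth = width
        layer = permutation_unitary(rng.permutation(m))
        blocks = np.eye(1, dtype=complex)
        for _ in range(m // 2):           # disjoint random SU(4) blocks
            blocks = np.kron(blocks, haar_unitary(4, rng))
        if m % 2:                         # odd width: one qubit idles
            blocks = np.kron(blocks, np.eye(2))
        u = blocks @ layer @ u
    return u
```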

2. Holistic System Characterization

Quantum Volume is expressly designed to be a holistic system benchmark (Amico et al., 2023, Cross et al., 2018, Jurcevic et al., 2020). It probes the interplay and limitations arising from:

  • Gate fidelities: Imperfect single- and two-qubit gate errors accumulate with increasing depth, reducing HOP at larger circuit sizes.
  • Qubit connectivity: Non-all-to-all connectivity increases SWAP overhead during random permutations, compounding error and reducing the effective maximum $d$.
  • Coherence time: Maximum executable depth at a given width is ultimately coherence limited.
  • Compiler optimization: Improved gate decompositions, noise-adaptive routing, and advanced transpiler routines can compress circuit depth and decrease cumulative error.
  • Variability: QV is sensitive to gate calibration drifts, cross-talk, and device-to-device as well as subset-to-subset performance variations (Pelofske et al., 2022).
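
As an illustration of the connectivity point: on a linear-topology device, routing one layer's permutation with adjacent SWAPs costs one SWAP per inversion of the permutation (a standard counting argument; the function below is our own sketch):

```python
def swap_cost_linear(perm):
    """Minimum adjacent-SWAP count to realize a qubit permutation on a line:
    equal to the number of inversions (bubble-sort distance)."""
    return sum(1 for i in range(len(perm))
                 for j in range(i + 1, len(perm)) if perm[i] > perm[j])

print(swap_cost_linear([3, 2, 1, 0]))   # worst case: m(m-1)/2 = 6 SWAPs
# A random permutation averages m(m-1)/4 inversions, so SWAP overhead per
# layer grows quadratically with width on a line, compounding gate error.
```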

This comprehensive scope is achieved by essentially requiring the device to successfully execute deep, wide, entangling, and randomly structured circuits—which collectively simulate realistic complex workloads (in a worst-case sense)—at scale.

3. Statistical and Practical Considerations

The QV test is operationalized as follows:

  • For each width $m$, generate $k$ random depth-$m$ circuits (commonly $k \geq 100$).
  • For each circuit, obtain HOP by comparing measured bitstrings to the classically precomputed heavy set.
  • Aggregate results over all circuits and test whether the mean HOP and its confidence interval clear the $2/3$ threshold (Baldwin et al., 2021).
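
The aggregation step can be sketched as follows (a simplified version of the two-sigma acceptance test using a binomial-style standard error; real analyses may use tighter, e.g. bootstrap, confidence intervals):

```python
import numpy as np

def qv_pass(hops, threshold=2/3, z=2.0):
    """Two-sigma pass test over per-circuit heavy-output probabilities."""
    hops = np.asarray(hops, dtype=float)
    mean = hops.mean()
    stderr = np.sqrt(mean * (1.0 - mean) / len(hops))   # binomial-style error
    return bool(mean - z * stderr > threshold)

print(qv_pass([0.76] * 100))   # True: lower bound ~0.675 clears 2/3
print(qv_pass([0.70] * 100))   # False: lower bound ~0.608 falls short
```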

The ideal mean HOP approaches $(1 + \ln 2)/2 \approx 0.85$ for large $m$ (Porter-Thomas limit). In the fully depolarized limit, HOP falls to $1/2$. The threshold $2/3$ is chosen to robustly exclude trivial classical sampling (Baldwin et al., 2021). Physically feasible QV tests are bounded by the classical simulation required for heavy-set identification; beyond classically simulable widths, scalable mirror-circuit or parity-constrained variants are required (Bistroń et al., 4 Feb 2025, Amico et al., 2023).
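
The $\approx 0.85$ ideal limit follows in one line from the Porter-Thomas distribution: for large $m$, the scaled ideal probabilities $t = 2^m\, p_U(x)$ obey $\Pr[t > s] = e^{-s}$, whose median is $\ln 2$. The expected probability mass on the heavy set is therefore

```latex
h_{\mathrm{ideal}} = \int_{\ln 2}^{\infty} s\, e^{-s}\, ds
  = (1 + \ln 2)\, e^{-\ln 2}
  = \frac{1 + \ln 2}{2} \approx 0.8466 .
```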

QV experiments have been reported across major platforms (IBM Q, IonQ, Rigetti, OQC, Quantinuum), with measured values (2022) spanning several orders of magnitude, from small values (OQC Lucy) up to $2^{12}$ (Quantinuum H1-2). Device-to-device and subset-to-subset variability is significant: only specific qubit subsets satisfy the QV pass at a given $m$, and the same subset may drift in and out of the pass region over time (Pelofske et al., 2022).

4. Compilation, Transpilation, and Error Mitigation Impact

Compilation and classical pre-processing are major QV determinants. Enhanced transpilation (qubit subset enumeration, noise-adaptive layout, advanced routing, and pulse-level schedule optimization) has been shown to substantially boost achievable QV compared to black-box, default transpilation (Jurcevic et al., 2020, Pelofske et al., 2022). Custom pass managers, such as IBM's QV pass manager (incorporating CPLEX routing and pulse optimization), yield fewer compilation failures and a higher maximal achievable QV, at the cost of significant classical resources (upwards of 100,000 CPU-hours at the largest tested widths).

Error mitigation techniques (notably zero-noise extrapolation and dynamical decoupling) have demonstrably increased effective QV by one or more increments (i.e., factors of two) on IBM devices, as measured with either the canonical or the mirror quantum volume protocol (LaRose et al., 2022, Pelofske et al., 2023). The "effective quantum volume" must be reported together with the applied mitigation and shot overhead to maintain a fair metric (Amico et al., 2023, Pelofske et al., 2023).
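
Zero-noise extrapolation as applied to HOP can be sketched minimally (an illustrative linear fit; production tools such as Mitiq support richer extrapolants):

```python
import numpy as np

def zne_hop(scale_factors, measured_hops):
    """Extrapolate HOPs measured at stretched noise levels to zero noise
    via a linear fit (the simplest Richardson-style extrapolant)."""
    slope, intercept = np.polyfit(scale_factors, measured_hops, deg=1)
    return intercept                      # fitted value at noise scale 0

# HOP degrades as noise is stretched 1x -> 3x; extrapolating back to 0
# recovers an estimate above any single measured value.
print(zne_hop([1.0, 2.0, 3.0], [0.70, 0.65, 0.60]))   # -> 0.75
```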

5. Variants, Extensions, and Limitations

Original QV specifies "square circuits" (width = depth). Miller et al. extend the concept with Quantum Volumetric Classes, QV-$k$, targeting random circuits of width $m$ and depth $m^k$. These variants (QV-2, QV-3, etc.) better capture circuit-depth scaling for workloads whose depth grows polynomially in width, aligning volumetric benchmarking with algorithmic structure (Miller et al., 2022, Rodenburg, 31 Jan 2025). In practical benchmarking, the square (QV-1), quadratic (QV-2), and cubic (QV-3) circuit families suffice to cover 91% of surveyed quantum algorithms.

QV is resource intensive: typical experiments require on the order of 100 random circuits and $10^3$ or more shots per circuit. Scaling to large widths is classically blocked by the exponential cost of heavy-set identification. Parity-constrained QV variants eliminate the $O(2^m)$ simulation overhead by structuring the random circuits to have known heavy-output subspaces, enabling benchmarking of devices far beyond classical simulation (Bistroń et al., 4 Feb 2025). Mirror Quantum Volume (MQV) and related techniques use inversion-based circuits to avoid this bottleneck, but may mask some error types (Amico et al., 2023).
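
The mirror idea is easy to see in code: appending $U^\dagger$ makes the ideal outcome $|0\ldots0\rangle$, so defining success needs no classical simulation of $U$ (an illustrative NumPy sketch; the `noise` hook is a hypothetical stand-in for device error):

```python
import numpy as np

def mirror_survival(u, shots, rng, noise=None):
    """Frequency of returning to |0...0> after running U then U^dagger.
    `noise` optionally perturbs the midpoint state (stand-in for device error)."""
    dim = u.shape[0]
    state = np.zeros(dim, dtype=complex)
    state[0] = 1.0
    state = u @ state                     # forward half
    if noise is not None:
        state = noise(state)
        state /= np.linalg.norm(state)
    state = u.conj().T @ state            # mirrored (inverse) half
    probs = np.abs(state) ** 2
    probs /= probs.sum()
    return float(np.mean(rng.choice(dim, size=shots, p=probs) == 0))

# Noise-free, the mirrored circuit is the identity: survival is exactly 1.
u = np.kron([[1, 1], [1, -1]], np.eye(2)) / np.sqrt(2)   # H (x) I example
print(mirror_survival(u, 1000, np.random.default_rng(0)))   # -> 1.0
```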

QV has several caveats: it does not directly reflect algorithmic performance for non-scrambling, highly structured workloads; it ignores circuit speed (addressed by metrics such as CLOPS); its single-number nature gives only coarse granularity; and its accuracy depends on transparent reporting of compiler/mapping details (Pelofske et al., 2022, Wack et al., 2021).

6. Theoretical Foundations and Modeling

Quantum Volume reflects a rigorous theoretical interplay between native error rates, connectivity, and gate durations. The effective error per circuit layer governs the achievable $d$; for simple depolarizing models, the cross-over point at which mean HOP drops below $2/3$ occurs roughly when the cumulative circuit error reaches order unity (Jaeger et al., 2024). For NISQ architectures, QV can be estimated analytically from gate error rates, the connectivity-induced routing overhead, and circuit depth scaling (Rodenburg, 31 Jan 2025). Extensions to fault-tolerant architectures incorporate error-corrected logical gates and magic-state distillation overheads to predict QV-k in the logical regime.
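
This kind of analytic estimate can be illustrated with a toy depolarizing model (a simplified sketch of our own, not the calibrated models of the cited works; `swap_overhead` is a hypothetical knob for connectivity-induced routing cost):

```python
import math

def estimate_qv(eps_2q, swap_overhead=1.0, m_max=50):
    """Largest passing width under a toy depolarizing model, reported as 2^m.

    Model: depth-m circuits with floor(m/2) two-qubit blocks per layer, each
    block costing `swap_overhead` effective gates of error `eps_2q`; the HOP
    decays from the Porter-Thomas value toward the fully mixed 1/2.
    """
    h_ideal = (1 + math.log(2)) / 2              # ideal large-m HOP
    best = 0
    for m in range(2, m_max + 1):
        n_gates = m * (m // 2) * swap_overhead   # depth m, floor(m/2) blocks/layer
        hop = 0.5 + (h_ideal - 0.5) * (1 - eps_2q) ** n_gates
        if hop > 2 / 3:
            best = m
    return 2 ** best

print(estimate_qv(0.01))        # -> 4096 : 1% gate error, no routing overhead
print(estimate_qv(0.01, 3.0))   # -> 128  : same gates, heavy SWAP routing
```

The two calls show how identical gate fidelities yield very different QV once connectivity forces routing overhead, matching the qualitative discussion above.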

Platform-specific QV definitions exist for photonic and measurement-based architectures. For photonic MBQC with Gottesman-Kitaev-Preskill encoding, QV is analytically derived as a function of GKP squeezing and photon transmission efficiency via error-mapping to effective Pauli error probability, then standard QV pass criteria (Zhang et al., 2022).

Topologically, Quantum Volume appears in the context of two-dimensional insulators as the Brillouin zone integral of the square root of the quantum-metric determinant; in this setting, QV variation tracks topological phase transitions and bounds the count of symmetry-protected boundary states (Chiu, 29 Jan 2025).

7. Current Best Practices and Outlook

Best practice prescriptions for maximizing QV include:

  • Enumerate connected qubit subsets and select for highest HOP.
  • Employ aggressive compiler optimizations: noise-adaptive layouts, advanced routing, and pulse-level scheduling.
  • Exploit dynamical decoupling and readout error mitigation wherever permissible.
  • Document and report error mitigation overheads and compilation-level optimizations for transparency (Pelofske et al., 2022, Amico et al., 2023).

Principal open challenges include: standardizing compiler stacks for reproducibility, reducing classical overhead for heavy-set computations (especially at large widths), integrating QV full-stack analysis with error-corrected logical layers, benchmarking non-square (rectangular) algorithmic workloads, and systematically correlating QV to application performance metrics (Pelofske et al., 2022).

Quantum Volume remains the prevailing system-level metric for quantifying NISQ device capabilities. Its robustness, platform-independence, and sensitivity to both hardware and software-stack optimizations have made it foundational for benchmarking and progress tracking in quantum computing (Pelofske et al., 2022, Cross et al., 2018, Wack et al., 2021).
