Two-Qubit Gate Count Reduction

Updated 31 January 2026

Two-qubit gate count reduction refers to methods that minimize entangling gates in quantum circuits to enhance fidelity and reduce errors.
Techniques such as analytic constructions, architecture-aware numerical optimizations, and ZX-calculus rewrites approach theoretical lower bounds on entangling operations.
Practical applications include state preparation, arithmetic circuits, and oracles, achieving up to a 50% reduction in two-qubit gate counts in experiments.

Two-qubit gate count reduction is a central objective in quantum circuit design and compilation, driven by the substantial noise and hardware cost of two-qubit entangling gates (e.g., CNOT, CZ) relative to single-qubit rotations. A broad spectrum of techniques, ranging from analytic circuit constructions and architecture-aware numerical optimization to algebraic and graphical rewriting, has been developed to minimize the two-qubit gate count for various quantum algorithms, generic unitary synthesis, and domain-specialized subroutines. Recent work has approached or achieved the theoretical lower bounds on two-qubit gates for small registers and demonstrated large practical savings across state preparation, arithmetic, chemistry, and Clifford+T circuit optimization.

1. Theoretical Lower Bounds for General Unitary Decomposition

The minimal two-qubit (CNOT) gate count required for generic $n$-qubit unitary synthesis was established in [Phys. Rev. A 69, 062321 (2004)] as

$N_{\rm theo}(n) = \left\lceil \frac{1}{4}\left(4^n - 3n - 1\right) \right\rceil,$

a bound derived from parameter counting and confirmed to be tight except for small additive integrality effects. No known circuit construction for arbitrary $U \in U(2^n)$ exactly achieves this bound in general beyond $n = 2$.
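The bound is easy to evaluate directly. The sketch below is only an illustration; the function name `cnot_lower_bound` is a hypothetical helper, not taken from the cited work. The parameter-counting argument is usually stated as follows: a generic $n$-qubit unitary has $4^n - 1$ real parameters up to global phase, the single-qubit gates contribute $3n$, and each CNOT can introduce at most 4 more, so $4N + 3n \ge 4^n - 1$.

```python
def cnot_lower_bound(n: int) -> int:
    """Evaluate ceil((4**n - 3*n - 1) / 4) using integer arithmetic only."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    numerator = 4**n - 3 * n - 1
    # Integer ceiling division avoids floating-point rounding for large n.
    return (numerator + 3) // 4


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, cnot_lower_bound(n))
```

For $n = 2, 3, 4$ this evaluates to 3, 14, and 61, the values listed in the comparison table of this section.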

Rakyta et al. introduced a purely numerical, sequentially optimized approach employing a fixed depth-$N(n-1)$ "layered" template circuit with repeated single-qubit rotations (U3 gates) and CNOTs to systematically disentangle qubits one by one. Block-wise classical optimization of the layer parameters yields circuits with CNOT counts within 1–2 gates of the theoretical minimum up to $n=5$. For example, the method finds $15$ CNOTs for a generic $3$-qubit unitary (lower bound: $14$) and $63$ for $n=4$ (lower bound: $61$), whereas the previous best (optimized QSD) required $20$ and $100$ respectively (Rakyta et al., 2021).

Qubits ($n$)	Theoretical Bound	Optimized QSD	Sequential Opt. (SO)
2	3	3	3
3	14	20	15
4	61	100	63

The algorithm accommodates realistic hardware topologies by restricting each CNOT to physically connected qubit pairs, and it still achieves close-to-optimal CNOT counts, with an observed practical overhead of only 10–15% (Rakyta et al., 2021).

2. Two-Qubit Gate Minimization in High-Level Structures

Certain quantum subroutines admit domain-specific strategies that leverage problem structure or subspace constraints, yielding substantial CNOT-count savings compared to structure-agnostic decompositions.

A. Unitary Vibrational Coupled Cluster (UVCC)

In the Trotterized UVCC ansatz for vibrational structure calculations, redundancy in the Hilbert space under a unary mapping enables the elimination of roughly half the controls in multi-controlled $R_Y$ gates. For $m$-mode excitations, standard methods require a $(2m-1)$-controlled gate; by mapping the logical subspace to states differing only in a single qubit and constructing ancilla-aided "routing" unitaries, one reduces the controlled rotation to $m$ controls. The resulting CNOT counts are:

$m$	Exponential	Givens	Redundancy Exploiting
1	4	5	4
2	48	17	14
3	320	49	26
4	1792	158	42

Experimental implementation on Quantinuum H1-1 demonstrates reductions in entangling gates of up to 50% in theory and 28% in practice, along with higher overall fidelities for state preparation (Szczepanik et al., 2024).
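The tabulated counts can be compared with a few lines of arithmetic. The sketch below simply re-derives relative savings and control counts from the table in this subsection; the dictionary layout and function names are hypothetical, and the ratios it prints are plain arithmetic on the table, not the experimental figures quoted above.

```python
# CNOT counts copied from the UVCC table (keys are the number of modes m).
CNOT_COUNTS = {
    1: {"exponential": 4, "givens": 5, "redundancy": 4},
    2: {"exponential": 48, "givens": 17, "redundancy": 14},
    3: {"exponential": 320, "givens": 49, "redundancy": 26},
    4: {"exponential": 1792, "givens": 158, "redundancy": 42},
}


def control_counts(m: int) -> tuple[int, int]:
    """Controls on the R_Y gate: (standard, redundancy-exploiting)."""
    return 2 * m - 1, m


def relative_saving(before: int, after: int) -> float:
    """Fraction of CNOTs removed when going from `before` to `after`."""
    return 1.0 - after / before


if __name__ == "__main__":
    for m, row in CNOT_COUNTS.items():
        saving = relative_saving(row["givens"], row["redundancy"])
        print(m, control_counts(m), f"{100 * saving:.1f}% fewer CNOTs than Givens")
```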

B. Multi-Controlled Gate Decomposition

For general $n$-controlled $U(2)$ gates in high-level programming and quantum oracles, the introduction of a phase-correcting auxiliary qubit allows every controlled $U(2)$ to be rewritten as a product of two $n$-controlled $SU(2)$ gates. This reduces the CNOT count from $O(n^2)$ (standard) to at most $32n$ for arbitrary $U(2)$ and to $12n$ for multi-controlled Pauli gates, outperforming previous bests of $16n$ for controlled Pauli and $20n$ for $SU(2)$ (Rosa et al., 2024). This linear scaling is demonstrated on Grover layer circuits with $114$ qubits, reducing CNOTs from $101$k to $2.7$k.
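The reason a phase correction is needed at all can be seen from the standard factorization $U = e^{i\varphi} V$ with $V \in SU(2)$ and $\varphi = \tfrac{1}{2}\arg\det U$. The sketch below illustrates only this textbook step with NumPy; it is not the auxiliary-qubit construction of Rosa et al., and `split_global_phase` is a hypothetical helper name.

```python
import cmath

import numpy as np


def split_global_phase(u):
    """Return (phi, v) with u = exp(1j * phi) * v and det(v) = 1."""
    u = np.asarray(u, dtype=complex)
    if u.shape != (2, 2):
        raise ValueError("expected a 2x2 matrix")
    # For a unitary, det(u) has modulus 1; half its argument is the phase.
    phi = cmath.phase(np.linalg.det(u)) / 2.0
    v = cmath.exp(-1j * phi) * u
    return phi, v


if __name__ == "__main__":
    pauli_x = np.array([[0, 1], [1, 0]])
    phi, v = split_global_phase(pauli_x)
    print(phi, np.linalg.det(v))
```

Once the gate is controlled, $e^{i\varphi}$ is no longer a global phase: it becomes a relative phase on the control register, which is why it has to be corrected explicitly.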

3. Circuit Optimization and ZX-Calculus: Automated Two-Qubit Gate Reduction

Graphical and algebraic rewrites applied to the intermediate representation of quantum circuits—primarily the ZX-calculus—enable systematic two-qubit gate minimization across arbitrary Clifford+T circuits.

Standard ZX-calculus optimizers focus on T-count, sometimes increasing the two-qubit count (e.g., by 22% on average).
Reduction heuristics targeting two-qubit cost (local complementation, pivoting, cost heuristics for candidate rewrites) provide average $\approx$ 16% reduction, up to 40%, for complex arithmetic and logic circuits (Staudacher et al., 2023).
Dynamic grouping of circuit layers, k-step lookahead search, and delayed placement with simulated annealing further improve the two-qubit gate count by 18% on average, with up to 25% case-specific reductions and 4% improvement over heuristic-only ZX methods (Chen et al., 19 Jul 2025).
Non-simplification ZX rules (LC, pivot) used as search congruences in simulated annealing or genetic metaheuristics enable 15–30% and up to 46% additional two-qubit count reductions versus standard extractions—effective particularly for low- to mid-qubit Clifford+T circuits (Krueger, 2022).

Method	Avg. Two-Qubit Gate Reduction (benchmarks)	Max. Reduction
Heuristic ZX	~16–18%	40+%
Metaheuristic ZX	15–30%	46%
Dynamic grouping ZX	18–25%	25%
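The common thread in these methods is a rewrite loop scored by two-qubit cost. As a much simpler stand-in (this is not a ZX-calculus implementation and not the algorithm of any cited paper), the sketch below cancels pairs of identical CNOTs that are separated only by gates on other qubits. The `(name, qubits)` gate format and both function names are assumptions made for illustration.

```python
from typing import List, Tuple

Gate = Tuple[str, Tuple[int, ...]]  # hypothetical format: ("cx", (control, target))


def two_qubit_count(circuit: List[Gate]) -> int:
    """Cost function: number of gates acting on exactly two qubits."""
    return sum(1 for _, qubits in circuit if len(qubits) == 2)


def cancel_adjacent_cnots(circuit: List[Gate]) -> List[Gate]:
    """Remove CX pairs that meet after commuting past gates on disjoint qubits."""
    out: List[Gate] = []
    for gate in circuit:
        name, qubits = gate
        if name != "cx":
            out.append(gate)
            continue
        j = len(out) - 1
        while j >= 0:
            prev_name, prev_qubits = out[j]
            if prev_name == "cx" and prev_qubits == qubits:
                del out[j]  # CX followed by the same CX is the identity
                break
            if set(prev_qubits) & set(qubits):
                out.append(gate)  # blocked by a gate sharing a qubit
                break
            j -= 1
        else:
            out.append(gate)  # nothing earlier to cancel against
    return out


if __name__ == "__main__":
    circuit = [("cx", (0, 1)), ("h", (2,)), ("cx", (0, 1)), ("t", (1,)), ("cx", (0, 1))]
    optimized = cancel_adjacent_cnots(circuit)
    print(two_qubit_count(circuit), "->", two_qubit_count(optimized))
```

A ZX-based optimizer replaces the single cancellation rule with graph rewrites such as local complementation and pivoting, but the accept-if-cheaper scoring by `two_qubit_count` plays the same role.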

4. Gate-Efficient Design for Arithmetic and State Preparation

Two-qubit gate count is minimized in quantum arithmetic and state preparation by exploiting symmetries, partial product arrangements, and subspace restrictions.

In quantum integer squaring, symmetry-aware arrangement of partial products and anti-diagonal "stitching" cuts the number of required adders by 50%. The resulting quantum squarer achieves a $29.41\%$ asymptotic CNOT reduction (from $17n^2$ to $12n^2$), with similar depth and T-count savings (Sultana et al., 2024).
In state preparation, identifying "don't-care" regions (basis vectors never occupied during initialization) permits segment-wise peephole resynthesis: single-target windows are replaced according to controllability/observability don't-cares, yielding circuits with 36% fewer CNOTs than standard compiled circuits (Wang et al., 2024). In specific families (e.g., $B_n^{2^{n-1}+1}$ prefix states), the original $2^n$ CNOT scaling is reduced to $O(n)$.
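For the integer-squaring item above, the symmetry being exploited is $a_i a_j = a_j a_i$ together with $a_i^2 = a_i$ for bits, which gives $a^2 = \sum_i a_i 2^{2i} + \sum_{i<j} a_i a_j 2^{i+j+1}$. The classical sketch below checks this identity and the quoted leading-order ratio; it illustrates the arithmetic only, not the quantum circuit of Sultana et al., and all names are hypothetical.

```python
def square_via_symmetric_partial_products(a: int, n: int) -> int:
    """Square an n-bit integer using each unordered bit pair only once."""
    bits = [(a >> i) & 1 for i in range(n)]
    total = 0
    for i in range(n):
        total += bits[i] << (2 * i)  # diagonal term: a_i * a_i = a_i
        for j in range(i + 1, n):
            # a_i*a_j and a_j*a_i coincide, so add once with one extra shift.
            total += (bits[i] & bits[j]) << (i + j + 1)
    return total


def partial_product_counts(n: int) -> tuple[int, int]:
    """(all n*n partial products, distinct ones after using symmetry)."""
    return n * n, n * (n + 1) // 2


# Leading-order CNOT saving implied by going from 17 n^2 to 12 n^2.
ASYMPTOTIC_CNOT_REDUCTION = 1 - 12 / 17

if __name__ == "__main__":
    assert all(square_via_symmetric_partial_products(a, 6) == a * a for a in range(64))
    print(partial_product_counts(8), f"{100 * ASYMPTOTIC_CNOT_REDUCTION:.2f}%")
```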

5. Combinatorial and Oracle-Based Techniques

In oracular and combinatorial search algorithms such as Grover adaptive search, two orthogonal algebraic strategies provide substantive gate count reductions:

Polynomial Factorization: Factorization of higher-order binary objective polynomials into grouped monomials reduces the number and order of multi-controlled phase gates. Each grouped term leads to a lower-weight gate, with per-monomial savings of $20\%$ or more, and typically 20–30% for large instances (Sano et al., 2023).
Objective-Order Halving: Introducing extra order qubits reduces the maximum controlled phase gate from $2B'$ controls to $B'' = B'+1$, where $B'$ is the binary encoding length. The net effect can be an exponential reduction in CNOT count for algorithms employing high-order constraints, yielding 20–70% fewer two-qubit gates in relevant algorithms.

6. Clifford and Stabilizer Circuits: Canonical Layered Forms

A normal form for $n$-qubit Clifford circuits, CX–CZ–P–H–CZ–P–H, has been established, allowing gate-layer commutation and absorption of redundant gates. This enables the use of transvection-based algorithms for the CNOT layers, reducing worst-case two-qubit gate complexity from $O(n^2)$ to $O(n^2/\log n)$ for dense circuits and achieving 50–80% CNOT reduction for generic graph states once the edge density exceeds 0.6. Benchmarks on IBM Q devices show that this normal form translates into 20–70% fewer hardware-native CNOTs after transpilation (Bataille, 2021).
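For context, the $O(n^2)$ baseline for a CNOT layer is Gauss–Jordan elimination over GF(2): a CNOT with control $c$ and target $t$ adds row $c$ to row $t$ of the layer's invertible binary matrix. The sketch below implements only that baseline, not the transvection-based or $O(n^2/\log n)$ algorithms mentioned above; function names and the `(control, target)` convention are assumptions.

```python
import numpy as np


def synthesize_cnot_circuit(matrix):
    """Return CNOTs (control, target), in circuit order, realizing `matrix` over GF(2)."""
    a = np.array(matrix, dtype=np.uint8) % 2
    n = a.shape[0]
    if a.shape != (n, n):
        raise ValueError("matrix must be square")
    ops = []

    def add_row(c, t):
        a[t, :] ^= a[c, :]  # row operation corresponding to CNOT(c -> t)
        ops.append((c, t))

    for col in range(n):
        if a[col, col] == 0:
            pivot = next((r for r in range(col + 1, n) if a[r, col]), None)
            if pivot is None:
                raise ValueError("matrix is not invertible over GF(2)")
            add_row(pivot, col)
        for r in range(n):
            if r != col and a[r, col]:
                add_row(col, r)
    # The row operations reduce the matrix to the identity; since each CNOT
    # is self-inverse, reversing them gives a circuit that builds the matrix.
    return ops[::-1]


def apply_cnot_circuit(gates, n):
    """Binary matrix implemented by a CNOT list, applying the first gate first."""
    a = np.eye(n, dtype=np.uint8)
    for c, t in gates:
        a[t, :] ^= a[c, :]
    return a


if __name__ == "__main__":
    target = [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
    gates = synthesize_cnot_circuit(target)
    assert np.array_equal(apply_cnot_circuit(gates, 3), np.array(target))
    print(len(gates), gates)
```

Each column costs at most $n$ row operations, which gives the $n^2$ worst case that the more refined methods improve on.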

7. Large Controlled Operations and Resource Trade-Offs

For large multi-controlled gates (especially Toffoli/$C^nX$) in quantum algorithms, the optimal two-qubit count depends on the available ancillas. The "cycle" method partitions the controls and computes partial ANDs into ancillas, reducing the overall two-qubit gate cost:

With maximal (linear-in-$n$) ancilla, the total two-qubit gate count is $8n-11$ for Peres-style implementations, matching theoretical bests (Brown et al., 2013).
With minimal ancilla ($O(\sqrt{n})$), the two-qubit count roughly doubles to $16n$, but at a much lower space cost.
This establishes the continuous ancilla–gate trade-off frontier in large-controlled gate synthesis.
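The idea of computing partial ANDs into ancillas can be illustrated with the textbook compute/uncompute Toffoli ladder, which uses $n-2$ clean ancillas and $2n-3$ Toffolis for $n$ controls. The classical simulation below is a sketch of that ladder only, not the "cycle" method, and it counts Toffolis rather than two-qubit gates; the wire layout and function names are assumptions.

```python
from itertools import product


def toffoli_ladder(n_controls: int):
    """Toffolis (c1, c2, target) implementing C^nX with n-2 clean ancillas.

    Assumed wire layout: controls 0..n-1, ancillas n..2n-3, target 2n-2.
    """
    n = n_controls
    if n < 3:
        raise ValueError("the ladder needs at least 3 controls")
    ancillas = list(range(n, 2 * n - 2))
    target = 2 * n - 2
    # Accumulate the AND of the first n-1 controls into the ancilla chain.
    compute = [(0, 1, ancillas[0])]
    for i in range(2, n - 1):
        compute.append((i, ancillas[i - 2], ancillas[i - 1]))
    flip = [(n - 1, ancillas[-1], target)]
    # Uncompute in reverse order so every ancilla returns to 0.
    return compute + flip + compute[::-1]


def simulate(gates, bits):
    """Apply Toffolis to a list of classical bits and return the result."""
    bits = list(bits)
    for c1, c2, t in gates:
        bits[t] ^= bits[c1] & bits[c2]
    return bits


if __name__ == "__main__":
    n = 5
    gates = toffoli_ladder(n)
    for controls in product([0, 1], repeat=n):
        out = simulate(gates, list(controls) + [0] * (n - 2) + [0])
        assert out[-1] == int(all(controls))      # target flips iff all controls are 1
        assert out[n:2 * n - 2] == [0] * (n - 2)  # ancillas are restored
    print(len(gates), "Toffolis for", n, "controls")
```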

8. Impact and Practical Significance

The reduction of two-qubit gate count is vital for practical and fault-tolerant quantum computation. Decreases in entangling operations translate into higher overall circuit fidelity, longer coherence-limited algorithm runtimes, and direct enhancement of feasibility on NISQ and early fault-tolerant platforms. Methods detailed above have demonstrated record-setting CNOT reductions in both synthetic and hardware experiments, and hybrid approaches combining analytic, numeric, and combinatorial tools continue to push actual circuits toward the information-theoretic limit.

References:

(Rakyta et al., 2021, Szczepanik et al., 2024, Staudacher et al., 2023, Chen et al., 19 Jul 2025, Sultana et al., 2024, Wang et al., 2024, Park et al., 2022, Krueger, 2022, Rosa et al., 2024, Bataille, 2021, Sano et al., 2023, Brown et al., 2013, Lau et al., 6 Aug 2025)