Distributed Quantum Circuit Cutting

Updated 19 September 2025

Distributed quantum circuit cutting is a method that partitions large quantum circuits into small, manageable fragments to bypass hardware qubit and connectivity constraints.
It uses quasiprobability decompositions, maximum-likelihood fragment tomography (MLFT), and adaptive scheduling to reduce sampling overhead while preserving the accuracy of the reconstructed results.
Applications include executing complex quantum algorithms on heterogeneous hardware networks by leveraging optimal partitioning and classical post-processing.

Distributed quantum circuit cutting is a suite of techniques that enables the execution of quantum algorithms that exceed the qubit and connectivity limitations of any single quantum processor by partitioning large quantum circuits into smaller fragments. Each fragment can be executed separately—potentially on different quantum devices or classical simulators—and their outputs are recombined through classical post-processing to reconstruct the overall circuit outcome. These methods are foundational for extending the computational capacity of near-term noisy intermediate-scale quantum (NISQ) devices, enabling scalable and modular quantum computation under stringent hardware constraints. The following sections summarize the mathematical principles, algorithmic frameworks, hardware mapping strategies, sampling and classical overheads, and advances in post-processing and optimization that underpin the state of the art in distributed quantum circuit cutting.

1. Mathematical Foundations and Theoretical Frameworks

The core mathematical objects in distributed circuit cutting are quasiprobabilistic decompositions of quantum operations—commonly wires (identity channels) and multi-qubit gates—and the corresponding strategies for reconstructing global measurement statistics from local fragment data. In wire cutting, an identity channel $\mathcal{I}$ is replaced with a sum of local channels via quasiprobability decomposition (QPD),

$\mathcal{I} = \sum_{j} c_j \mathcal{E}_j,$

with $c_j \in \mathbb{R}$. Each term $\mathcal{E}_j$ is implementable via local operations and classical communication (LOCC). For gate cutting, an entangling two-qubit unitary is expressed via its KAK decomposition as

$U = (V_1 \otimes V_2)\left(\sum_{k=0}^{3} u_k\, \sigma_k \otimes \sigma_k\right)(V_3 \otimes V_4),$

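where $\sigma_0 = I$ and $\sigma_1, \sigma_2, \sigma_3$ are the Pauli operators $X, Y, Z$, the $V_i$ are single-qubit unitaries, and the coefficients satisfy $\sum_k |u_k|^2 = 1$. The quasiprobability expansion is optimized to minimize the total sampling overhead, quantified by the 1-norm of the coefficients, $\gamma = \sum_j |c_j|$; the number of shots needed to reach a fixed precision grows as $\gamma^2$. Closed-form expressions for the optimal $\gamma$ are known for arbitrary two-qubit unitaries and their tensor products (Schmitt et al., 2023). For joint wire cuts, the optimal overhead for an $n$-qubit identity channel is $2^{n+1} - 1$ when only classical communication is available, and $\frac{2^{n+1}}{R(\rho) + 1} - 1$ if a non-maximally entangled state $\rho$ with robustness of entanglement $R(\rho)$ is shared, so the cost decreases as the shared entanglement increases (Bechtold et al., 19 Jun 2024).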
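The single-wire case of the QPD can be checked numerically. The sketch below is a minimal illustration assuming the standard Pauli-basis measure-and-prepare decomposition $\rho = \tfrac{1}{2}\sum_{P \in \{I,X,Y,Z\}} \mathrm{Tr}(P\rho)\,P$, with each Pauli expanded into its eigenprojectors. It uses only local operations and reproduces the input state with $\gamma = 4$, which is larger than the value $2^{n+1} - 1 = 3$ that the joint-cut formula gives for $n = 1$. The function names are illustrative, not part of any cited library.

```python
import numpy as np

# Single-qubit Pauli operators; I2 plays the role of sigma_0.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def wire_cut_terms():
    """Return (coefficient, measured Pauli, prepared projector) triples.

    Each Pauli P is split into eigenprojectors, P = sum_r r |psi_r><psi_r|, so
    rho = 1/2 * sum_P Tr(P rho) P becomes 8 weighted measure-and-prepare terms.
    """
    terms = []
    for pauli in (I2, X, Y, Z):
        eigvals, eigvecs = np.linalg.eigh(pauli)
        for k, r in enumerate(eigvals):
            proj = np.outer(eigvecs[:, k], eigvecs[:, k].conj())
            terms.append((0.5 * float(np.round(r)), pauli, proj))
    return terms


def reconstruct(rho):
    """Recombine upstream expectations Tr(P rho) with downstream preparations."""
    out = np.zeros((2, 2), dtype=complex)
    for coeff, pauli, proj in wire_cut_terms():
        out += coeff * np.trace(pauli @ rho) * proj
    return out


def gamma():
    """1-norm of the quasiprobability coefficients."""
    return sum(abs(coeff) for coeff, _, _ in wire_cut_terms())


# Random single-qubit density matrix as a test input.
rng = np.random.default_rng(7)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = a @ a.conj().T
rho /= np.trace(rho)
print(np.allclose(reconstruct(rho), rho), gamma())  # True 4.0
```

In an actual cut, $\mathrm{Tr}(P\rho)$ would be estimated from measurements on the upstream fragment and each projector would be prepared at the start of the downstream fragment.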

The maximum-likelihood fragment tomography (MLFT) paradigm recasts fragment characterization as a constrained quantum process tomography problem, yielding block-diagonal density operators via the Jamiołkowski isomorphism. The physicality of reconstructed states is enforced by projecting experimental data onto the set of physical channels, for example by setting negative eigenvalues to zero and renormalizing (Perlin et al., 2020).
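For a single block of a reconstructed operator, the simple correction mentioned above (zeroing negative eigenvalues, then renormalizing) can be sketched as follows. This is a minimal illustration assuming the raw estimate is a square NumPy array; it is not the full MLFT fitting procedure of Perlin et al., and the function name is hypothetical.

```python
import numpy as np


def project_to_density_matrix(estimate):
    """Clip negative eigenvalues of a raw estimate and renormalize the trace.

    This implements only the clip-and-renormalize correction, not a full
    maximum-likelihood solver.
    """
    hermitian = (estimate + estimate.conj().T) / 2  # drop anti-Hermitian noise
    eigvals, eigvecs = np.linalg.eigh(hermitian)
    eigvals = np.clip(eigvals, 0.0, None)  # zero out unphysical negative weights
    total = eigvals.sum()
    if total <= 0:
        raise ValueError("estimate has no positive spectral weight")
    eigvals = eigvals / total  # restore unit trace
    return (eigvecs * eigvals) @ eigvecs.conj().T


raw = np.array([[1.1, 0.0], [0.0, -0.1]])  # noisy estimate with a negative eigenvalue
physical = project_to_density_matrix(raw)
print(np.round(physical.real, 6))
```

For this input the negative eigenvalue is removed and the result is $\mathrm{diag}(1, 0)$, a valid density matrix.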

2. Circuit Partitioning, Cutting Algorithms, and Scheduling

Effective circuit partitioning is critical both for reducing classical recombination overhead and for maximizing resource use on distributed hardware. Several frameworks employ mixed-integer programming (MIP) and graph partitioning (Kernighan–Lin, METIS, spectral partitioners) to map circuits, expressed as DAGs of gates and qubits, into subcircuits with minimal cross-partition connectivity (Tang et al., 2022; Tejedor et al., 2 May 2025). Other approaches, such as FitCut, transform quantum circuits into “gate-only” weighted graphs and apply modularity-based community detection (e.g., Louvain), followed by constrained agglomerative merging that respects each worker's qubit capacity. The resulting subcircuits are then scheduled onto workers so as to maximize resource utilization and minimize idle time (Kan et al., 7 May 2024).
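The objective these partitioners optimize can be made concrete with a brute-force stand-in for MIP or graph partitioning. The sketch assumes the circuit is summarized as a list of two-qubit gates (qubit pairs) and that every gate whose qubits land in different fragments must be cut; wire cuts and scheduling are not modeled, and the function names are hypothetical.

```python
from itertools import combinations


def count_cut_gates(gates, part_a):
    """Number of two-qubit gates whose qubits lie in different partitions."""
    return sum((q0 in part_a) != (q1 in part_a) for q0, q1 in gates)


def best_bipartition(num_qubits, gates, capacity):
    """Exhaustively search bipartitions where both sides fit within `capacity` qubits.

    Returns (number_of_cut_gates, qubits_in_partition_A). The search is
    exponential in num_qubits, so it only serves as a reference for small circuits.
    """
    best = None
    lo = max(0, num_qubits - capacity)
    hi = min(num_qubits, capacity)
    for size in range(lo, hi + 1):
        for part in combinations(range(num_qubits), size):
            part_a = set(part)
            cuts = count_cut_gates(gates, part_a)
            if best is None or cuts < best[0]:
                best = (cuts, part_a)
    if best is None:
        raise ValueError("no bipartition satisfies the capacity constraint")
    return best


# Two 3-qubit clusters joined by a single gate between qubits 2 and 3.
gates = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(best_bipartition(6, gates, capacity=3))  # (1, {0, 1, 2})
```

Heuristics such as Kernighan–Lin or Louvain-based merging replace the exhaustive loop with local moves so that the same objective can be approached on circuits far too large to enumerate.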

Hybrid graph/hypergraph models have also been adopted for joint spatial (QPU-to-QPU) and temporal (depth) segmentation, using partitioning heuristics such as Stoer–Wagner, Fiduccia–Mattheyses, and Kernighan–Lin. The “coupling ratio” metric quantifies circuit interconnectivity, guiding partition selection for communication- or initialization-cost minimization (Cambiucci et al., 12 Apr 2025).

Detecting and contracting isomorphic subcircuits (via VF2++ subgraph isomorphism) allows execution results to be reused, reducing the number of subcircuits that must be sampled; this is particularly effective for circuits with high structural regularity (Dou et al., 28 Feb 2025).
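A much simpler stand-in than VF2++ conveys the reuse idea: relabel each subcircuit's qubits in order of first use, and treat subcircuits with identical relabeled gate sequences as one. This minimal sketch only detects subcircuits that match gate by gate up to a qubit permutation, assuming a fixed gate order. It is not the algorithm of Dou et al., and the names are illustrative.

```python
def canonical_key(subcircuit):
    """Relabel qubits by first appearance so equivalent subcircuits share a key.

    `subcircuit` is a list of (gate_name, qubit_tuple) pairs; any gate
    parameters should be folded into gate_name (e.g. "rz(0.5)") so they are compared too.
    """
    mapping = {}
    key = []
    for name, qubits in subcircuit:
        relabeled = []
        for q in qubits:
            if q not in mapping:
                mapping[q] = len(mapping)
            relabeled.append(mapping[q])
        key.append((name, tuple(relabeled)))
    return tuple(key)


def group_reusable(subcircuits):
    """Map each canonical key to the indices of subcircuits that share it."""
    groups = {}
    for idx, sc in enumerate(subcircuits):
        groups.setdefault(canonical_key(sc), []).append(idx)
    return groups


fragments = [
    [("h", (0,)), ("cx", (0, 1))],
    [("h", (5,)), ("cx", (5, 7))],  # same structure on different qubits
    [("h", (1,)), ("cx", (0, 1))],  # control and target roles swapped
]
print(list(group_reusable(fragments).values()))  # [[0, 1], [2]]
```

Only one representative per group needs to be executed; its results can be reused for the other members after mapping the qubit labels back.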

Cut-location optimization can also use integer programming to minimize the separator size in the QUBO graphs underlying QAOA circuits, thereby limiting the exponential growth of the overhead (Wagner et al., 9 Jul 2025).

3. Quantum-Classical Resource Integration and Execution

Distributed circuit cutting naturally integrates heterogeneous computational platforms. Libraries such as Qdislib adopt a language-agnostic DAG intermediate representation, enabling cutting solutions compatible with Qiskit, Qibo, or other frameworks. The fragments are dispatched via distributed runtimes (e.g., PyCOMPSs) to CPUs, GPUs, and QPUs for parallel execution (Tejedor et al., 2 May 2025).

Temporal cuts (wire cuts) support sequential execution to overcome circuit depth limitations, while spatial (gate) cuts distribute the computational load across multiple networked QPUs, as in the QDCA (“Quantum Divide and Conquer Algorithm”) framework. Hardware-aware frameworks such as DisMap construct a virtual system topology by fusing intra-chip and chip-to-chip connections (e.g., high-fidelity EPR pairs) and dynamically map fragments to hardware to minimize SWAP overhead and maximize fidelity in real hardware deployments (Du et al., 24 Dec 2024).

4. Sampling Complexity, Entanglement Mediation, and Gate Decomposition

The principal bottleneck in circuit cutting is the exponential scaling of sampling overhead with the number of cuts. Naïve, independent wire or gate cuts incur overheads that scale as $3^k$ or $4^k$ for $k$ cuts, quickly exceeding feasible shot budgets. Joint cutting strategies using entangled resource states, especially non-maximally entangled (NME) states, offer a tunable trade-off. The sampling overhead for an $n$-wire joint cut with a composite resource state $\rho$ is $\frac{2^{n+1}}{R(\rho)+1} - 1$, where $R(\rho)$ is the robustness of entanglement; this is lower than the product of the individual wire-cut overheads, with further savings if maximally entangled components are excluded (Bechtold et al., 19 Jun 2024).
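How $\gamma$ turns into shot cost can be illustrated with a generic QPD estimator: sample term $j$ with probability $|c_j|/\gamma$ and reweight its $\pm 1$ outcome by $\gamma\,\mathrm{sign}(c_j)$. This is unbiased, but each sample lies in $[-\gamma, \gamma]$, so a Hoeffding bound needs on the order of $\gamma^2/\varepsilon^2$ shots. The sketch below is a minimal illustration under these assumptions, with simulated term expectation values standing in for fragment executions. It also compares the classical-communication joint-cut overhead $2^{n+1}-1$ with independent single-wire cuts. All function names are hypothetical.

```python
import math
import random


def qpd_estimate(coeffs, term_means, shots, seed=0):
    """Monte Carlo estimate of sum_j c_j * <O>_j from simulated +/-1 outcomes.

    Each shot picks term j with probability |c_j| / gamma, draws s in {+1, -1}
    with mean term_means[j], and records gamma * sign(c_j) * s.
    """
    rng = random.Random(seed)
    gamma = sum(abs(c) for c in coeffs)
    weights = [abs(c) / gamma for c in coeffs]
    indices = range(len(coeffs))
    total = 0.0
    for _ in range(shots):
        j = rng.choices(indices, weights=weights)[0]
        s = 1 if rng.random() < (1 + term_means[j]) / 2 else -1
        total += gamma * math.copysign(1.0, coeffs[j]) * s
    return total / shots


def shots_for_precision(gamma, eps, delta):
    """Hoeffding bound: shots so that |estimate - truth| < eps w.p. >= 1 - delta."""
    return math.ceil(2 * gamma**2 * math.log(2 / delta) / eps**2)


def joint_wire_gamma(n):
    """Overhead 2^(n+1) - 1 for cutting n wires jointly with classical communication."""
    return 2 ** (n + 1) - 1


coeffs = [1.0, 1.0, -1.0]  # gamma = 3
means = [0.6, 0.2, 0.5]  # stand-ins for fragment expectation values
exact = sum(c * m for c, m in zip(coeffs, means))
estimate = qpd_estimate(coeffs, means, shots=100_000)
print(exact, estimate)

# Two wires: one joint cut versus two independent single-wire cuts.
joint, separate = joint_wire_gamma(2), joint_wire_gamma(1) ** 2
print(joint, separate)  # 7 9
print(shots_for_precision(joint, 0.01, 0.05), shots_for_precision(separate, 0.01, 0.05))
```

Because the shot bound scales with $\gamma^2$, reducing the overhead from 9 to 7 cuts the required shots by roughly $(7/9)^2$.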

For two-qubit gate cutting, a closed formula relates the sampling overhead to KAK decomposition coefficients:

$\gamma(U) = 1 + 2\sum_{i \neq j} |u_i||u_j|.$

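Jointly cutting several copies of an entangling gate is strictly cheaper than cutting them individually: for $n$ copies of $U$ the overhead is $2(1 + \Delta_U)^n - 1$, where $\Delta_U = \sum_{i \neq j} |u_i||u_j|$, which is submultiplicative, i.e., smaller than $\gamma(U)^n$ (Schmitt et al., 2023). Allowing classical communication (LOCC rather than local operations alone) does not reduce the minimal overhead for general two-qubit unitaries.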
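The closed-form overheads above are easy to evaluate once the KAK coefficients are known. The sketch below assumes the gate is already in the canonical form $\exp\big(i(a\,XX + b\,YY + c\,ZZ)\big)$, i.e., with the local unitaries $V_i$ stripped off, so that $u_k = \mathrm{Tr}[(\sigma_k \otimes \sigma_k)U]/4$. It computes $\Delta_U = \sum_{i \neq j}|u_i||u_j|$ over ordered pairs, $\gamma(U) = 1 + 2\Delta_U$, and the joint overhead for $n$ copies. The canonical parameterization and function names are illustrative choices; extracting $(a, b, c)$ from an arbitrary unitary is not shown.

```python
import numpy as np

# sigma_0 = I, sigma_1 = X, sigma_2 = Y, sigma_3 = Z
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def canonical_gate(a, b, c):
    """exp(i(a XX + b YY + c ZZ)) as a product of commuting factors cos(t) I + i sin(t) PP."""
    u = np.eye(4, dtype=complex)
    for t, p in zip((a, b, c), PAULIS[1:]):
        u = u @ (np.cos(t) * np.eye(4) + 1j * np.sin(t) * np.kron(p, p))
    return u


def kak_coefficients(u):
    """u_k = Tr[(sigma_k (x) sigma_k) U] / 4, valid when U has no local factors."""
    return np.array([np.trace(np.kron(p, p) @ u) / 4 for p in PAULIS])


def gate_gamma(u_coeffs):
    """Return (gamma(U), Delta_U) with Delta_U summed over ordered pairs i != j."""
    mags = np.abs(u_coeffs)
    delta = mags.sum() ** 2 - (mags**2).sum()
    return 1 + 2 * delta, delta


def joint_gamma(delta, n):
    """Overhead 2(1 + Delta_U)^n - 1 for jointly cutting n copies of U."""
    return 2 * (1 + delta) ** n - 1


examples = {"CNOT class": (np.pi / 4, 0.0, 0.0), "SWAP class": (np.pi / 4,) * 3}
results = {}
for name, angles in examples.items():
    gamma, delta = gate_gamma(kak_coefficients(canonical_gate(*angles)))
    results[name] = (gamma, joint_gamma(delta, 2), gamma**2)
    print(name, np.round(results[name], 6))
```

For the CNOT class this gives $\gamma = 3$ and a joint overhead of 7 for two copies, versus $3^2 = 9$ for separate cuts; for the SWAP class the values are 7, 31, and 49.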

ZX-calculus-based gate cutting enables efficient decompositions of multi-controlled gates (MCZ, CCZ), with sampling overhead as low as $O(4.5^{2K})$ for $K$ cuts, and improves noise resilience in practice because each fragment contains fewer CNOT gates (Ufrecht et al., 2023).

5. Advanced Post-Processing, Optimization, and Tomographic Approaches

Post-processing is central to circuit cutting, as the reconstruction of the global outcome from measurement data is both computationally and statistically demanding. MLFT uses maximum-likelihood estimation to robustly “project” raw data onto the set of physical quantum channels, achieving lower infidelities compared to direct (uncorrected) methods, with shot allocation tailored to fragment size and output configuration (Perlin et al., 2020).

Sampling overhead can be further reduced by adaptive Monte Carlo shot distribution, as in the ShotQC framework, where shots are dynamically allocated to subcircuits according to their contributions to overall variance, guided by error propagation analysis and parameter optimization of the decomposition (Chen et al., 23 Dec 2024). Cut parameterization exposes a high-dimensional optimization space for the reconstruction formula, trading classical postprocessing cost for reduced quantum sampling.
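As a simplified stand-in for variance-aware shot allocation (not the ShotQC procedure itself), the classical Neyman rule gives term $j$ a number of shots proportional to $|c_j|\,\sigma_j$, where $\sigma_j$ estimates that subcircuit's single-shot standard deviation. For a fixed budget this minimizes the variance $\sum_j c_j^2 \sigma_j^2 / N_j$ of the recombined estimate. The sketch assumes the $\sigma_j$ come from pilot runs; the function names are hypothetical.

```python
import math


def neyman_allocation(coeffs, stds, total_shots):
    """Split total_shots so that N_j is proportional to |c_j| * sigma_j.

    Largest-remainder rounding makes the integer allocation sum exactly to
    total_shots. Terms with zero score receive no shots.
    """
    scores = [abs(c) * s for c, s in zip(coeffs, stds)]
    norm = sum(scores)
    if norm == 0:
        raise ValueError("all terms have zero weight")
    raw = [total_shots * s / norm for s in scores]
    alloc = [math.floor(r) for r in raw]
    leftover = total_shots - sum(alloc)
    by_remainder = sorted(range(len(raw)), key=lambda j: raw[j] - alloc[j], reverse=True)
    for j in by_remainder[:leftover]:
        alloc[j] += 1
    return alloc


def predicted_variance(coeffs, stds, alloc):
    """Variance of sum_j c_j * mean_j when term j is sampled alloc[j] times."""
    var = 0.0
    for c, s, n in zip(coeffs, stds, alloc):
        if c * s == 0:
            continue
        if n == 0:
            return math.inf  # an unsampled term with nonzero variance
        var += (c * s) ** 2 / n
    return var


coeffs = [1.0, 1.0, 2.0]
stds = [1.0, 0.5, 0.25]
neyman = neyman_allocation(coeffs, stds, 700)
uniform = [234, 233, 233]
print(neyman, predicted_variance(coeffs, stds, neyman) < predicted_variance(coeffs, stds, uniform))
# [350, 175, 175] True
```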

Rotation-inspired cut optimization (RICCO) introduces an optimized unitary at each cut point that aligns the fragment output bases so that only computational-basis measurements are required, removing the exponential growth in measurement settings at the cost of an additional classical optimization step (Uchehara et al., 2022).

Grouping based on mutually unbiased bases (MUBs) further reduces the number of required subcircuits by enabling simultaneous measurement of commuting observables, yielding polynomial reductions in shot complexity compared with traditional schemes (Li et al., 27 Oct 2024).
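The general benefit of measurement grouping can be illustrated with the simplest case: Pauli strings that commute qubit by qubit can share one measurement setting. This greedy sketch is a simpler stand-in, not the MUB construction of Li et al. It assumes observables are equal-length strings over `I`, `X`, `Y`, `Z`, and the function names are illustrative.

```python
def qubitwise_commute(p, q):
    """True if the Pauli strings agree or have an identity on every qubit."""
    return all(a == "I" or b == "I" or a == b for a, b in zip(p, q))


def greedy_groups(paulis):
    """Place each Pauli string in the first group it qubit-wise commutes with."""
    groups = []
    for p in paulis:
        for group in groups:
            if all(qubitwise_commute(p, q) for q in group):
                group.append(p)
                break
        else:
            groups.append([p])
    return groups


observables = ["XX", "XI", "IX", "ZZ", "ZI"]
print(greedy_groups(observables))  # [['XX', 'XI', 'IX'], ['ZZ', 'ZI']]
```

Here five observables need only two measurement settings, one per group.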

6. Applications and Empirical Performance

Distributed circuit cutting enables the execution of quantum algorithms beyond the native capacity of any available device. For instance, QDCA solves combinatorial optimization problems on graphs with 85% more nodes than the available qubit count, with circuit cutting also mitigating the impact of hardware noise (Tomesh et al., 2021).

CutQC demonstrates subcircuit evaluation with reduced $\chi^2$ loss compared with direct execution on large NISQ devices, using CPUs and GPUs for efficient classical “glue” operations in post-processing (Tang et al., 2022). MLFT reconstructs output probabilities of randomly partitioned unitary circuits with consistently lower infidelity than full-circuit execution under fixed sampling budgets (Perlin et al., 2020). In QAOA sampling tasks, optimized cutting strategies recover higher-quality solutions, with the reduction in circuit size offsetting the shift and broadening of the output bitstring distribution on noisy hardware (Wagner et al., 9 Jul 2025).

Hardware-aware mapping (DisMap) yields up to 20.8% fidelity improvement and 80.2% SWAP overhead reduction in both simulation and IBM Qiskit–based emulation (Du et al., 24 Dec 2024). Modular libraries (Qdislib, QuantCut) support gate and wire cuts in heterogeneous, high-performance environments, achieving close-to-linear speedup as node count increases (Tejedor et al., 2 May 2025, Soloviev et al., 10 Jun 2025).

7. Challenges and Outlook

While distributed circuit cutting holds promise for scaling quantum computation in the NISQ and early fault-tolerant eras, unavoidable exponential scaling in sampling and classical postprocessing remains a significant challenge. The efficiency gains from joint wire/gate cuts, entanglement resources, isomorphic subcircuit reuse, and adaptive shot distribution are critical to practical deployment, with the feasibility of these methods tied to the quality of inter-device entanglement, control over hardware topology, and the availability of classical computational resources for reconstruction.

Advances in exploiting composite entanglement resources, hardware-tailored gate decompositions (e.g., via ZX-calculus), dynamic scheduling under hardware variability, and hybrid quantum-classical workflow orchestration will continue to drive progress. Hypergraph-based partitioning and resource-aware approaches are likely to be foundational for real-time and large-scale distributed deployments. Experimental results show that circuit cutting not only extends the reachable system size but can also improve fidelity when smaller fragments are less exposed to hardware noise.

Given these trends, distributed quantum circuit cutting is poised to be a central technique for modular, scalable, and robust quantum computation on both existing and future distributed quantum hardware.