Qdislib: Distributed Quantum Circuit Cutting
- Qdislib is a distributed library for quantum circuit cutting that decomposes large circuits into smaller subcircuits for scalable simulation and execution.
- It employs both wire and gate cutting techniques with quasi-probabilistic post-processing to accurately reconstruct overall circuit outcomes.
- The library leverages graph-based circuit representation and the PyCOMPSs runtime to optimize distributed task scheduling and achieve significant parallel speedups.
Qdislib is a distributed library for quantum circuit cutting, designed to enable the execution and simulation of quantum circuits that exceed the qubit and connectivity limitations of contemporary quantum hardware. It decomposes large quantum circuits into smaller, non-overlapping subcircuits that can be executed independently on classical high-performance computing (HPC) resources or quantum processing units (QPUs), and then reconstructs the original circuit’s outcome via quasi-probabilistic post-processing. The library is built for hybrid quantum-classical environments and is compatible with leading quantum software toolkits, including Qiskit and Qibo, while leveraging distributed computing frameworks for scalable execution (Tejedor et al., 2 May 2025).
1. Circuit Cutting Methodologies
Qdislib supports wire cutting and gate cutting as fundamental strategies for partitioning circuits:
- Wire Cutting: This method severs a qubit’s wire at one or more locations in the circuit, producing subcircuits that can be executed independently. The original observable (e.g., an expectation value $\langle O \rangle$) is reconstructed as a sum over all preparations and measurements on the boundary qubits, with coefficients determined by the circuit cutting protocol. The number of subcircuit evaluations scales exponentially with the number of wire cuts $k$; the standard Pauli-basis decomposition has eight measurement and preparation terms per cut, giving $8^k$.
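- Gate Cutting: This method targets entangling two-qubit gates (such as CZ), which are decomposed into sums of tensor products:
$$U = \sum_{i=1}^{m} c_i \, A_i \otimes B_i,$$
where $A_i$ and $B_i$ act on the two qubits separately, $m$ is determined by the gate’s entangling power, and $c_i$ are the decomposition coefficients. The evaluation cost grows as $m^k$ for $k$ gate cuts; for the standard six-term decomposition of CZ this is $6^k$.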
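As a concrete operator-level instance of such a sum of tensor products, CZ can be written exactly in terms of the single-qubit identity and Pauli-Z operators. This identity is a standard fact shown only to illustrate the form; it is not taken from the Qdislib paper, and practical gate cutting works with a quasi-probability decomposition of the gate's action on states, whose terms also include measurements.

```latex
\mathrm{CZ} = \mathrm{diag}(1, 1, 1, -1)
  = \tfrac{1}{2}\left( I \otimes I + Z \otimes I + I \otimes Z - Z \otimes Z \right)
```

On $|11\rangle$ the four terms contribute $1 - 1 - 1 - 1 = -2$, giving the phase $-1$ after the factor $\tfrac{1}{2}$; on the other three basis states they sum to $2$, giving $+1$.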
Both approaches allow for flexible partitioning, with the choice governed by circuit structure, available hardware resources, and simulation requirements.
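The identity behind wire cutting can be checked numerically. The following is a minimal sketch in pure Python, assuming the standard Pauli-basis form of the cut, in which a single-qubit state on the severed wire is rebuilt from its Pauli expectation values. The helper names (`matmul`, `trace`, `reconstruct`) are illustrative and are not part of Qdislib.

```python
from itertools import product

# 2x2 complex matrices as nested lists (no external dependencies).
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = {"I": I, "X": X, "Y": Y, "Z": Z}


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    """Trace of a 2x2 matrix."""
    return a[0][0] + a[1][1]


def reconstruct(rho):
    """Rebuild rho as (1/2) * sum_P Tr(P rho) P over P in {I, X, Y, Z}."""
    out = [[0j, 0j], [0j, 0j]]
    for pauli in PAULIS.values():
        coeff = trace(matmul(pauli, rho)) / 2  # "measure" P upstream of the cut
        for i, j in product(range(2), repeat=2):
            out[i][j] += coeff * pauli[i][j]   # "prepare" P downstream of the cut
    return out


# An arbitrary valid single-qubit density matrix (Hermitian, trace 1).
rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]
rebuilt = reconstruct(rho)
max_error = max(abs(rebuilt[i][j] - rho[i][j])
                for i in range(2) for j in range(2))

# Each of the four operators expands into two eigenstate preparations,
# which is where the eight terms per wire cut come from.
terms_per_cut = 2 * len(PAULIS)
```

Here `max_error` is zero up to floating-point rounding, and `terms_per_cut` is 8. On hardware, each Pauli operator downstream of the cut is realised by preparing its eigenstates, so one cut yields eight measurement and preparation pairs.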
2. Graph-Based Circuit Representation and Partitioning
Qdislib represents quantum circuits as directed acyclic graphs (DAGs), where:
- Nodes correspond to quantum gates.
- Edges denote qubit dependencies and temporal execution order.
This hardware-agnostic DAG structure underpins all manipulation and partitioning. Circuit partitioning is driven by the FindCut routine, which supports multiple algorithms (e.g., Kernighan–Lin, Girvan–Newman, spectral decomposition, METIS) and is tunable via user-defined constraints such as maximal subcircuit size. The partitioning process minimizes a loss function whose hyperparameters weight the trade-off between the number of cuts, maximal independent subcircuits, and subcircuit size (Tejedor et al., 2 May 2025).
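The graph representation and a loss-driven cut search can be illustrated with a toy example. This is a minimal sketch, not Qdislib's FindCut: the function names, the weights `alpha`, `beta`, `gamma`, and the weighted-sum form of the loss are all assumptions made for illustration, and an exhaustive search stands in for algorithms such as Kernighan–Lin or METIS.

```python
from itertools import combinations

# Toy circuit: (gate name, qubits it acts on), in program order.
GATES = [("h", (0,)), ("cz", (0, 1)), ("cz", (1, 2)),
         ("cz", (2, 3)), ("cz", (3, 4)), ("h", (4,))]


def circuit_to_dag(gates):
    """Return edges (u, v) linking each gate to the next gate on the same qubit."""
    last_on_qubit = {}
    edges = []
    for idx, (_, qubits) in enumerate(gates):
        for q in qubits:
            if q in last_on_qubit:
                edges.append((last_on_qubit[q], idx))
            last_on_qubit[q] = idx
    return edges


def components(n_nodes, edges):
    """Connected components of the undirected view of the graph."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, parts = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(adj[node] - part)
        seen |= part
        parts.append(part)
    return parts


def find_cut_bruteforce(n_nodes, edges, max_cuts=2,
                        alpha=3.0, beta=1.0, gamma=1.0):
    """Try every set of up to max_cuts edges and keep the lowest-loss one.

    Assumed loss: alpha * cuts - beta * subcircuits + gamma * largest subcircuit.
    """
    best_loss, best_cut = None, None
    for k in range(max_cuts + 1):
        for cut in combinations(edges, k):
            kept = [e for e in edges if e not in cut]
            parts = components(n_nodes, kept)
            loss = (alpha * len(cut) - beta * len(parts)
                    + gamma * max(len(p) for p in parts))
            if best_loss is None or loss < best_loss:
                best_loss, best_cut = loss, list(cut)
    return best_loss, best_cut


edges = circuit_to_dag(GATES)
best_loss, best_cut = find_cut_bruteforce(len(GATES), edges)
```

For this chain of six gates, the search selects the single middle edge `(2, 3)`, which splits the graph into two subcircuits of three gates each. Raising `alpha` penalises additional cuts more heavily, while raising `gamma` pushes toward smaller fragments.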
3. Distributed Execution and Integration with HPC Systems
Qdislib leverages the PyCOMPSs task-based runtime to orchestrate parallel computation across CPUs, GPUs, and QPUs (including local and cloud-based backends). Each subcircuit, following the circuit cutting process, is executed as an individual distributed task:
- Task Scheduling: PyCOMPSs dynamically assigns subcircuit evaluations across heterogeneous resources, optimizing for parallelism and resource utilization.
- Backend Compatibility: Subcircuits can be dispatched to classical simulators, hardware QPUs (such as IBM Quantum services), or GPU-accelerated nodes (as available in resources like MareNostrum 5).
- Result Management and Reconstruction: Upon completion, subcircuit results (e.g., expectation values, bitstring distributions) are combined classically to yield the overall output for the original circuit, following the quasi-probabilistic post-processing rules for the selected circuit cutting protocol.
This architecture allows for simulations and hybrid executions of circuits far larger than possible on any single quantum device, by distributing both the simulation load and real QPU calls in parallel.
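The submit-then-combine pattern described above can be sketched with the Python standard library. This example uses `concurrent.futures` as a stand-in for PyCOMPSs, where the same shape would typically be expressed with the `@task` decorator and `compss_wait_on`. The term list, its coefficients, and the fragment values are hypothetical placeholders, and `run_fragment` simply returns its input where a real task would simulate or execute a subcircuit.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical reconstruction terms: one coefficient per term, and one
# placeholder expectation value per subcircuit fragment in that term.
TERMS = [
    {"coeff": 0.5, "fragments": (0.8, 0.5)},
    {"coeff": 0.5, "fragments": (0.4, -0.5)},
    {"coeff": 0.25, "fragments": (1.0, 0.4)},
]


def run_fragment(value):
    """Stand-in for one subcircuit execution on a CPU, GPU, or QPU backend."""
    return value  # a real task would simulate or run the fragment here


def reconstruct(terms, max_workers=4):
    """Run all fragments concurrently, then sum coeff * product of results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every fragment first so they can run in parallel.
        futures = [[pool.submit(run_fragment, v) for v in term["fragments"]]
                   for term in terms]
        total = 0.0
        for term, term_futures in zip(terms, futures):
            product = 1.0
            for future in term_futures:
                product *= future.result()  # blocks until that fragment is done
            total += term["coeff"] * product
    return total


expectation = reconstruct(TERMS)
```

All fragments are submitted before any result is awaited, so independent subcircuits can overlap in time; the classical combination step only starts consuming results afterwards.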
4. Workflow: From Quantum Program to Distributed Execution
The typical workflow enabled by Qdislib proceeds as follows:
- Input Circuit Loading: Accepts quantum circuits specified in Qiskit, Qibo, or other supported formats.
- Circuit-to-Graph Conversion: Translates the circuit into a DAG for manipulation.
- Cut Identification: Automatically selects or allows the user to specify cut locations, optimizing for parallel execution and resource constraints.
- Subcircuit Extraction: Produces the minimal set of subcircuits required for exact probability reconstruction.
- Parallelized Execution: Each subcircuit is submitted as a PyCOMPSs task to an available compute resource (CPU, GPU, or QPU).
- Result Combination: Subcircuit outputs are post-processed according to the wire/gate cutting formalism to obtain observables for the full circuit.
A notable proof of concept ran a 96-qubit hardware-efficient ansatz (HEA) circuit, for which Qdislib achieved a significant speedup on 64 HPC nodes through parallel subcircuit execution (Tejedor et al., 2 May 2025).
5. Supported Quantum and Distributed Computing Frameworks
Qdislib is natively compatible with:
- Quantum Circuit Libraries: Qiskit and Qibo, with a design that allows potential integration with further toolkits, including tensor network simulators.
- Distributed Runtime: PyCOMPSs, which manages all aspects of parallel task dispatch, inter-task communication, and resource scheduling.
- Hybrid Backends: Supports seamless hybrid workflows where subcircuits may be executed across CPUs, GPUs, and QPUs with appropriate synchronization and result aggregation.
This multiparadigm compatibility fosters portability, modularity, and extensibility for a wide range of large-scale quantum-classical computational tasks.
6. Scalability, Performance, and Limitations
Qdislib’s scalability is chiefly determined by the number of cuts imposed and the amount of available hardware parallelism:
- Scalability: Parallel efficiency improves with circuit size; partitioning larger circuits into more subcircuits exposes more parallelism, particularly when many compute nodes are available.
- Optimal Partitioning: The number and location of cuts must be chosen judiciously: excessive partitioning of small circuits increases overhead without commensurate gains, while deep circuits benefit from aggressive subdivision.
- Backend Efficiency: GPU-accelerated nodes expedite large subcircuit evaluation, while CPU nodes are adequate for smaller fragments; integrating real QPU evaluation is limited by hardware access latency and throughput.
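- Quasi-probabilistic Overhead: The number of subcircuit evaluations grows exponentially with the number of cuts $k$ ($8^k$ for wire cuts with the standard eight-term decomposition, $6^k$ for gate cuts with the six-term CZ decomposition), which constrains the practical depth and width of simulatable circuits, even in distributed environments.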
- Circuit Reconstruction: The final observable or output is exact only in the limit that all subcircuit combinations are evaluated; in practice, stochastic sampling or approximation strategies may be required when the number of cuts is very large.
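The overhead and the sampling workaround in the last two bullets can be made concrete. The sketch below assumes eight terms per wire cut and six per gate cut (the standard decompositions; other protocols use different bases), and estimates a quasi-probability sum by sampling terms with probability proportional to the magnitude of their coefficient. The coefficients and per-term values are hypothetical, and the function names are illustrative rather than part of Qdislib.

```python
import math
import random


def evaluation_count(wire_cuts, gate_cuts, wire_terms=8, gate_terms=6):
    """Term combinations needed for a full (exact) reconstruction."""
    return wire_terms ** wire_cuts * gate_terms ** gate_cuts


def rounds_per_node(evaluations, nodes):
    """Sequential batches when evaluations are spread evenly over nodes."""
    return math.ceil(evaluations / nodes)


def sampled_estimate(coeffs, values, shots, seed=0):
    """Monte Carlo estimate of sum_i coeffs[i] * values[i].

    Term i is drawn with probability |c_i| / gamma, where gamma is the sum
    of |c_i|, and contributes gamma * sign(c_i) * values[i]; the mean of
    these contributions is an unbiased estimate of the exact sum.
    """
    rng = random.Random(seed)
    gamma = sum(abs(c) for c in coeffs)
    weights = [abs(c) / gamma for c in coeffs]
    indices = range(len(coeffs))
    total = 0.0
    for _ in range(shots):
        i = rng.choices(indices, weights=weights)[0]
        sign = 1.0 if coeffs[i] >= 0 else -1.0
        total += gamma * sign * values[i]
    return total / shots


n_evals = evaluation_count(wire_cuts=2, gate_cuts=1)   # 8**2 * 6 = 384
n_rounds = rounds_per_node(n_evals, nodes=64)          # 384 / 64 = 6

coeffs = [0.5, 0.5, 0.5, -0.5, 0.5, -0.5]   # hypothetical term coefficients
values = [0.9, 0.1, -0.3, 0.4, 0.2, -0.6]   # hypothetical per-term results
exact = sum(c * v for c, v in zip(coeffs, values))
estimate = sampled_estimate(coeffs, values, shots=20000)
```

The statistical error of the sampled estimate shrinks as the square root of the number of shots but grows with `gamma`, which is why sampling reduces the number of subcircuit runs without removing the exponential cost of many cuts.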
7. Demonstrated Use Cases and Significance
Qdislib was validated in benchmark studies involving HEA circuits and random circuits, executed in hybrid quantum-classical environments comprising CPUs, GPUs, and both local and remote QPUs. These case studies demonstrate:
- The capability to partition and simulate circuits beyond native quantum hardware size limitations.
- Efficient exploitation of hybrid resources for both simulation and real hardware execution.
- Near-ideal parallel speedup for large problem sizes, up to hardware resource saturation.
Qdislib provides a practical, production-quality basis for scalable quantum circuit simulation, hardware-in-the-loop experiments, and design studies in the emerging field of distributed quantum-classical high-performance computing (Tejedor et al., 2 May 2025).