Multiprogramming Quantum Computing
- Multiprogramming Quantum Computing is a paradigm that concurrently executes multiple quantum circuits on a single device by partitioning physical qubits and optimizing scheduling for enhanced throughput and fidelity.
- Key techniques include hardware-specific resource partitioning, crosstalk-aware compilation, and dynamic scheduling, which together yield measurable improvements in system-level speed and reduce SWAP overhead.
- The approach leverages layered system architectures and sophisticated error-mitigation strategies to balance throughput gains with fidelity trade-offs, as demonstrated in frameworks like OQTOPUS and FLAMENCO.
Multiprogramming Quantum Computing (MPQC) refers to the concurrent execution of multiple quantum programs or circuits on a single quantum device, with the goals of maximizing resource utilization and throughput while maintaining acceptable fidelity. MPQC targets both Noisy Intermediate-Scale Quantum (NISQ) devices and fault-tolerant quantum computers, spanning software, compiler, system architecture, and hardware-specific techniques. Recent advances encompass architectural abstractions, resource allocation, error-mitigation, crosstalk-aware compilation, and online scheduling, across diverse quantum hardware including superconducting circuits, trapped ions, and neutral atom platforms.
1. System Architectures and Layered Abstractions
MPQC architectures typically decompose into multi-layered stacks, with design principles centered on resource partitioning, parallelism, and hardware abstraction. A representative open-source system is OQTOPUS (Kakuko et al., 31 Jul 2025):
- Frontend: User interaction layer, e.g., QURI Parts OQTOPUS, which combines a Python SDK with a web UI.
- Cloud Services: Job and resource management components (e.g., serverless AWS orchestration for user/job queues).
- Backend Engine: Handles scheduling, bundling/partitioning, transpilation, hardware mapping, and submission to the quantum device.
In OQTOPUS, the backend exposes an explicit multiprogramming layer as a microservice, enabling users to submit arrays of quantum circuits bundled within a single job. The engine assigns virtual qubits to disjoint subsets of the device's physical qubits, issues a concatenated OpenQASM 3 program, and demultiplexes results post-execution.
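The bundling and demultiplexing step described above can be sketched in a few lines. This is a minimal illustration, not the OQTOPUS implementation: the function names `assign_offsets` and `demultiplex`, the contiguous-block layout, and the convention that qubit index 0 is the leftmost character of a bitstring are all assumptions made for the example.

```python
from collections import Counter

def assign_offsets(widths):
    """Give each circuit a contiguous, disjoint block of indices in the combined register."""
    offsets, start = [], 0
    for w in widths:
        offsets.append(start)
        start += w
    return offsets

def demultiplex(combined_counts, widths):
    """Split measurement counts over the combined register into per-circuit counts.

    Assumption: bitstrings are written with qubit index 0 as the leftmost character.
    """
    offsets = assign_offsets(widths)
    per_circuit = [Counter() for _ in widths]
    for bitstring, n in combined_counts.items():
        for i, (off, w) in enumerate(zip(offsets, widths)):
            # Marginalize: keep only the bits that belong to circuit i.
            per_circuit[i][bitstring[off:off + w]] += n
    return [dict(c) for c in per_circuit]

# Two bundled circuits: a 2-qubit circuit followed by a 3-qubit circuit.
combined = {"00101": 600, "11101": 400}
split = demultiplex(combined, widths=[2, 3])
print(split)  # [{'00': 600, '11': 400}, {'101': 1000}]
```

Because the qubit subsets are disjoint, each circuit's result is simply the marginal of the combined distribution over its own block of bits.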
Alternative frameworks, for both single-device and distributed multi-QPU backends, extend MPQC via middleware for circuit decomposition and parallelization. One example is Quantum Brilliance's SDK, which supports both asynchronous and MPI-based workload scattering across heterogeneous quantum accelerators (Nguyen et al., 2022).
Recent proposals such as FLAMENCO (Zhao et al., 3 Jan 2026) re-architect MPQC to remove latency bottlenecks: each program is compiled offline into multiple device-specific versions targeting different compute-unit regions, and at runtime the system dynamically orchestrates their placement according to fidelity and crosstalk metrics.
2. Qubit Partitioning, Scheduling, and Resource Management
Effective MPQC requires robust resource management strategies:
- Partitioning: Assigns physical qubits among multiple circuits. Allocations must be pairwise disjoint: for $k$ circuits requiring $n_1, \dots, n_k$ logical qubits on a device with $N$ physical qubits, $\sum_{i=1}^{k} n_i \le N$ must hold (Kakuko et al., 31 Jul 2025, Niu et al., 2022).
- Scheduling: Policies range from strict FIFO job sequencing (OQTOPUS manual bundling) and static batch assignment (fixed queue in QuMC (Niu et al., 2021)) to dynamic, fidelity-aware co-location (CDAP/X-SWAP (Liu et al., 2020), large-first orchestrator (Zhao et al., 3 Jan 2026)).
- Region Allocation: Approaches like compute-unit abstraction (FLAMENCO) reduce the combinatorial search space for physically contiguous, minimally crosstalking regions.
- Crosstalk and Resource Buffers: Compiler-driven or heuristic resource partitioners enforce variable “buffer” regions of idle qubits to suppress crosstalk, trading off throughput for fidelity (Ohkura et al., 2021).
Advanced partitioning algorithms include Community Detection Assisted Partition (CDAP), which accounts for both device topology and calibrated fidelity, and various greedy or heuristic subgraph partitioners that incorporate Simultaneous Randomized Benchmarking (SRB) to capture correlated errors (Niu et al., 2021, Liu et al., 2020). In neutral atom systems, such as those targeted by DYNAMO, the assignment further integrates spatial constraints unique to the architecture, for example, blockade radii and movement paths (Sun et al., 7 Jul 2025).
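To make the disjointness and buffer constraints concrete, the following sketch allocates connected regions on a coupling graph greedily, largest circuit first, and blocks every qubit within `buffer_hops` of an allocated region. It is a generic heuristic written for illustration, not CDAP or any cited partitioner; it ignores calibration data entirely, and all names (`grow_region`, `within_hops`, `partition`) are hypothetical.

```python
from collections import deque

def grow_region(graph, seed, size, blocked):
    """Breadth-first search from `seed`, collecting `size` connected, unblocked qubits."""
    if seed in blocked:
        return None
    region, queue, seen = [], deque([seed]), {seed}
    while queue and len(region) < size:
        q = queue.popleft()
        region.append(q)
        for nb in sorted(graph[q]):
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                queue.append(nb)
    return region if len(region) == size else None

def within_hops(graph, region, hops):
    """All qubits at graph distance <= hops from the region (the region included)."""
    frontier, reached = set(region), set(region)
    for _ in range(hops):
        frontier = {nb for q in frontier for nb in graph[q]} - reached
        reached |= frontier
    return reached

def partition(graph, sizes, buffer_hops=1):
    """Greedy largest-first allocation; maps circuit index to a region, or None if unplaced."""
    blocked, result = set(), {}
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        result[idx] = None
        for seed in sorted(graph):
            region = grow_region(graph, seed, sizes[idx], blocked)
            if region is not None:
                result[idx] = region
                # Block the region plus its buffer so later circuits stay clear of it.
                blocked |= within_hops(graph, region, buffer_hops)
                break
    return result

# Toy device: an 8-qubit line 0-1-2-...-7.
line = {i: {j for j in (i - 1, i + 1) if 0 <= j < 8} for i in range(8)}
alloc = partition(line, sizes=[3, 2], buffer_hops=1)
print(alloc)  # {0: [0, 1, 2], 1: [4, 5]}
```

On this toy line, two 4-qubit circuits fit with `buffer_hops=0` but not with `buffer_hops=1`, which is the throughput-for-fidelity trade-off in its smallest form.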
3. Compilation, Transpilation, and Multi-Circuit Optimization
The compilation layer in MPQC systems must jointly map, transpile, and optimize multiple independent quantum circuits:
- Bundling and Transpilation: Circuits are concatenated and mapped onto partitioned subgraphs, with transpilers (e.g., Qiskit, ouqu-tp, Tranqu Server) operating on the combined circuit to apply layout, routing, and local optimizations, while respecting qubit exclusivity across subcircuits (Kakuko et al., 31 Jul 2025).
- Inter-Program SWAPs: Standard approaches confine routing operations (e.g., SWAP insertion) within a partition; MPQC-aware schedulers (X-SWAP) allow inter-program SWAP gates, decreasing the cumulative SWAP cost and improving fidelity on devices with limited connectivity (Liu et al., 2020).
- Crosstalk-Awareness: Techniques such as SRB-enhanced mapping transition engines or buffer-zone insertion mitigate adverse noise amplification from simultaneous two-qubit gate activity (Niu et al., 2021, Ohkura et al., 2021).
- Multi-Version Scheduling: The FLAMENCO approach compiles each circuit into multiple, region-specific binaries, enabling rapid runtime matchmaking based on updated device calibration, crosstalk threat, or job queue state (Zhao et al., 3 Jan 2026).
- Constraint-Based Scheduling: On neutral atom platforms, scheduling is posed as a constrained optimization (NP-hard in general), addressed via heuristics, cycle-accurate gating, and SMT-encoded hardware constraints (movement, blockade, and exclusivity) (Sun et al., 7 Jul 2025).
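The multi-version idea in the list above can be illustrated with a small runtime selector: given several precompiled versions of one program, pick the best-scoring version whose region is currently free. This is a sketch under assumed data structures, not FLAMENCO's actual interface; `CompiledVersion`, `est_fidelity`, and the additive `crosstalk_penalty` score are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompiledVersion:
    program: str
    region: frozenset    # physical qubits this version was compiled for
    est_fidelity: float  # offline fidelity estimate (hypothetical metric)

def select_version(versions, busy_qubits, crosstalk_penalty=None):
    """Return the highest-scoring version whose region is entirely free, or None."""
    crosstalk_penalty = crosstalk_penalty or {}
    best, best_score = None, float("-inf")
    for v in versions:
        if v.region & busy_qubits:
            continue  # region overlaps a running program
        score = v.est_fidelity - crosstalk_penalty.get(v.region, 0.0)
        if score > best_score:
            best, best_score = v, score
    return best

versions = [
    CompiledVersion("ghz", frozenset({0, 1, 2}), 0.93),
    CompiledVersion("ghz", frozenset({4, 5, 6}), 0.90),
    CompiledVersion("ghz", frozenset({8, 9, 10}), 0.85),
]
# Qubits 1 and 2 are in use, and region {4, 5, 6} sits next to a noisy neighbour.
chosen = select_version(
    versions,
    busy_qubits={1, 2},
    crosstalk_penalty={frozenset({4, 5, 6}): 0.10},
)
print(sorted(chosen.region))  # [8, 9, 10]
```

Since all compilation happens offline, the runtime cost is a scan over a short list of candidates, which is what makes low-latency placement plausible.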
Reported throughput and processing metrics (e.g., shots/sec, compilation-time reduction factor) indicate that MPQC strategies can yield 1.5× to 50× improvements in system-level speed, along with reductions in circuit depth, stage count, and SWAP overhead. The table below summarizes representative results.
| Approach | Throughput Increase | Fidelity Change | Notable Mechanism |
|---|---|---|---|
| OQTOPUS MPQC (Kakuko et al., 31 Jul 2025) | ~1.5× | Neutral/minor loss | Bundling, FIFO, manual |
| QuMC (Niu et al., 2021) | 2–8× (TRF) | <10% typical loss | Partition+fidelity-managed |
| CDAP-XSWAP (Liu et al., 2020) | Up to 43% | +12% over baseline | Community detection, inter-program SWAPs |
| FLAMENCO (Zhao et al., 3 Jan 2026) | 5–50× (latency) | +5–14% | Offline, fidelity selection |
| DYNAMO (Sun et al., 7 Jul 2025) | 14× (compilation) | N/A (focus: makespan) | SMT, spatial deformation |
4. Error Mitigation and Crosstalk Management
Independently driven subcircuits in MPQC can experience increased error rates due to simultaneous gate operations, biased readout, and context-dependent crosstalk. Mitigation strategies include:
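- Readout Error Mitigation: OQTOPUS applies single-qubit calibration matrices independently to each logical qubit in bundled MPQC jobs and combines them as a tensor product, giving the overall vectorized correction $p_{\mathrm{mit}} = \left(M_1^{-1} \otimes \cdots \otimes M_n^{-1}\right) p_{\mathrm{noisy}}$, where $M_q$ is the calibration matrix of qubit $q$ (Kakuko et al., 31 Jul 2025).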
- Crosstalk Detection and Suppression: The trade-off between maximal throughput and acceptable fidelity is managed by dynamically sizing inter-program buffers (quantified in “hops” or physical qubit spacing), with device-dependent heuristics based on Simultaneous Randomized Benchmarking and coefficient-of-variation metrics (Ohkura et al., 2021). Software pipelines (e.g., palloq) select layouts dynamically based on crosstalk presence metrics.
- Future Extensions: Plans for pulse-level compensation, echo sequences, and cross-calibration routines are under development but are not yet standard in production clouds (Kakuko et al., 31 Jul 2025).
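Tensor-product readout mitigation, as mentioned in the list above, inverts a 2×2 calibration matrix per qubit and applies each inverse along that qubit's axis of the measured distribution. The sketch below shows the standard technique in pure Python; it is not OQTOPUS code, and the matrix layout `cal[q][measured][prepared]`, the leftmost-is-qubit-0 bitstring convention, and the calibration values are assumptions chosen for the example.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mitigate(probs, cal_matrices):
    """Apply the inverse calibration matrix of each qubit to a bitstring distribution.

    probs: dict mapping bitstring -> probability.
    cal_matrices[q][measured][prepared]: P(measured | prepared) for qubit q (assumed layout).
    """
    current = dict(probs)
    for q, m in enumerate(cal_matrices):
        inv = invert_2x2(m)
        updated = {}
        for bits, p in current.items():
            measured = int(bits[q])
            for ideal in (0, 1):
                key = bits[:q] + str(ideal) + bits[q + 1:]
                updated[key] = updated.get(key, 0.0) + inv[ideal][measured] * p
        current = updated
    return current

# Hypothetical calibration data for two qubits.
m0 = [[0.90, 0.20], [0.10, 0.80]]
m1 = [[0.95, 0.10], [0.05, 0.90]]

# Noisy distribution produced by the ideal outcome "01" under these matrices.
noisy = {
    f"{a}{b}": m0[a][0] * m1[b][1]
    for a in (0, 1) for b in (0, 1)
}
mitigated = mitigate(noisy, [m0, m1])
```

Here `mitigated["01"]` comes out at 1 up to floating-point error. On real data with finite shots the corrected vector can contain small negative entries, so it is usually clipped or projected back onto a valid probability distribution.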
Empirical results validate the necessity of buffer-based crosstalk suppression on high-noise, low-connectivity devices (e.g., IBMQ Mumbai), while high-connectivity architectures (e.g., trapped-ion H1-2) exhibit minimal MPQC-induced fidelity loss (Niu et al., 2022).
5. Experimental Evaluations and Performance Benchmarks
MPQC has been evaluated across cloud-accessible superconducting and trapped-ion devices, as well as neutral atom arrays and hybrid cluster setups:
- Sampling and Estimation: On Osaka University’s 64-qubit chip, OQTOPUS reported that wall-clock time for 2000 shots fell from 0.82 s when two circuits were run separately to 0.55 s in bundled MPQC mode, a throughput gain of roughly 1.5× (Kakuko et al., 31 Jul 2025).
- SWAP and Fidelity Metrics: Community-aware mapping with X-SWAP yields on average 12% fidelity improvement and 11.1% reduction in SWAP overhead relative to prior baselines for simultaneous execution (Liu et al., 2020).
- Platform Dependence: On Quantinuum H1-2 (all-to-all connectivity, low error), simultaneous multiprogramming led to only a 0.5% fidelity loss and a 31% cost reduction, while the same experiment on IBMQ Mumbai (superconducting, nearest-neighbor) incurred a 3.4% loss with no cost reduction (Niu et al., 2022).
- Throughput-Fidelity Trade-offs: Increased crosstalk buffers improve the probability of a successful trial (PST, measured on a 0–1 scale) by up to 6 percentage points but decrease throughput nearly proportionally; shallow circuits (CX-depth <20) are crosstalk-robust, while deeper circuits require intermediate buffering (Ohkura et al., 2021).
- Neutral Atom Scheduling: DYNAMO, targeting neutral atom arrays, reduced Rydberg stage count by an average of 50.5%, with up to 14.4× compilation speedup and balanced multi-QPU utilization (Sun et al., 7 Jul 2025).
6. Applications and Advanced Use-Cases
Beyond raw hardware throughput, MPQC methods enable new algorithmic and system paradigms:
- Quantum Search: Decomposing Grover's algorithm into partial diffusions enables MPQC-based parallelization, which increases the rotation angle and success probability per iteration. Experimental implementations at least doubled the observed success rate compared to canonical Grover search, at the cost of increased qubit usage (2207.14464).
- Fault-Tolerant Scheduling: For lattice-surgery-based FTQC, MPQC is formalized as 3D bin-packing of polycubes representing jobs, approximated by cuboids for tractable scheduling. Online heuristics (corner-greedy, defragmentation) achieve 2–4× throughput speedup with millisecond latency—crucial for scalable, parallelized quantum cloud services (Nishio et al., 10 May 2025).
- Multi-party Quantum Computation (Editor’s term: "Secure-MPQC"): Protocols for secure, composable delegated computation (with blindness and verifiability under malicious majority, including constant-round constructions) have been developed and proven optimal with only two quantum rounds (Kapourniotis et al., 2021, Bartusek et al., 2020).
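The bin-packing view of fault-tolerant scheduling in the list above can be illustrated with jobs approximated as cuboids (width × height patches of the qubit plane, held for a duration). The sketch below uses a plain first-fit rule that places each job at the earliest time, then lowest position, where it fits. It is a simplified stand-in for illustration only, not the corner-greedy or defragmentation heuristics of the cited work, and the grid model and function names are assumptions.

```python
from itertools import product

def fits(occupied, x, y, t, w, h, d):
    """True if the w x h x d cuboid anchored at (x, y, t) overlaps no occupied cell."""
    return all(cell not in occupied
               for cell in product(range(x, x + w), range(y, y + h), range(t, t + d)))

def schedule(jobs, width, height, horizon=1000):
    """First-fit placement of (w, h, duration) cuboids on a width x height grid over time."""
    occupied, placements = set(), []
    for w, h, d in jobs:
        spot = next(((x, y, t)
                     for t in range(horizon)
                     for y in range(height - h + 1)
                     for x in range(width - w + 1)
                     if fits(occupied, x, y, t, w, h, d)), None)
        if spot is None:
            raise ValueError("job does not fit within the horizon")
        x, y, t = spot
        occupied.update(product(range(x, x + w), range(y, y + h), range(t, t + d)))
        placements.append(spot)
    return placements

# Four jobs on a 4 x 2 patch grid, submitted in this order.
jobs = [(2, 2, 3), (2, 2, 2), (4, 1, 1), (2, 1, 2)]
placements = schedule(jobs, width=4, height=2)
makespan = max(t + d for (_, _, t), (_, _, d) in zip(placements, jobs))
print(placements)  # [(0, 0, 0), (2, 0, 0), (0, 0, 3), (2, 1, 2)]
print(makespan)    # 4
```

Running the same four jobs one after another would take 3 + 2 + 1 + 2 = 8 time steps, so even this naive packing halves the makespan on the toy instance.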
7. Limitations, Open Problems, and Future Directions
Despite progress, several challenges remain:
- Dynamic Packing and Scheduling: Existing auto-packing is limited; end-to-end co-optimization for heterogeneous, variable-sized jobs with real-time device calibration is a key research frontier (Kakuko et al., 31 Jul 2025, Zhao et al., 3 Jan 2026).
- Mitigation of Non-local Errors: Full-stack, pulse-aware crosstalk mitigation and more granular calibration for large-scale devices are under active development.
- Extended Applicability: Adaptive partitioning and compilation for dynamically generated, data-dependent circuits remains unresolved; FLAMENCO’s multi-executable model excels in repeated workloads but not in dynamic-program settings (Zhao et al., 3 Jan 2026).
- Fairness and Utilization: High-fidelity scheduling for large concurrent jobs may disadvantage smaller jobs or reduce aggregate utilization, especially with strict non-overlap policies (Zhao et al., 3 Jan 2026).
- Hardware Scalability: As platforms move to hundreds of qubits, hardware-aware, scalable MPQC abstractions and OS-style schedulers (e.g., the DYNAMO approach for neutral atom QPUs) will grow in importance (Sun et al., 7 Jul 2025).
- Security & Composability: In cryptographic MPQC, minimum quantum round-complexity is established, but realizing two-round, universally composable secure protocols beyond CRS and quantum oracles is likely impossible (Bartusek et al., 2020).
Continued advancement in MPQC demands coordinated innovation across hardware architectures, compiler theory, error-mitigation pipelines, and cloud systems, with emphasis on automated scheduling, multi-party security, and platform-specific adaptation.