Hybrid Quantum–HPC Workflows
- Hybrid QHPC workflows are integrated architectures that combine quantum and classical computing to optimize resource utilization and task execution.
- They employ layered designs and adaptive scheduling to manage heterogeneous hardware, enhancing reproducibility and scalability in scientific computing.
- These workflows enable breakthroughs in quantum chemistry, optimization, and machine learning by coordinating resource management and advanced telemetry.
Hybrid Quantum–High Performance Computing (QHPC) workflows integrate conventional HPC resources (CPUs, GPUs, supercomputers, workflow engines) with quantum computing hardware or simulators, forming tightly coordinated pipelines for scientific computing, optimization, quantum chemistry, and machine learning. Modern QHPC workflow systems implement standardized abstractions for resource management, task scheduling, heterogeneous hardware orchestration, quantum circuit compilation, and inter-platform telemetry. These workflows are realized through layered architectures, vendor-neutral APIs, middleware toolchains, and performance-aware scheduling algorithms, enabling efficient co-execution, resource utilization, and reproducible research across rapidly evolving quantum and classical platforms.
1. Layered QHPC Workflow Architectures
Most QHPC stack designs feature a layered architecture that separates workflow definition, workload partitioning, low-level scheduling, and device-specific execution:
- Workflow layer (L4): Logical construction of DAGs with classical, quantum, and hybrid tasks, data dependencies, and control-flow encoded. Systems such as Pilot-Quantum and QFw operate here, allowing graph-based composition and hybrid task demarcation (Mantha et al., 2024, Beck et al., 2024, Shehata et al., 3 Mar 2025).
- Workload and task layers (L3-L2): Aggregate and partition tasks, manage dependencies, and expose parallelism for pilot scheduling, batch launches, and adaptivity (Mantha et al., 2024).
- Resource and middleware layers (L1): Coordinate resource allocations, expose QPU/CPU/GPU capabilities, provide unified APIs (e.g., QRMI), and schedule jobs across site schedulers (Slurm, PBS, LSF) as well as local or cloud quantum backends (Wennersteen et al., 24 Sep 2025, Shehata et al., 3 Mar 2025).
- Execution and hardware layers: Implement device-specific bindings, kernel scheduling, and coordination across QPUs, simulators (Qiskit Aer, NWQ-Sim, cuQuantum), and classical compute nodes. Driver processes, runtime libraries, or micro-architectures (e.g., UQP’s QCP) abstract hardware diversity (Elsharkawy et al., 2024, Zhan et al., 23 Oct 2025).
This modularity enables portability and technology-agnostic orchestration, allowing workflows to migrate between differing hardware as devices and simulators mature (Chundury et al., 17 Sep 2025).
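The workflow-layer abstraction above can be sketched as a tiny task graph (a hypothetical illustration, not the actual Pilot-Quantum or QFw API): tasks are tagged as classical, quantum, or hybrid, and the layer below consumes them in dependency order.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    kind: str                       # "classical", "quantum", or "hybrid"
    deps: list = field(default_factory=list)

def topo_order(tasks):
    """Return task names in dependency order (Kahn's algorithm)."""
    indeg = {t.name: len(t.deps) for t in tasks}
    ready = [n for n, d in indeg.items() if d == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for t in tasks:
            if n in t.deps:
                indeg[t.name] -= 1
                if indeg[t.name] == 0:
                    ready.append(t.name)
    if len(order) != len(tasks):
        raise ValueError("cycle in workflow DAG")
    return order

# Classical pre-processing feeds a quantum kernel, whose results
# feed classical post-processing.
tasks = [
    Task("prep", "classical"),
    Task("vqe_iter", "quantum", deps=["prep"]),
    Task("analyze", "classical", deps=["vqe_iter"]),
]
print(topo_order(tasks))  # → ['prep', 'vqe_iter', 'analyze']
```

Lower layers (workload, resource, execution) would then map each entry of this ordering onto pilots, schedulers, and device bindings.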
2. Resource Management, Scheduling, and Orchestration
Hybrid resource management is central to QHPC. Key capabilities include:
- Pilot Abstractions: Placeholder jobs ("pilots") reserve and manage sets of physical/virtual resources for task execution. The Pilot-Quantum framework formalizes a pilot as a tuple $(R, D, C)$, where $R$ is the resource set, $D$ the pilot lifetime, and $C$ the concurrency (Mantha et al., 2024).
- Multi-level schedulers: Systems interpose middleware daemons (“second schedulers”) between batch managers and QPU queue layers, supporting malleable allocations, pre-emption, job priorities, and adaptive time-shares (Wennersteen et al., 24 Sep 2025, Rocco et al., 6 Aug 2025).
- Formal resource models: Resource allocation matrices specify the usage $a_{jr}$ of each job $j$ on each resource $r$ (CPUs/GPUs and QPUs). Constraints and objectives include utilization maximization, e.g. $\max \sum_{j,r} a_{jr}$, subject to per-resource capacity ($\sum_j a_{jr} \le C_r$) and "soft bound" scheduling-latency constraints (Shehata et al., 3 Mar 2025).
- Task & workflow scheduling: QHPC engines construct DAGs of classical and quantum tasks with explicit dependencies, allocating ready tasks to resources according to objective functions: minimizing makespan or balancing load (Mantha et al., 2024). Greedy, prioritized, or pattern-aware policies are used.
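A greedy ready-task policy of the kind described above can be sketched as follows (an illustrative heuristic, not a specific engine's scheduler): at each step, ready tasks are assigned longest-first to the resource that frees up earliest.

```python
import heapq

def greedy_schedule(durations, deps, n_resources):
    """Greedy list scheduler over a DAG.
    durations: {task: time}; deps: {task: set of prerequisites}.
    Returns (makespan, {task: (resource, start, finish)})."""
    free = [(0.0, i) for i in range(n_resources)]   # (time available, resource)
    heapq.heapify(free)
    finish, placement, done = {}, {}, set()
    remaining = list(durations)
    while remaining:
        ready = [t for t in remaining if deps.get(t, set()) <= done]
        if not ready:
            raise ValueError("cycle in task graph")
        ready.sort(key=lambda t: -durations[t])      # longest task first
        for t in ready:
            avail, r = heapq.heappop(free)
            start = max(avail, max((finish[d] for d in deps.get(t, set())),
                                   default=0.0))
            finish[t] = start + durations[t]
            placement[t] = (r, start, finish[t])
            heapq.heappush(free, (finish[t], r))
            done.add(t)
            remaining.remove(t)
    return max(finish.values()), placement

durations = {"a": 2, "b": 3, "c": 1, "d": 2}
deps = {"c": {"a"}, "d": {"a", "b"}}
makespan, plan = greedy_schedule(durations, deps, n_resources=2)
print(makespan)  # → 5.0
```

Real QHPC engines add QPU-specific costs (queueing, calibration windows) to the same skeleton.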
A table of scheduling modes and their core properties:
| Mode | Features | Middleware |
|---|---|---|
| Simultaneous alloc | Co-reservation, lockstep exec | SLURM hetjobs, Pilot-Q, QFw |
| Interleaved alloc | Separate jobs, chained by workflow | Parsl, FireWorks, Pilot-Q |
| Second-scheduler | QPU malleability, pre-emption | QRMI daemon, custom |
Co-allocation models leverage GRES plugins that treat QPUs as special resources, supporting strict binding of quantum to classical resources and efficient job batching (Beck et al., 2024, Shehata et al., 3 Mar 2025, Wennersteen et al., 24 Sep 2025).
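The pilot tuple $(R, D, C)$ used above can be made concrete with a small data structure (an illustrative reading of the abstraction, not Pilot-Quantum's actual API):

```python
from dataclasses import dataclass
import itertools

@dataclass(frozen=True)
class Pilot:
    resources: frozenset    # R: resource identifiers held by the pilot
    lifetime: float         # D: walltime the allocation is held, in seconds
    concurrency: int        # C: maximum tasks executed at once

    def batches(self, tasks):
        """Split a task list into batches of at most `concurrency` tasks,
        the granularity at which the pilot launches work."""
        it = iter(tasks)
        while batch := list(itertools.islice(it, self.concurrency)):
            yield batch

p = Pilot(frozenset({"gpu0", "gpu1", "qpu0"}), lifetime=3600.0, concurrency=2)
print(list(p.batches(["t1", "t2", "t3"])))  # → [['t1', 't2'], ['t3']]
```

The pilot holds its allocation for the full lifetime, so consecutive batches avoid repeated trips through the site scheduler's queue.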
3. Hybrid Algorithmic Workflows and Circuit Decomposition
QHPC workflows support both direct and distributed execution of quantum algorithms, leveraging circuit partitioning to map large algorithms onto limited QPU resources:
- Variational algorithms (VQE, QAOA): Classical optimizers and quantum kernels are tightly looped; quantum evaluations are batched and pipelined for maximum concurrency (Mantha et al., 2024, Asadi et al., 2024). Algorithm steps typically alternate: prepare a parameterized circuit, evaluate expectation values on the QPU, and update parameters classically until convergence.
- Circuit cutting and partitioning: Large circuits are decomposed via wire/gate cutting (e.g., Qdislib, ACK hypervisor), yielding exponential numbers of subcircuits executed in parallel across CPUs, GPUs, and QPUs (Tejedor et al., 2 May 2025, Zhan et al., 23 Oct 2025, Miniskar et al., 15 Dec 2025). Each wire or gate cut multiplies the fragment count by a constant factor, so subcircuit counts grow exponentially in the number of cuts, requiring advanced scheduling and recombination schemes.
- Hybrid sandwich architectures: Workflows may sandwich classical stages (e.g., ML or data encoding) between two quantum computational phases (e.g., VQE → CNN → QCNN), orchestrated via high-bandwidth links and optimized feature transfer (Chen et al., 2024).
- Distributed and asynchronous execution: Advanced task-based runtimes (IRIS, Pilot-Quantum) enable concurrent, fine-grained execution of quantum subcircuits (task granularity), improving throughput and resource utilization via circuit-cut workloads (Miniskar et al., 15 Dec 2025, Mantha et al., 2024).
Hybrid workflows are increasingly leveraging parameter-shift batching, adjoint differentiation, and distributed state-vector simulation for high throughput on variational circuits (Asadi et al., 2024, Mantha et al., 2024).
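The tight classical-quantum loop and parameter-shift batching can be illustrated with a toy single-parameter circuit whose expectation is $\langle Z \rangle = \cos\theta$ (plain NumPy stands in for the QPU call; no real backend or framework API is assumed):

```python
import numpy as np

def expectation(thetas):
    """Stand-in for a batched QPU evaluation of <Z> after RY(theta)|0>."""
    return np.cos(np.asarray(thetas))

def parameter_shift_grad(theta, shift=np.pi / 2):
    """Exact gradient from two shifted circuit evaluations, submitted
    as a single batch (one round trip instead of two)."""
    plus, minus = expectation([theta + shift, theta - shift])
    return 0.5 * (plus - minus)

# Gradient-descent loop: each iteration submits one batch of circuits.
theta = 0.3
for _ in range(200):
    theta -= 0.2 * parameter_shift_grad(theta)
print(round(float(expectation([theta])[0]), 3))  # converges toward min <Z> = -1
```

For $\langle Z\rangle = \cos\theta$ the shift rule gives exactly $-\sin\theta$; in a real workflow the two shifted circuits per parameter are what gets batched and pipelined across QPUs.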
4. Software Stacks, APIs, and Interoperability
QHPC middleware exposes standardized programming interfaces for resource management, device abstraction, and workflow integration:
- Hardware/vendor neutrality: Systems such as QRMI, QFw, and HybridQ define interfaces allowing interchangeable use of Qiskit, PennyLane, Qibo, PyTket, CUDA-Q, and various classical/quantum backends (Wennersteen et al., 24 Sep 2025, Mandrà et al., 2021, Chundury et al., 17 Sep 2025). API calls are normalized:
  - device/session discovery: list_devices()
  - job lifecycle: submit_job(), get_status(), cancel_job()
  - data transfer: OpenQASM/QIR payloads serialized via JSON or Protobuf (Wennersteen et al., 24 Sep 2025, Cacheiro et al., 25 May 2025).
- Plugin and layering models: Plugin architectures (QRMIBackends, QPM-API, UQP) enable dynamic dispatch to vendor SDKs or hardware targets without user code changes (Wennersteen et al., 24 Sep 2025, Elsharkawy et al., 2024, Chundury et al., 17 Sep 2025).
- Hybrid monitoring and telemetry: Systems collect and expose job, workflow, and device telemetry via Prometheus and Grafana dashboards; metrics include job wait times, device utilization, and hardware health (Kanazawa et al., 5 Dec 2025, Wennersteen et al., 24 Sep 2025).
- Interfacing with workflow engines: Bindings into WMS, such as Parsl, CWL engines, Pegasus, Prefect, and Nextflow, enhance hybrid workflow expressivity and reproducibility (Bieberich et al., 2023, Cranganore et al., 2024, Kanazawa et al., 5 Dec 2025).
Open interfaces and plugin discovery enable seamless switching between emulators, local QPUs, and cloud platforms, aiding portability and promoting best practices for reproducibility.
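The normalized call set above can be expressed as a small abstract interface that concrete backends plug into (a hypothetical sketch modeled on those calls, not the actual QRMI or QFw API):

```python
from abc import ABC, abstractmethod

class QuantumResource(ABC):
    """Vendor-neutral backend interface: discovery, job lifecycle."""

    @abstractmethod
    def list_devices(self) -> list: ...
    @abstractmethod
    def submit_job(self, payload: str, shots: int) -> str: ...
    @abstractmethod
    def get_status(self, job_id: str) -> str: ...
    @abstractmethod
    def cancel_job(self, job_id: str) -> None: ...

class LocalSimulator(QuantumResource):
    """Toy in-process 'backend' that completes every job immediately."""
    def __init__(self):
        self._jobs = {}
    def list_devices(self):
        return ["local-sim"]
    def submit_job(self, payload, shots):
        job_id = f"job-{len(self._jobs)}"
        self._jobs[job_id] = "COMPLETED"
        return job_id
    def get_status(self, job_id):
        return self._jobs[job_id]
    def cancel_job(self, job_id):
        self._jobs[job_id] = "CANCELLED"

backend = LocalSimulator()
jid = backend.submit_job("OPENQASM 3; qubit q; h q;", shots=100)
print(backend.get_status(jid))  # → COMPLETED
```

Swapping in a cloud or on-premise QPU means providing another subclass; workflow code written against QuantumResource is untouched, which is the portability property the plugin architectures above are after.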
5. Benchmarking, Performance Laws, and Scalability
QHPC performance depends critically on simulation method, task granularity, partitioning, and resource strategy:
- State-vector simulation scaling: Runtime scales as $O(g \cdot 2^n / p)$, where $n$ is the qubit count, $g$ the gate count, and $p$ the available parallelism (via SIMD, threads, multiprocess, or GPU) (Asadi et al., 2024, Mandrà et al., 2021). MPS and tensor-contraction methods instead scale polynomially in $\chi$, where $\chi$ is the bond dimension.
- Practical strong-scaling: Systems such as HybridQ, Qdislib, and PennyLane Lightning show near-ideal strong scaling (parallel efficiencies up to $0.9$) for non-Clifford circuits and effective weak scaling for distributed Hamiltonian gradients or circuit-cut variants (Mandrà et al., 2021, Asadi et al., 2024, Tejedor et al., 2 May 2025).
- Communication and orchestration overheads: Models account for classical-quantum message time, queueing, and persistent memory usage. Batch-mode and malleability approaches are critical for high QPU utilization and reduced wall-time under contention (Asadi et al., 2024, Rocco et al., 6 Aug 2025, Shehata et al., 3 Mar 2025).
- Circuit cutting cost: Exponential in cut count, but mitigated by parallelization and adaptive cut-set selection. Qdislib demonstrates that for moderate cut counts parallel speedup is attainable; too many cuts degrade performance as recombination overhead dominates (Tejedor et al., 2 May 2025).
- Variational block performance: For VQLS, the gradient-free COBYLA optimizer required fewer total quantum shots than gradient-based optimizers at comparable per-epoch time (Shehata et al., 3 Mar 2025). Distributed QAOA achieves substantial speedup on NWQ-Sim relative to cloud backends (Chundury et al., 17 Sep 2025).
- Superlinear scaling in middleware: Unified Quantum Platform memory and execution-time models grow superlinearly but sub-quadratically in qubit count, which is critical for the $n = 100$–$1000$ roadmap (Elsharkawy et al., 2024).
- End-to-end walltime: Total walltime decomposes into queueing, compilation, execution, and data-transfer components, with queueing often dominating for cloud QPU usage; hybrid architectures amortize this via parallelization and task batching (Bieberich et al., 2023, Cacheiro et al., 25 May 2025).
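The cut-count tradeoff can be made concrete with a toy cost model (the per-cut multiplier, per-fragment cost reduction, and overhead constant here are illustrative assumptions, not measured values from any of the cited systems):

```python
def cut_tradeoff(base_time, n_cuts, workers,
                 per_cut_factor=4.0, overhead_per_frag=0.05):
    """Toy model: each cut multiplies fragment count by `per_cut_factor`
    (assumed); fragments are cheaper than the full circuit (assumed to
    halve per cut) and run in parallel on `workers` workers, but
    recombination overhead grows linearly with fragment count."""
    fragments = per_cut_factor ** n_cuts
    frag_time = base_time / (2 ** n_cuts)
    waves = fragments / workers if fragments > workers else 1.0
    return frag_time * waves + overhead_per_frag * fragments

# Sweep cut counts for a 100 s circuit on 64 workers.
times = {k: round(cut_tradeoff(100.0, k, workers=64), 2) for k in range(6)}
print(times)
```

The sweep shows the qualitative behavior described above: walltime first drops as parallelism is exposed, reaches a sweet spot, then rises again once recombination overhead dominates.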
6. Observability, Reproducibility, and Automation
Persistent monitoring and observability stacks are emphasized for transparency, reproducibility, and systematic optimization:
- Persistent telemetry pipelines: Decouple telemetry collection from execution, storing all metrics (job/queue times, resource utilization) and domain-level artifacts (bitstrings, parameter traces) in SQL and object storage for post hoc analysis (Kanazawa et al., 5 Dec 2025).
- Experimental reproducibility: ETL workflows and dashboards enable retrospective queries (e.g., selecting all runs matching given parameter settings) and eliminate redundant reruns (Kanazawa et al., 5 Dec 2025).
- Extended monitoring via Prometheus/Grafana: Standard metrics (queue, shot rate, QPU occupancy, health, fidelity drift) drive system-wide dashboards for real-time tuning (Wennersteen et al., 24 Sep 2025).
- Adaptive batch and job management: Workflow engines precompute classical/quantum branches and adapt task submission strategy based on observed queue and resource state (Mantha et al., 2024, Shehata et al., 3 Mar 2025).
Infrastructure-aware QHPC design leverages these features for co-design of workflows and hardware, robust deployment, and efficient administrator and scientific experimentation.
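A persistent-telemetry pipeline of the kind described above can be sketched with an in-memory SQL store (the schema and metric names are illustrative, not those of any cited stack):

```python
import sqlite3

# Job-level telemetry lands in SQL, decoupled from execution, so it can
# be queried long after the runs complete.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE runs (
    job_id TEXT, backend TEXT, queue_s REAL, exec_s REAL, shots INTEGER)""")
rows = [
    ("j1", "qpu-a", 120.0, 4.2, 1000),
    ("j2", "qpu-a",  15.0, 4.1, 1000),
    ("j3", "sim",     0.1, 9.8, 4000),
]
db.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", rows)

# Retrospective query: mean queue time per backend, e.g. to decide
# whether cloud queueing dominates end-to-end walltime.
for backend, avg_q in db.execute(
        "SELECT backend, AVG(queue_s) FROM runs GROUP BY backend"):
    print(backend, round(avg_q, 2))
```

The same pattern extends to domain artifacts (bitstrings, parameter traces) stored alongside the metrics, which is what makes the retrospective queries in the reproducibility bullet possible.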
7. Scalability Challenges, Future Directions, and Open Problems
While QHPC workflow frameworks establish scalable integration paths, several open issues remain:
- Heterogeneous QPU pools and federation: Extending malleability, pilot scheduling, and multi-tenancy to multi-type and multi-site QPU deployments (gate-based, neutral atom, annealers) (Rocco et al., 6 Aug 2025, Wennersteen et al., 24 Sep 2025).
- Dynamic cost and error models: Better task assignment via real-time QPU health, calibration state, and error-mitigation performance metrics. Predictions must auto-tune to drift in hardware or network conditions (Cranganore et al., 2024).
- Automated code analysis for quantum candidacy: ML/AI models for task-quantumification and circuit synthesis for kernels lacking explicit quantum analogues (Cranganore et al., 2024).
- Fault-tolerant and error-mitigated workflows: Integration of error mitigation and QEC stages as first-class citizens in the workflow DAG, batched and asynchronously launched (Zhan et al., 23 Oct 2025, Miniskar et al., 15 Dec 2025).
- Interoperability and standardization: Expanded support for OpenQASM3, QIR, and other intermediate representations for maximal portability.
- Benchmarks and application metrics: Ongoing development of standardized benchmark suites and application-level dependability metrics (maturity probes), with analytic bounds for device readiness (e.g., via harmonic analysis of QAOA landscapes) (Onah et al., 14 Sep 2025).
Continual evolution of QHPC workflow systems, enhanced scheduling, adaptive orchestration, persistent observability, and robust API design will underpin the next decade of quantum-classical integrative science.