Hybrid Quantum-HPC Middleware Systems for Adaptive Resource, Workload and Task Management

Published 3 Apr 2026 in quant-ph and cs.DC | (2604.03445v1)

Abstract: Hybrid quantum-classical applications pose significant resource management challenges due to heterogeneity and dynamism in both infrastructure and workloads. Quantum-HPC environments integrate quantum processing units (QPUs) with diverse classical resources (CPUs, GPUs), while applications span coupling patterns from tightly coupled execution to loosely coupled task parallelism with varying resource requirements. Traditional HPC schedulers lack visibility into application semantics and cannot respond to fluctuating resource availability at runtime. This paper presents a middleware-based approach for adaptive resource, workload, and task management in hybrid quantum-HPC systems. We make four contributions: (i) a conceptual four-layer middleware architecture that decomposes management across workflow, workload, task, and resource levels, enabling application-aware scheduling over heterogeneous quantum-HPC resources; (ii) a set of execution motifs capturing interaction and coupling characteristics of hybrid applications, realized as quantum mini-apps for systematic workload characterization; (iii) Pilot-Quantum, a middleware framework built on the pilot abstraction that enables late binding and dynamic resource allocation, adapting to resource and workload dynamics at runtime; and (iv) Q-Dreamer, a performance modeling toolkit providing reusable components for informed workload partitioning, including a circuit-cutting optimizer that analytically derives optimal partitioning strategies. Evaluation on heterogeneous HPC platforms (Perlmutter, NVIDIA DGX with H100/B200 GPUs) demonstrates efficient multi-backend orchestration across CPUs, GPUs, and QPUs for diverse execution motifs. Q-Dreamer predicts optimal circuit cutting configurations with up to 82% accuracy.


Authors (5)

Summary

The paper introduces a four-layer middleware architecture that integrates Pilot-Quantum and Q-Dreamer for adaptive resource management across heterogeneous HPC and quantum platforms.
It demonstrates efficient workload characterization and dynamic task scheduling with predictive optimization, achieving up to 82% accuracy in predicting optimal circuit cutting configurations.
Experimental evaluations reveal negligible scheduling overhead and significant performance gains in distributed simulations, quantum-classical workflows, and mini-app benchmarks.

Hybrid Quantum-HPC Middleware Systems: Adaptive Resource, Workload, and Task Management

Overview and Motivation

Hybrid quantum-classical applications increasingly demand sophisticated management of heterogeneous resources, workloads, and tasks, especially as QPUs exhibit scarcity, variability, and distinct operational constraints relative to classical resources. The integration of quantum and HPC environments exposes challenges not adequately addressed by existing job schedulers and quantum programming frameworks, notably the need for multi-level, application-aware scheduling and adaptive response to dynamic system states.

This paper introduces four components: a conceptual four-layer middleware architecture; execution motifs for workload characterization; the Pilot-Quantum middleware, which uses the pilot abstraction for late binding and dynamic orchestration; and the Q-Dreamer toolkit for performance-model-driven optimization of circuit cutting workloads. The evaluation demonstrates efficient orchestration across CPUs, GPUs, and QPUs, and predictive accuracy of up to 82% for optimal circuit cutting configurations.

Quantum-HPC Integration Modes

Quantum-HPC integration occurs in three principal modes:

HPC-for-Quantum: Classical HPC resources are tightly coupled to the QPU for quantum support tasks such as calibration, error correction, and dynamic circuits, which demand microsecond-latency interactions within QPU coherence times.
Quantum-in-HPC: Hybrid applications (e.g. VQA, SQD, Hamiltonian simulation) loosely couple quantum and classical components, exploiting parallelism and iterative classical-quantum feedback.
Quantum-about-HPC: Quantum capabilities are exposed as workflow stages, enabling integration via composable task orchestration and data transformations.

Figure 1: Quantum-HPC integration patterns illustrating coupling requirements and orchestration complexity across modes.

Each mode imposes distinct requirements for resource co-allocation, task scheduling, and adaptability, necessitating flexible access models (dedicated, shared, session-based) for QPUs.

Conceptual Middleware Architecture and State of the Art

The proposed four-layer middleware architecture decomposes resource management into:

L4 Workflow Layer: Manages workflows, dependencies, and user interfaces, abstracting hybrid execution.
L3 Workload Layer: Optimizes strategic resource allocation and logical-to-physical mapping, informed by application-specific heuristics.
L2 Task Layer: Coordinates task execution, load balancing, progress monitoring, and data movement.
L1 Resource Layer: Integrates heterogeneous resources and schedules tasks with support for tight coupling and multi-tenancy.

This layered separation enables hierarchically delegated scheduling, wherein each level refines decisions with distinct knowledge domains: application semantics (L4/L3) and dynamic system states (L2/L1).
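
The delegation between layers can be illustrated with a minimal sketch. It is not taken from the paper: the `Task` and `Resource` classes, the `workload_layer` and `task_layer` functions, and the rule that a quantum task may fall back to a GPU (standing in for a simulator) are all hypothetical names and assumptions chosen for illustration. The upper layer contributes application knowledge (which resource kinds suit a task), and the lower layer contributes system state (which resource currently has a free slot).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    kind: str  # "quantum" or "classical"


@dataclass
class Resource:
    name: str
    kind: str  # "qpu", "gpu", or "cpu"
    free_slots: int


def workload_layer(tasks):
    """L3 (hypothetical): attach an ordered list of acceptable resource kinds to each task."""
    # Assumption: quantum tasks prefer a QPU and may fall back to a GPU simulator.
    preferences = {"quantum": ["qpu", "gpu"], "classical": ["cpu"]}
    return [(task, preferences[task.kind]) for task in tasks]


def task_layer(plan, resources):
    """L2 (hypothetical): bind each task to the first preferred resource with a free slot."""
    placement = {}
    for task, kinds in plan:
        placement[task.name] = None  # stays None if nothing is free
        for kind in kinds:
            match = next(
                (r for r in resources if r.kind == kind and r.free_slots > 0), None
            )
            if match is not None:
                match.free_slots -= 1
                placement[task.name] = match.name
                break
    return placement


resources = [
    Resource("qpu0", "qpu", 1),
    Resource("gpu0", "gpu", 2),
    Resource("cpu0", "cpu", 4),
]
tasks = [
    Task("circuit-a", "quantum"),
    Task("circuit-b", "quantum"),
    Task("optimizer", "classical"),
]
placement = task_layer(workload_layer(tasks), resources)
print(placement)
```

In this toy run the single QPU slot goes to the first circuit, so the second circuit is placed on the GPU: the upper layer only stated what was acceptable, and the lower layer decided based on what was free.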

Figure 2: Quantum software stack highlighting the interaction between quantum programming environments, hybrid runtimes, and emerging backend/resource standards.

Current middleware solutions provide either static, coarse-grained assignment or limited adaptability, lacking systematic optimization for hybrid workload characteristics (e.g., circuit cutting, variational workflows). Pilot-Quantum fills this orchestration gap, and Q-Dreamer augments it with predictive optimization.

Execution Motifs and Quantum Mini-Apps

Motifs encode recurring quantum-classical interaction patterns, categorized as basic (e.g., circuit execution, distributed simulation, circuit cutting, error mitigation) or compositional (e.g., multi-stage pipelines, synchronous/asynchronous parallel VQA, generative quantum algorithms).

Motif characterization reveals that task coupling and interaction patterns are heterogeneous and stage-dependent, demanding adaptive middleware capable of late-binding, dynamic resource reallocation, and cross-layer optimization.

Figure 3: Basic and compositional execution motifs for quantum-HPC workflows, demonstrating patterns of coupling and coordination.

Mini-apps instantiate these motifs into executable prototypes, enabling performance benchmarking, resource characterization, and hardware/software co-design.

Pilot-Quantum Middleware Architecture

Pilot-Quantum implements application-level scheduling via pilots: placeholder jobs that acquire and hold resource pools. A Pilot-Manager orchestrates strategic placement and dynamic distribution of tasks across these pools. This abstraction supports:

Unified orchestration across CPUs, GPUs, QPUs, and simulators.
Plugin extensibility for backends and execution engines.
Application-level scheduling informed by semantics and dynamic state.
Late-binding adaptation for heterogeneous workloads and fluctuating resource availability.

Figure 4: Pilot-Quantum system core architecture, emphasizing resource orchestration via the Pilot-Manager across classical and quantum infrastructures.

The multi-level scheduling topology permits redistribution and adaptive task allocation responsive to runtime feedback (e.g., convergence triggers, resource drift).
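
Late binding through pilots can be sketched in a few lines. The example below is a simplified illustration and does not reproduce the Pilot-Quantum API: the `Pilot` and `PilotManager` classes and their methods are hypothetical. The point it shows is that a task is not tied to a resource at submission time; it waits in a queue and is bound only once some pilot of a suitable kind has capacity, including pilots that join later.

```python
from collections import deque


class Pilot:
    """Hypothetical placeholder job holding a fixed number of slots on one resource kind."""

    def __init__(self, name, kind, slots):
        self.name = name
        self.kind = kind
        self.free = slots


class PilotManager:
    """Hypothetical manager: queues tasks and binds them when a matching pilot has capacity."""

    def __init__(self):
        self.pilots = []
        self.queue = deque()
        self.bindings = []  # (task_name, pilot_name) in binding order

    def add_pilot(self, pilot):
        self.pilots.append(pilot)
        self._dispatch()  # new capacity may unblock queued tasks

    def submit(self, task_name, kind):
        self.queue.append((task_name, kind))
        self._dispatch()

    def _dispatch(self):
        still_waiting = deque()
        while self.queue:
            task_name, kind = self.queue.popleft()
            pilot = next(
                (p for p in self.pilots if p.kind == kind and p.free > 0), None
            )
            if pilot is None:
                still_waiting.append((task_name, kind))  # keep waiting, unbound
            else:
                pilot.free -= 1
                self.bindings.append((task_name, pilot.name))
        self.queue = still_waiting


manager = PilotManager()
manager.add_pilot(Pilot("cpu-pilot", "cpu", slots=2))
manager.submit("optimizer-step", "cpu")  # bound immediately
manager.submit("circuit-0", "qpu")       # no QPU pilot yet, so it waits
waiting_before = list(manager.queue)
manager.add_pilot(Pilot("qpu-pilot", "qpu", slots=1))  # late binding happens here
print(manager.bindings)
```

A production system would also release slots when tasks finish and react to runtime feedback; those parts are omitted to keep the sketch focused on the binding step.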

Q-Dreamer Toolkit and Circuit Cutting Optimization

Q-Dreamer is organized in two layers: Core (resource detection, workload analysis) and Workload Management Tools (application/middleware tools for configuration optimization). The Circuit Cutting Resource Optimizer (CCRO) applies a calibrated analytical speedup model to recommend optimal cut placement and subcircuit sizing, balancing sampling overhead, parallel efficiency, and device capability.

Figure 5: Q-Dreamer architecture showing re-usable components for resource detection and workload analysis supporting adaptive scheduling decisions.

The CCRO's model is calibrated per-backend using empirical data, achieving up to 82% accuracy in predicting optimal cut configurations and correctly identifying trade-offs between subcircuit parallelism and reconstruction overhead.
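
The trade-off the CCRO balances can be made concrete with a toy analytical model. This is a sketch under stated assumptions, not the CCRO's actual model, which this summary does not specify. It assumes that `k` cuts yield `k + 1` equally sized subcircuits, that state vector simulation cost grows as `2 ** qubits`, and that each cut multiplies the number of subcircuit executions by a constant `base`. The default `base=9.0` is the per-cut sampling overhead commonly quoted for cutting a single CNOT-like gate; treat it, along with `workers`, `efficiency`, and `recon_cost`, as tunable assumptions.

```python
import math


def cut_time(n_qubits, k, base=9.0, workers=1, efficiency=1.0, recon_cost=0.0):
    """Toy cost (arbitrary units) of simulating an n-qubit circuit split by k >= 1 cuts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sub_qubits = math.ceil(n_qubits / (k + 1))   # assumed equal-size subcircuits
    variants = base ** k                          # exponential sampling overhead
    simulation = variants * (k + 1) * 2.0 ** sub_qubits
    # Subcircuit runs are spread over workers; reconstruction is assumed serial.
    return simulation / (workers * efficiency) + recon_cost * variants


def speedup(n_qubits, k, **model_params):
    """Speedup relative to an uncut single-worker simulation costing 2 ** n_qubits."""
    return 2.0 ** n_qubits / cut_time(n_qubits, k, **model_params)


def best_cut_count(n_qubits, max_k, **model_params):
    """Return the k in 1..max_k with the lowest modeled cost."""
    return min(range(1, max_k + 1), key=lambda k: cut_time(n_qubits, k, **model_params))


for k in range(1, 5):
    print(k, f"{speedup(36, k):.3g}")
print("best k:", best_cut_count(36, max_k=4))
```

With these toy defaults and 36 qubits, the modeled cost is lowest at two cuts: each additional cut shrinks the subcircuits but multiplies the number of executions, and past a certain point the second effect dominates. This only illustrates the qualitative shape of the trade-off; it is not a reproduction of the calibrated, per-backend model or of the measured results.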

Experimental Evaluation

Evaluation is conducted on Perlmutter and NVIDIA DGX platforms, utilizing mini-apps spanning circuit execution, distributed simulation, QML workflows, and circuit cutting. Key results:

Pilot-Quantum scheduling overhead is negligible relative to workload execution ($<4$ ms per task).
GPU-accelerated simulators outperform cloud and CPU-based backends by one to two orders of magnitude in runtime as qubit count scales.
Distributed state vector simulation enables large-scale quantum circuit emulation, scaling up to 39 qubits for non-gradient computation.
The multi-stage QML pipeline (e.g., CIFAR-10 compression/classification) demonstrates significant throughput and scaling benefits, though efficiency decreases with node count due to resource registration bottlenecks.

Figure 6: Circuit execution performance across IonQ, IBM Eagle, and Qiskit Aer simulators.

Figure 7: Distributed state vector simulation scaling with and without gradient calculation for SEL circuits.
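
A short calculation shows why state vector simulation at this scale has to be distributed. A full state vector for `n` qubits holds `2 ** n` complex amplitudes. The numeric precision used in the experiments is not stated in this summary, so the sketch below assumes 16 bytes per amplitude (double-precision complex) by default and also shows the 8-byte single-precision case; the function names are illustrative.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store a dense n-qubit state vector."""
    return (2 ** n_qubits) * bytes_per_amplitude


def statevector_tib(n_qubits, bytes_per_amplitude=16):
    """Same quantity expressed in TiB (2 ** 40 bytes)."""
    return statevector_bytes(n_qubits, bytes_per_amplitude) / 2 ** 40


for n in (30, 36, 39):
    double = statevector_tib(n)
    single = statevector_tib(n, bytes_per_amplitude=8)
    print(f"{n} qubits: {double:g} TiB (complex128), {single:g} TiB (complex64)")
```

At 39 qubits the state vector alone occupies 8 TiB in double precision or 4 TiB in single precision, before any workspace or gradient storage, so it must be partitioned across the memory of many GPUs or nodes.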

Figure 8: CIFAR-10 compression runtime and efficiency as node count increases.
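
The efficiency metric is not defined in this summary. A common choice for strong scaling, assumed here, is the ratio of the single-node runtime to the total node-time spent at `N` nodes. The timings in the sketch are made-up placeholders, not measurements from the paper.

```python
def parallel_efficiency(t_one_node, t_n_nodes, n_nodes):
    """Strong-scaling efficiency: 1.0 means perfect linear speedup."""
    return t_one_node / (n_nodes * t_n_nodes)


# Hypothetical runtimes in seconds, keyed by node count (illustrative only).
runtimes = {1: 100.0, 2: 55.0, 4: 30.0, 8: 20.0}
efficiencies = {
    n: parallel_efficiency(runtimes[1], t, n) for n, t in runtimes.items()
}
for n, eff in efficiencies.items():
    print(f"{n} nodes: efficiency {eff:.2f}")
```

Under this definition, a fixed per-node cost such as registering each resource with the scheduler makes efficiency fall as nodes are added even while total runtime keeps dropping.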

Figure 9: Batch processing time using vmap, JIT, and combined optimizations in quantum classifier workflows.

For circuit cutting:

CCRO accurately predicts speedup-optimal cut placement, achieving peak speedup at $k=2$ cuts for both GPU and CPU backends, before degradation from exponential sampling overhead.
Device-specific and general models achieve 70–93% accuracy in identifying speedup-optimal configurations.
Limitations include efficiency model breakdown for imbalanced workloads, lack of fidelity-aware modeling, and non-transferability across circuit families.

Figure 10: Circuit cutting strong scaling for 36-qubit EfficientSU2 circuit as a function of workers.

Figure 11: Circuit cutting runtime, measured speedup, and CCRO model predictions for GPU and CPU backends.
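
One plausible reading of "accuracy in identifying speedup-optimal configurations" is the fraction of test cases in which the cut count with the highest predicted speedup matches the cut count with the highest measured speedup. The summary does not define the metric, so this interpretation is an assumption, and the numbers below are invented solely to exercise the function.

```python
def optimal_k(speedups):
    """Return the cut count with the highest speedup from a {k: speedup} mapping."""
    return max(speedups, key=speedups.get)


def prediction_accuracy(cases):
    """Fraction of (predicted, measured) pairs that agree on the optimal cut count."""
    hits = sum(optimal_k(pred) == optimal_k(meas) for pred, meas in cases)
    return hits / len(cases)


# Hypothetical (predicted, measured) speedups per cut count; not data from the paper.
cases = [
    ({1: 1.5, 2: 2.4, 3: 1.9}, {1: 1.4, 2: 2.2, 3: 2.0}),  # both pick k=2
    ({1: 1.2, 2: 1.8, 3: 1.1}, {1: 1.3, 2: 1.7, 3: 0.9}),  # both pick k=2
    ({1: 2.0, 2: 1.6, 3: 0.8}, {1: 1.9, 2: 1.5, 3: 0.7}),  # both pick k=1
    ({1: 1.1, 2: 1.9, 3: 1.7}, {1: 1.0, 2: 1.6, 3: 1.8}),  # model k=2, measured k=3
]
accuracy = prediction_accuracy(cases)
print(f"accuracy: {accuracy:.2f}")
```

Note that this metric rewards picking the right configuration, not predicting the speedup value itself: a model can be off in magnitude and still score well as long as it ranks the best configuration first.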

Implications and Future Directions

Practically, the motif-driven methodology and mini-app benchmarks provide a systematic foundation for middleware design and evaluation, supporting reproducibility and objective performance comparison. Pilot-Quantum extends proven distributed computing abstractions to quantum-HPC, enabling efficient adaptive orchestration. Q-Dreamer demonstrates the value of analytical and data-driven workload optimization, and its components could potentially serve as evaluator environments for RL- or ML-based scheduling strategies.

Theoretically, the layered architecture facilitates cross-layer optimization and abstraction, separating strategic workload partitioning from operational execution. This could catalyze development of fidelity-aware scheduling policies and resource allocation strategies, integrating physical QPU characteristics and error rates.

Future directions include:

Extending Q-Dreamer to additional execution motifs and integrating fidelity-aware models.
Developing cross-layer optimization frameworks informing resource-level and workflow-level decision-making.
Standardizing mini-app benchmarks for quantum-HPC middleware comparison.

Conclusion

Hybrid quantum-HPC systems demand adaptive, application-aware middleware architectures for orchestrating heterogeneous resources and dynamic workloads. Pilot-Quantum and Q-Dreamer, together with motif-driven mini-apps, establish a foundation for systematic, scalable management of quantum-classical workflows. Results validate significant gains in orchestration flexibility and efficiency, and demonstrate robust analytical optimization for circuit cutting. The approach provides actionable insights and extensible methodologies for future research at the intersection of quantum computing and high-performance systems.
