- The paper introduces a four-layer middleware architecture that integrates Pilot-Quantum and Q-Dreamer for adaptive resource management across heterogeneous HPC and quantum platforms.
- It demonstrates efficient workload characterization and dynamic task scheduling with predictive optimization, achieving up to 82% accuracy in predicting optimal circuit cutting configurations.
- Experimental evaluations reveal negligible scheduling overhead and significant performance gains in distributed simulations, quantum-classical workflows, and mini-app benchmarks.
Hybrid Quantum-HPC Middleware Systems: Adaptive Resource, Workload, and Task Management
Overview and Motivation
Hybrid quantum-classical applications increasingly demand sophisticated management of heterogeneous resources, workloads, and tasks, especially because QPUs are scarce, variable, and subject to operational constraints that classical resources do not share. Integrating quantum and HPC environments exposes challenges that existing job schedulers and quantum programming frameworks do not adequately address, notably the need for multi-level, application-aware scheduling and adaptive response to dynamic system states.
This paper introduces a conceptual four-layer middleware architecture, execution motifs for workload characterization, a Pilot-Quantum middleware leveraging the pilot abstraction for late-binding and dynamic orchestration, and the Q-Dreamer toolkit for performance-model-driven optimization of circuit cutting workloads. Evaluation demonstrates efficient orchestration across CPUs, GPUs, and QPUs, and strong predictive accuracy (up to 82%) for optimal circuit cutting configurations.
Quantum-HPC Integration Modes
Quantum-HPC integration occurs in three principal modes:
Each mode imposes distinct requirements for resource co-allocation, task scheduling, and adaptability, necessitating flexible access models (dedicated, shared, session-based) for QPUs.
Conceptual Middleware Architecture and State of the Art
The proposed four-layer middleware architecture decomposes resource management into:
- L4 Workflow Layer: Manages workflows, dependencies, and user interfaces, abstracting hybrid execution.
- L3 Workload Layer: Optimizes strategic resource allocation and logical-to-physical mapping, informed by application-specific heuristics.
- L2 Task Layer: Coordinates task execution, load balancing, progress monitoring, and data movement.
- L1 Resource Layer: Integrates heterogeneous resources and schedules tasks with support for tight coupling and multi-tenancy.
This layered separation enables hierarchically delegated scheduling, in which each level refines the decisions of the level above using its own knowledge domain: application semantics at L4/L3 and dynamic system state at L2/L1.
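The delegation between layers can be sketched in a few lines. This is a minimal illustration only: the class names, the `place`/`dispatch`/`partition`/`run` methods, and the slot counts are hypothetical and are not taken from Pilot-Quantum or Q-Dreamer. It also ignores task dependencies, which a real task layer would track.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    kind: str  # "cpu", "gpu", or "qpu"


@dataclass
class ResourceLayer:  # L1: knows which slots are free right now
    free: dict

    def place(self, task: Task) -> bool:
        if self.free.get(task.kind, 0) > 0:
            self.free[task.kind] -= 1
            return True
        return False


@dataclass
class TaskLayer:  # L2: dispatches tasks, defers those that cannot be placed
    resources: ResourceLayer
    placed: list = field(default_factory=list)
    deferred: list = field(default_factory=list)

    def dispatch(self, tasks):
        for task in tasks:
            target = self.placed if self.resources.place(task) else self.deferred
            target.append(task.name)


@dataclass
class WorkloadLayer:  # L3: application-aware partitioning of one stage
    task_layer: TaskLayer

    def partition(self, stage: str, n_subcircuits: int):
        # Hypothetical policy: one GPU task per subcircuit, one CPU reconstruction task.
        tasks = [Task(f"{stage}-sub{i}", "gpu") for i in range(n_subcircuits)]
        tasks.append(Task(f"{stage}-reconstruct", "cpu"))
        self.task_layer.dispatch(tasks)


@dataclass
class WorkflowLayer:  # L4: orders the stages of the workflow
    workload_layer: WorkloadLayer

    def run(self, stages):
        for stage, n_subcircuits in stages:
            self.workload_layer.partition(stage, n_subcircuits)


resources = ResourceLayer(free={"gpu": 2, "cpu": 1})
task_layer = TaskLayer(resources)
WorkflowLayer(WorkloadLayer(task_layer)).run([("cut", 3)])
print("placed:", task_layer.placed)
print("deferred:", task_layer.deferred)
```

The upper layers decide what to run from application knowledge (how many subcircuits a stage has), while the lower layers decide where and whether it can run from the current system state (two free GPU slots, so the third subcircuit task is deferred).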
Figure 2: Quantum software stack highlighting the interaction between quantum programming environments, hybrid runtimes, and emerging backend/resource standards.
Current middleware solutions provide either static, coarse-grained assignment or limited adaptability, lacking systematic optimization for hybrid workload characteristics (e.g., circuit cutting, variational workflows). Pilot-Quantum fills this orchestration gap, and Q-Dreamer augments it with predictive optimization.
Execution Motifs and Quantum Mini-Apps
Motifs encode recurring quantum-classical interaction patterns, categorized as basic (e.g., circuit execution, distributed simulation, circuit cutting, error mitigation) or compositional (e.g., multi-stage pipelines, synchronous/asynchronous parallel VQA, generative quantum algorithms).
Motif characterization reveals that task coupling and interaction patterns are heterogeneous and stage-dependent, demanding adaptive middleware capable of late-binding, dynamic resource reallocation, and cross-layer optimization.
Figure 3: Basic and compositional execution motifs for quantum-HPC workflows, demonstrating patterns of coupling and coordination.
Mini-apps instantiate these motifs into executable prototypes, enabling performance benchmarking, resource characterization, and hardware/software co-design.
Pilot-Quantum Middleware Architecture
Pilot-Quantum implements application-level scheduling via pilots: placeholder jobs that acquire and hold pools of resources. A Pilot-Manager manages the pilots, orchestrating strategic placement and dynamic distribution of tasks across them. This abstraction supports:
The multi-level scheduling topology permits redistribution and adaptive task allocation responsive to runtime feedback (e.g., convergence triggers, resource drift).
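Late binding, the core idea behind the pilot abstraction, can be illustrated with a small simulation. The sketch below is not the Pilot-Quantum API: `Pilot`, `PilotManager`, `submit`, and `schedule` are hypothetical names chosen for illustration. Tasks are queued without being tied to a resource and are bound only when a pilot of the right kind has a free slot.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pilot:
    """Placeholder job holding a fixed number of slots of one resource kind."""
    name: str
    kind: str
    slots: int
    running: list = field(default_factory=list)

    def has_free_slot(self) -> bool:
        return len(self.running) < self.slots


class PilotManager:
    def __init__(self):
        self.pilots = []
        self.queue = deque()  # tasks wait here, not yet bound to any resource

    def add_pilot(self, pilot: Pilot) -> None:
        self.pilots.append(pilot)

    def submit(self, task_name: str, kind: str) -> None:
        self.queue.append((task_name, kind))

    def schedule(self) -> dict:
        """Bind queued tasks to pilots that have a free slot at this moment."""
        bound = {}
        waiting = deque()
        while self.queue:
            name, kind = self.queue.popleft()
            pilot = next(
                (p for p in self.pilots if p.kind == kind and p.has_free_slot()),
                None,
            )
            if pilot is None:
                waiting.append((name, kind))  # stays unbound until capacity appears
            else:
                pilot.running.append(name)
                bound[name] = pilot.name
        self.queue = waiting
        return bound


manager = PilotManager()
manager.submit("circuit-0", "gpu")
manager.submit("circuit-1", "gpu")
manager.submit("optimizer-step", "cpu")

first_pass = manager.schedule()            # no pilots yet, nothing is bound
manager.add_pilot(Pilot("gpu-pilot", "gpu", slots=1))
second_pass = manager.schedule()           # one GPU slot becomes available
print(first_pass, second_pass, list(manager.queue))
```

Because tasks are submitted before any pilot exists and are bound only inside `schedule`, adding or resizing pilots at runtime changes where work lands without resubmitting it, which is the property the multi-level topology relies on.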
Q-Dreamer comprises two layers: a Core layer (resource detection, workload analysis) and a Workload Management Tools layer (application and middleware tools for configuration optimization). Its Circuit Cutting Resource Optimizer (CCRO) applies a calibrated analytical speedup model to recommend cut placement and subcircuit sizing, balancing sampling overhead, parallel efficiency, and device capability.
Figure 5: Q-Dreamer architecture showing re-usable components for resource detection and workload analysis supporting adaptive scheduling decisions.
The CCRO's model is calibrated per-backend using empirical data, achieving up to 82% accuracy in predicting optimal cut configurations and correctly identifying trade-offs between subcircuit parallelism and reconstruction overhead.
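The kind of trade-off such a model captures can be sketched with a toy speedup function. This is not the CCRO model: the functional form, the parameter names, and the default values below are assumptions made for illustration. It assumes state-vector simulation cost grows as 2^n, that k cuts split the circuit into k+1 equally wide subcircuits, and that each cut multiplies the number of subexperiments by a fixed factor (9 is used as a placeholder, a value often quoted for cutting a single CNOT-type gate).

```python
import math


def predicted_speedup(n_qubits, k_cuts, workers=1, overhead_per_cut=9.0,
                      parallel_efficiency=1.0, task_overhead=0.0):
    """Toy estimate of uncut runtime divided by cut runtime (all terms assumed)."""
    if k_cuts == 0:
        return 1.0
    # Assumption: cuts split the circuit into equally wide subcircuits.
    width = math.ceil(n_qubits / (k_cuts + 1))
    # Sampling overhead: subexperiment count grows exponentially with the cuts.
    n_subexperiments = (k_cuts + 1) * overhead_per_cut ** k_cuts
    t_uncut = 2.0 ** n_qubits
    per_task = 2.0 ** width + task_overhead
    effective_workers = max(1.0, workers * parallel_efficiency)
    t_cut = n_subexperiments * per_task / effective_workers
    return t_uncut / t_cut


def best_cut_count(n_qubits, max_cuts, **model_params):
    """Return the number of cuts with the highest predicted speedup."""
    return max(range(max_cuts + 1),
               key=lambda k: predicted_speedup(n_qubits, k, **model_params))


for k in range(5):
    print(k, predicted_speedup(36, k))
print("best k:", best_cut_count(36, max_cuts=5))
```

Only the shape of the curve is meaningful here, not the absolute values: smaller subcircuits make each task exponentially cheaper, while the subexperiment count grows exponentially with the number of cuts, so the predicted speedup rises and then falls. With these placeholder defaults the maximum for a 36-qubit circuit happens to fall at two cuts; that is a property of the toy parameters, not a reproduction of the reported results. A calibrated model would fit terms such as `overhead_per_cut`, `parallel_efficiency`, and `task_overhead` to measurements for each backend.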
Experimental Evaluation
Evaluation is conducted on Perlmutter and NVIDIA DGX platforms, utilizing mini-apps spanning circuit execution, distributed simulation, QML workflows, and circuit cutting. Key results:
- Pilot-Quantum scheduling overhead is negligible relative to workload execution (<4 ms per task).
- GPU-accelerated simulators run one to two orders of magnitude faster than cloud and CPU-based backends as qubit count grows.
- Distributed state vector simulation enables large-scale quantum circuit emulation, scaling up to 39 qubits for non-gradient computation.
- The multi-stage QML pipeline (e.g., CIFAR-10 compression/classification) shows significant throughput and scaling benefits, though efficiency decreases with node count due to resource registration bottlenecks.
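The efficiency loss in the last result can be made precise with the usual strong-scaling definitions. The notation below is an assumption, since the summary does not state which definition the evaluation uses: T(N) is the runtime on N nodes, and T_reg(N) is a hypothetical term for the time spent registering resources.

```latex
S(N) = \frac{T(1)}{T(N)}, \qquad
E(N) = \frac{S(N)}{N} = \frac{T(1)}{N\,T(N)}

% If the work parallelizes perfectly except for registration,
% T(N) = T(1)/N + T_{\mathrm{reg}}(N), then
E(N) = \frac{T(1)}{T(1) + N\,T_{\mathrm{reg}}(N)}
     = \frac{1}{1 + N\,T_{\mathrm{reg}}(N)/T(1)}
```

Under this simplified model, even a constant registration cost per run lowers efficiency as N grows, and a cost that itself increases with N lowers it faster.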
Figure 6: Circuit execution performance across IonQ, IBM Eagle, and Qiskit Aer simulators.
Figure 7: Distributed state vector simulation scaling with and without gradient calculation for SEL circuits.
Figure 8: CIFAR-10 compression runtime and efficiency as node count increases.
Figure 9: Batch processing time using vmap, JIT, and combined optimizations in quantum classifier workflows.
For circuit cutting:
- CCRO accurately predicts speedup-optimal cut placement: measured speedup peaks at k=2 cuts for both GPU and CPU backends and then degrades as sampling overhead grows exponentially with the number of cuts.
- Device-specific and general models achieve 70–93% accuracy in identifying speedup-optimal configurations.
- Limitations include efficiency model breakdown for imbalanced workloads, lack of fidelity-aware modeling, and non-transferability across circuit families.
Figure 10: Circuit cutting strong scaling for 36-qubit EfficientSU2 circuit as a function of workers.
Figure 11: Circuit cutting runtime, measured speedup, and CCRO model predictions for GPU and CPU backends.
Implications and Future Directions
Practically, the motif-driven methodology and mini-app benchmarks provide a systematic foundation for middleware design and evaluation, supporting reproducibility and objective performance comparison. Pilot-Quantum extends proven distributed computing abstractions to quantum-HPC, enabling efficient adaptive orchestration. Q-Dreamer demonstrates the value of analytical and data-driven workload optimization, and could serve as an evaluation environment for RL- or ML-based scheduling strategies.
Theoretically, the layered architecture facilitates cross-layer optimization and abstraction by separating strategic workload partitioning from operational execution. This separation could support the development of fidelity-aware scheduling policies and resource allocation strategies that account for physical QPU characteristics and error rates.
Future directions include:
- Extending Q-Dreamer to additional execution motifs and integrating fidelity-aware models.
- Developing cross-layer optimization frameworks informing resource-level and workflow-level decision-making.
- Standardizing mini-app benchmarks for quantum-HPC middleware comparison.
Conclusion
Hybrid quantum-HPC systems demand adaptive, application-aware middleware architectures for orchestrating heterogeneous resources and dynamic workloads. Pilot-Quantum and Q-Dreamer, together with motif-driven mini-apps, establish a foundation for systematic, scalable management of quantum-classical workflows. Results validate significant gains in orchestration flexibility and efficiency, and demonstrate robust analytical optimization for circuit cutting. The approach provides actionable insights and extensible methodologies for future research at the intersection of quantum computing and high-performance systems.