Quantum-Centric Supercomputing

Updated 26 July 2025
  • Quantum-centric supercomputing is an emerging paradigm that tightly integrates quantum processors and classical HPC to solve computationally intractable scientific problems.
  • It features hybrid architectures with microkernel OS, layered software stacks, and advanced resource scheduling to optimize simulation workflows.
  • Its applications in chemistry, physics, and materials science enable large-scale simulations, such as up to 77-qubit models and dynamic quantum-classical hybrid algorithms.

Quantum-centric supercomputing is an emerging computational paradigm characterized by the tight integration—and mutual acceleration—of quantum processors and classical high-performance computing (HPC) systems. Rather than isolating quantum processing units (QPUs) as standalone devices, quantum-centric supercomputing architectures treat quantum and classical resources as co-equal, co-optimized components of scientific workflows, with quantum hardware acting as an accelerator or co-processor for classically intractable computations. This approach is foundational for scaling quantum simulation, chemistry, materials science, and physics research tasks beyond the scope of classical resources alone.

1. Architectures, Principles, and Software Stacks

Quantum-centric supercomputing architectures comprise tightly coupled quantum and classical hardware, orchestrated through specialized operating systems, workload managers, and programming environments. The quantum computer (QPU) operates in tandem with classical CPUs/GPUs, forming a heterogeneous computational fabric (Bravyi et al., 2022, Alexeev et al., 2023, Pascuzzi et al., 21 Aug 2024).

Key architectural elements:

  • Microkernel quantum operating systems: Modern designs advocate for a microkernel architecture, in which core OS functions—process scheduling, inter-component communication, error isolation—are kept minimal for reliability, with all higher-level services and quantum-specific routines (e.g., circuit compilation, error decoding, QECC tracking) modularized and isolated. Message passing (not stack-based calling) is the backbone of inter-component communication, implemented over HPC-class interconnects (e.g., MPI) (Paler, 17 Oct 2024).
  • Supercomputer-class infrastructure: Quantum operating system (QCOS) components and associated workloads execute by default on supercomputers, leveraging high availability, high throughput, low latency, and redundancy (Paler, 17 Oct 2024). Fault tolerance is obtained through redundant scheduling and node failover.
  • Layered software stacks: The quantum software ecosystem spans multiple abstraction layers: low-level hardware access and dynamic circuits (Layer-1); quantum runtime and sampling/expectation calculation (Layer-2); quantum libraries for domain-specific algorithms (Layer-3); and user-facing application workflows (Layer-4) (Bravyi et al., 2022, Alexeev et al., 2023).

Hybrid programming models (e.g., XACC, QCOR, CUDA Quantum, OpenMP quantum offload) provide a unifying interface for quantum task offloading, resource management, and low-latency, tightly-coupled execution between host and quantum accelerator (Lee et al., 2023, Shehata et al., 15 Aug 2024).
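The offload pattern these programming models share can be sketched as follows. This is a minimal illustration, not an API from XACC, QCOR, or CUDA Quantum: `run_on_qpu` is a hypothetical stand-in for a real QPU submission call, implemented here as a classical mock so the control flow is self-contained.

```python
import concurrent.futures
import random

def run_on_qpu(circuit_id: str, shots: int) -> dict:
    # Hypothetical QPU submission; here a classical stand-in that
    # fabricates measurement counts for a 4-qubit circuit.
    counts: dict = {}
    for _ in range(shots):
        bitstring = format(random.getrandbits(4), "04b")
        counts[bitstring] = counts.get(bitstring, 0) + 1
    return counts

# Host side: submit the quantum task asynchronously and overlap classical
# work, mirroring the accelerator/offload model of the hybrid frameworks.
with concurrent.futures.ThreadPoolExecutor() as pool:
    future = pool.submit(run_on_qpu, "trial_state", shots=1024)
    classical_partial = sum(i * i for i in range(1000))  # concurrent classical step
    counts = future.result()  # block only when the quantum result is needed
```

The essential point is that the host treats the QPU like any other accelerator: submission is asynchronous, classical work proceeds in parallel, and synchronization happens only at the data dependency.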

2. Simulation Workflows and Methodologies

In quantum-centric supercomputing, workflows commonly follow a hybrid schema: quantum resources address the “core” complexity—such as sampling highly entangled wavefunction components—while classical resources execute data-intensive, memory-bound, or iterative postprocessing and validation steps.

Key techniques:

  • State-vector and tensor network simulation: Sophisticated quantum circuit simulators (QuEST, qFlex, JUQCS, Queen) implement cache-aware, multi-GPU, and tensor contraction algorithms capable of simulating 30–48 qubit circuits on thousands of supercomputer nodes with strong and weak scaling (Jones et al., 2018, Willsch et al., 2019, Wang et al., 20 Jun 2024, Villalonga et al., 2019, Yong et al., 2021, Liu et al., 2023). Tensor network methods leverage U(1) symmetries and further accelerate computation via blockwise contractions (Liu et al., 2023).
  • Density matrix simulation of noisy systems: To model practical NISQ and pre-fault-tolerant hardware, full-scale density matrix simulators (e.g., TANQ-Sim) utilize double-precision tensorcore acceleration for deep circuits with general noise, employing gate fusion and GPU-side remote memory access to scale to thousands of GPUs (Li et al., 19 Apr 2024).
  • Quantum-classical hybrid estimation and diagonalization: Sample-based quantum diagonalization (SQD) algorithms exploit QPUs to generate configuration samples; classical supercomputers then project Hamiltonians into reduced subspaces defined by these sampled configurations and solve large-scale classical eigenvalue problems for observables—ground-state energies, excited state spectra, etc. This hybrid protocol is robust against device noise and offers unconditional upper bound energy metrics at polynomial classical cost (Robledo-Moreno et al., 8 May 2024, Barison et al., 1 Nov 2024, Liepuoniute et al., 7 Nov 2024).
  • Variational quantum algorithms: VQE and its variants rely on a feedback loop—quantum resources prepare circuit-parameterized trial states and classical optimizers update parameters to minimize observables—implemented via low-latency, on-premise coupling and unified APIs (Shang et al., 2022, Lee et al., 2023).
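The SQD division of labor above can be made concrete with a toy model. In the sketch below (all sizes and the Hamiltonian are illustrative, and QPU sampling is replaced by a random stand-in), the quantum side contributes only a set of basis configurations; the classical side projects the Hamiltonian into that subspace and diagonalizes it. By eigenvalue interlacing, the subspace ground-state energy is a guaranteed upper bound on the true one, which is the "unconditional upper bound" property cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-qubit "Hamiltonian": a random real symmetric 16x16 matrix.
dim = 16
A = rng.standard_normal((dim, dim))
H = (A + A.T) / 2
exact_ground = np.linalg.eigvalsh(H)[0]

# Stand-in for QPU sampling: a handful of computational-basis configurations.
sampled_configs = sorted({int(x) for x in rng.integers(0, dim, size=10)})

# Classical step: project H into the sampled subspace and diagonalize.
H_sub = H[np.ix_(sampled_configs, sampled_configs)]
e_sub = np.linalg.eigvalsh(H_sub)[0]

# The subspace energy can only overestimate the true ground-state energy.
assert e_sub >= exact_ground - 1e-9
```

In production workflows the subspace is defined by millions of sampled electronic configurations and the projected eigenvalue problem is itself a large distributed computation, but the quantum-samples-in, classical-eigensolve-out structure is the same.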

3. Resource Management, Parallelization, and Scalability

Quantum-centric supercomputing leverages advanced HPC resource scheduling, parallelization, and communication strategies to scale workloads dynamically:

  • Multi-level parallelization: Workflows decompose naturally along multiple axes: independent simulation tasks per quantum fragment (e.g., DMET for chemistry), parallel measurement of Pauli expectation values, and fine-grained linear algebra operations within circuit simulations. For example, VQE- and MPS-based simulators on Sunway systems exploit 3-level parallelism spanning millions of threads/cores (Shang et al., 2022).
  • Memory and communication optimizations: Adaptive encoding schemes (e.g., 2-byte amplitude representation in JUQCS-A) and entangling gate decomposition reduce memory bottlenecks, pushing simulation boundaries to 48–77 qubits on petascale supercomputers (Willsch et al., 2019, Robledo-Moreno et al., 8 May 2024). Gate block fusion, cache-aware scheduling, and cross-GPU/NVSHMEM communication reduce latency and increase arithmetic intensity (up to 96× over previous methods) (Wang et al., 20 Jun 2024, Li et al., 19 Apr 2024).
  • Dynamic job orchestration: Frameworks such as QFw integrate quantum simulators with HPC resource managers (SLURM hetjobs, persistent distributed VMs) and provide APIs for per-job quantum resource assignment, circuit routing, and feedback (Shehata et al., 15 Aug 2024).
  • Unified programming interfaces and language choices: Adoption of OpenMP and Julia as primary interfaces streamlines classical/quantum hybrid code, paralleling compiled classical code while reducing complexity and overhead (Shang et al., 2022, Lee et al., 2023).
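One of the parallelization axes above, independent measurement of Pauli expectation values, can be sketched directly: each term of a Hamiltonian written as a weighted Pauli sum is an independent task. The two-qubit Hamiltonian and the thread pool here are illustrative choices, not taken from any of the cited systems.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Single-qubit Pauli matrices (real-valued terms suffice for this toy model).
PAULI = {"I": np.eye(2),
         "X": np.array([[0., 1.], [1., 0.]]),
         "Z": np.diag([1., -1.])}

def pauli_expectation(term, state):
    """<psi|P|psi> for one tensor-product Pauli term (label, coefficient)."""
    label, coeff = term
    op = np.array([[1.0]])
    for p in label:
        op = np.kron(op, PAULI[p])
    return coeff * float(state @ op @ state)

# Toy 2-qubit Hamiltonian H = 0.5*ZI + 0.5*IZ + 0.25*XX, measured in |00>.
state = np.array([1., 0., 0., 0.])
terms = [("ZI", 0.5), ("IZ", 0.5), ("XX", 0.25)]

# Each term's expectation is independent, so terms map onto parallel workers;
# the energy is the sum of the per-term results.
with ThreadPoolExecutor() as pool:
    energy = sum(pool.map(pauli_expectation, terms, [state] * len(terms)))
```

Because the terms share no state, this level of parallelism composes freely with coarser task-level decomposition (e.g., per-fragment DMET jobs) and with finer linear-algebra parallelism inside each contraction.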

4. Applications in Chemistry, Physics, and Materials Science

Quantum-centric supercomputing has been demonstrated in several large-scale domains:

  • Quantum computational chemistry: Active-space simulations of molecular systems—N₂ triple bond breaking (58 qubits), [2Fe-2S] and [4Fe-4S] clusters (45–77 qubits)—have been realized by coupling Heron superconducting QPUs with 6400 Fugaku nodes. Quantum-circuit sampling and subspace projection/classical diagonalization workflows yield certified upper bounds on ground- and excited-state energies, overcoming the measurement and depth limitations that render quantum-only approaches infeasible for deep circuits in the pre-fault-tolerant regime (Robledo-Moreno et al., 8 May 2024, Barison et al., 1 Nov 2024, Liepuoniute et al., 7 Nov 2024).
  • Computation of excited states: The extended SQD algorithm supports the efficient evaluation of excited singlet/triplet states (S₁, T₁), crucial for understanding photochemistry and non-trivial molecular spectra. Application to open-shell systems (e.g., methylene CH₂ triplet vs. singlet, 52 qubits) demonstrates success for difficult, multireference cases where traditional CI or DMRG methods are computationally prohibitive (Barison et al., 1 Nov 2024, Liepuoniute et al., 7 Nov 2024).
  • Materials science: Quantum-centric architectures are mapped to prototypical active-space calculations, use case-driven embedding techniques (e.g., DMET+MPS), Trotterized Hamiltonian evolution, and linear combination of unitaries methods (Alexeev et al., 2023). Approximate quantum-classical pipelines are coupled with error mitigation strategies (e.g., ZNE, PEC, classical shadows) and domain-informed compiler optimizations (e.g., operator grouping, SWAP network optimization) to address classically hard correlated-electron problems beyond FCI capabilities.
  • Physics and high-energy applications: Quantum simulation of fermionic scattering, SU(3) lattice Yang–Mills theory, quantum machine learning for detector analysis, and quantum linear systems solvers (HHL algorithm) are all considered within the quantum-centric framework, with estimates indicating that up to O(10^9) gates and on the order of 10–14 qubits will be required for future fault-tolerant advantage in strongly-coupled use cases (Pascuzzi et al., 21 Aug 2024).

5. Challenges, Limitations, and Solutions

Several practical and theoretical challenges are highlighted:

  • Exponential memory/circuit depth scaling: Despite advanced encoding and decomposition, the brute-force simulation of large universal circuits runs into hard RAM and communication bottlenecks. Trade-offs exist between memory usage, numerical precision, and computation time, with gate decomposition and tensor slicing balancing these to push boundaries (Willsch et al., 2019, Yong et al., 2021, Wang et al., 20 Jun 2024).
  • Noisy hardware and measurement runtime: Pre-fault-tolerant quantum devices cannot fully solve chemistry or physics use cases in isolation due to prohibitive circuit depth and measurement statistics, necessitating classical offload of postprocessing and error mitigation. Hybrid estimators and configuration recovery partially suppress the impact of noise, producing classically certifiable energy bounds (Robledo-Moreno et al., 8 May 2024).
  • Operating system and orchestration: Achieving reliability at the full-system level (including OS, workload manager, and message-passing layers) is necessary for long-running error-corrected computations. Microkernel isolation, message-passing architectures, and replication add robustness but increase complexity and resource requirements (Paler, 17 Oct 2024).
  • Latency and I/O bottlenecks: Tightly coupled hybrid algorithms are sensitive to communication overhead, especially for real-time feedback and error correction. Specialized hardware (e.g., ultra-fast interconnects), careful job orchestration, and possibly local, embedded processing for critically latency-bound components are required (Paler, 17 Oct 2024, Lee et al., 2023).
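The memory wall in the first bullet is easy to quantify: a dense state vector stores 2^n amplitudes, so each additional qubit doubles the footprint. Assuming 16 bytes per double-precision complex amplitude (and 2 bytes under a compact encoding such as JUQCS-A's), the numbers work out as follows; the sizes printed are exact powers of two, not measurements from any cited system.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    # A dense state vector holds 2**num_qubits complex amplitudes.
    return (2 ** num_qubits) * bytes_per_amplitude

TIB = 2 ** 40

full_45 = statevector_bytes(45)        # double precision: 512 TiB
compact_45 = statevector_bytes(45, 2)  # 2-byte adaptive encoding: 64 TiB
full_48 = statevector_bytes(48)        # 3 more qubits: 8x the memory, 4 PiB

assert full_45 // TIB == 512 and compact_45 // TIB == 64
assert full_48 == 8 * full_45
```

This is why compact encodings and gate decomposition buy only a handful of extra qubits each: the exponential doubles through any constant-factor saving within a few steps, which is exactly the trade-off space (memory vs. precision vs. time) the simulators above navigate.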

6. Outlook and Future Directions

The quantum-centric supercomputing paradigm is projected to evolve along the following vectors:

  • Integration of quantum and HPC workflows in unified data centers: Quantum-centric data centers are anticipated, where QPUs are interfaced with classical nodes at low latencies, supporting true real-time hybrid feedback loops for algorithms such as VQE, QML, and Hamiltonian simulation (Alexeev et al., 2023, Pascuzzi et al., 21 Aug 2024).
  • Advances in error correction and mitigation: Surface code, LDPC codes (requiring higher-than-2D connectivity), modular architectures, and circuit knitting methods (e.g., partitioned execution, entanglement forging) are being developed to enable practical scaling and reduce overhead (Bravyi et al., 2022).
  • Layered middleware, serverless quantum runtime environments, and dynamic circuit execution: These approaches are being deployed to manage arbitrary hardware topologies, dynamic allocation, and hardware/software abstraction, aiming to make quantum processing ubiquitous and frictionless (Bravyi et al., 2022, Alexeev et al., 2023).
  • Benchmarks, validation, and measurement: Rigorous hybrid benchmarks (e.g., cross-entropy benchmarking, heavy output generation, SupermarQ synthetic workloads) and competitive evaluation metrics are converging to quantify quantum advantage and guide further development (Villalonga et al., 2019, Shehata et al., 15 Aug 2024).
  • Expanding to real scientific and industrial applications: As the hardware and software stack matures, quantum-centric supercomputers will target high-value electronic structure, correlated materials, optimization, and machine learning tasks previously inaccessible to purely classical or quantum resources.
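As one concrete benchmark from the list above, linear cross-entropy benchmarking (XEB) scores a device by the mean ideal probability of its output samples, F = D * <p_ideal(x)> - 1, where D = 2^n. A minimal sketch, using a stand-in Porter–Thomas-like ideal distribution rather than a real circuit simulation: a perfect sampler scores near 1, a fully depolarized (uniform) one near 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
dim = 2 ** n

# Stand-in ideal output distribution of a random circuit (Porter-Thomas-like).
p_ideal = rng.exponential(size=dim)
p_ideal /= p_ideal.sum()

# Samples from a "perfect device" vs. from uniform (fully depolarized) noise.
ideal_samples = rng.choice(dim, size=50_000, p=p_ideal)
noisy_samples = rng.integers(0, dim, size=50_000)

def xeb_fidelity(samples):
    # Linear XEB: F = D * <p_ideal(x)>_samples - 1
    return dim * p_ideal[samples].mean() - 1

f_ideal = xeb_fidelity(ideal_samples)  # close to 1
f_noisy = xeb_fidelity(noisy_samples)  # close to 0
```

In practice the expensive part is computing p_ideal(x) for the sampled bitstrings of a large circuit, which is itself a supercomputer-scale tensor-contraction workload, making XEB a naturally hybrid benchmark.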

This evolution is underpinned by collaborative innovation in hardware architecture, programming languages, distributed resource management, error suppression/mitigation, and hybrid workflow construction across the quantum and HPC research communities.
