Heterogeneous architectures enable a 138x reduction in physical qubit requirements for fault-tolerant quantum computing under detailed accounting

Published 7 Apr 2026 in quant-ph | (2604.06319v1)

Abstract: Quantum computer hardware is predicted to scale over hundreds of thousands of qubits coming online in the next decade. Despite significant theoretical and experimental QEC progress, quantum computer architecture has suffered a significant gap, with bottom-up physical-device-driven challenges largely disconnected from top-down QEC-code-driven considerations. In this work, we unify these two views, presenting a complete heterogeneous quantum computing architecture incorporating task-specific hardware selection and QEC encoding, and agnostic to code selection or physical qubit parameters. Our approach further enables special-purpose processing modules, and includes a full microarchitecture for fault-tolerant implementation of interfaces between quantum processing units and quantum memories. Using this architecture and a new fully featured compiler functioning across subsystems at the scale of $1,000$ logical qubits, we schedule and orchestrate a variety of algorithms down to hardware-specific instructions; a detailed accounting of all operations reveals up to 551x reduction in algorithmic logical error and up to 138x reduction in physical-qubit overhead compared to a monolithic baseline architecture. We then consider the factorization of 2048-bit RSA-integers; using an experimentally demonstrated grid-coupling topology, factoring RSA-2048 requires 381k physical qubits and 9.2 days, which can be reduced to 4.9 days via addition of an algorithm-specific accelerator for the Adder subroutine (requiring 439k qubits). Finally, assuming hypothetical long-range coupling, implementing quantum memory using qLDPC codes reduces the resources required for factoring to just 190k qubits and under 10 days. These results and the tooling we have built indicate that heterogeneous quantum-computer architectures can deliver significant, verifiable benefits on realistic hardware.

Summary

  • The paper introduces Q-NEXUS, demonstrating up to a 138x reduction in physical qubits through decomposing the system into specialized modules.
  • It leverages the Q-CHESS compiler for error- and resource-aware mapping of quantum algorithms onto heterogeneous hardware under realistic constraints.
  • The approach lowers logical error rates and resource demands in practical applications, including RSA-2048 factorization, via optimized module-specific scheduling.

Heterogeneous Quantum Architectures Achieve 138x Reduction in Physical Qubit Overhead: An Expert Analysis of Q-NEXUS

Introduction

This work addresses the central challenge in scaling fault-tolerant quantum computing: the physical and control resource explosion inherent in monolithic, error-corrected architectures. The paper proposes a rigorous, full-stack heterogeneous architectural framework—Q-NEXUS—paired with a microarchitecture-aware compiler (Q-CHESS), targeting optimal mapping of error-corrected quantum algorithms onto physically feasible systems. The key innovation is a principled separation of hardware into functionally distinct modules for processing, memory, communication, and resource state generation, combined with flexible support for heterogeneous qubit modalities and QEC codes. The authors carry out detailed, module-resolved resource accounting for representative quantum algorithms, including complete compilation and scheduling at the 1,000-logical-qubit scale, and the classically intractable problem of RSA-2048 integer factorization.

Motivation and Prior Art

Monolithic quantum computing architectures, in which computation and storage functionalities are co-located on a regular qubit lattice, encounter overwhelming growth in wiring, control density, and cross-talk at scale. The classical analogy is the "tyranny of numbers" in pre-IC electronics, a problem now reappearing as quantum systems scale up. Prior modular (multi-core, code-first) and device-specialization works have tackled individual subsystems (e.g., QEC-driven modularity, hardware patchworks, or memory-compute separation), but none integrates memory hierarchy, qubit/code heterogeneity, explicit bus-mediated communication, and a compilation pipeline that maps full algorithms to instruction schedules under actual device constraints.

Architectural Framework: Q-NEXUS Design

The Q-NEXUS architecture formalizes the quantum computer as an interconnected network of:

  1. Small, fixed-size Quantum Processing Units (QPUs): optimized for speed and universality, with logical qubit count limited to mitigate crosstalk and fabrication complexity.
  2. Quantum State Factories (QSFs): dedicated magic (non-Clifford) resource-state distillation, physically decoupled from the QPU.
  3. Quantum Memory (QM): segmented into Random-Access QM (RAQM) and Static Transversal QM (STQM), each with independent code, modality, and clock-cycle parameters; RAQM provides active QEC for long-term storage, while STQM provides passive, ultra-long-coherence, cache-like storage.
  4. Application-Specific Quantum Processing Units (ASQPUs): for efficient execution of algorithmic bottlenecks such as addition in Shor’s algorithm.
  5. Quantum Bus (QB): a photonic/microwave interconnect network supporting fault-tolerant logical state transfer between modules with mismatched codes or geometries, via transversal teleportation (STQM) or lattice surgery (RAQM).

    Figure 1: High-level diagram of Q-NEXUS, showing distinct modules—QPU, QSF, QM, ASQPU—connected by an optical quantum bus and orchestrated by microarchitecture-aware control.

This decomposition lets each functional block be mapped to the best available qubit modality or technology, and lets each module's QEC code be chosen for only the logical properties that module needs (e.g., surface code for compute, qLDPC codes for dense memory).
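As a rough illustration of why per-module code choice matters, the sketch below compares the physical-qubit cost of storing logical qubits in surface-code patches versus blocks of the [[144, 12, 12]] gross code, a bivariate-bicycle qLDPC code. It is a back-of-envelope count under stated assumptions (rotated surface-code layout, 288 physical qubits per gross-code block, distance 12 for both codes), not the paper's accounting, and the function names are hypothetical.

```python
import math

# Illustrative per-module qubit accounting (assumed layouts, not the paper's):
#   - rotated surface code: d*d data + (d*d - 1) measurement qubits per logical qubit
#   - [[144, 12, 12]] bivariate-bicycle "gross" code: 144 data + 144 check
#     qubits per block, 12 logical qubits per block


def surface_code_qubits(n_logical: int, distance: int) -> int:
    """Physical qubits to store n_logical qubits as separate surface-code patches."""
    return n_logical * (2 * distance**2 - 1)


def gross_code_qubits(n_logical: int) -> int:
    """Physical qubits to store n_logical qubits in whole gross-code blocks."""
    blocks = math.ceil(n_logical / 12)
    return blocks * 288


if __name__ == "__main__":
    n = 1_000
    surface = surface_code_qubits(n, distance=12)  # match the gross code's distance
    gross = gross_code_qubits(n)
    print(f"surface: {surface}, gross: {gross}, ratio: {surface / gross:.1f}x")
```

For 1,000 logical qubits this gives 287,000 physical qubits for distance-12 surface-code patches versus 24,192 for 84 gross-code blocks, roughly a 12x difference. The count ignores routing space, lattice-surgery ancillas, and the long-range couplers the gross code needs, which is why qLDPC memory appears here only under the hypothetical long-range-coupling scenario.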

Module Interfaces and State Transfer Protocols

Q-NEXUS introduces explicit, code-aware protocols mediating transfer and synchronization between modules:

  • Transversal Teleportation (QPU ↔ STQM): High bandwidth and minimal latency, but requires matching codes and patch sizes, which restricts it to short-term buffer memory.
  • Lattice Surgery (QPU ↔ RAQM): General-purpose; handles code conversion and enables traffic between modules of different code families and distances, but incurs higher latency because it needs multiple sequential syndrome-extraction cycles.

    Figure 2: Schematic of transversal teleportation and lattice-surgery transfer protocols for state movement between QPU and quantum memories with varying code distances and correction cycles.

The architecture supports scalable allocation of transfer patches and paths within bus-connected modules, managing physical design constraints (swap distance, wiring) while maximizing logical qubit density.
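The trade-off between the two transfer protocols can be captured in a toy latency model. The sketch below is an assumption-laden illustration, not the paper's microarchitecture: transversal teleportation is taken to cost about one syndrome-extraction round of the slower module and to require identical code and distance, while lattice surgery is taken to cost about d rounds. The `Module` type and `transfer_protocol` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical description of one module's QEC configuration."""
    name: str
    code: str             # code family, e.g. "surface" or "gross"
    distance: int
    cycle_time_us: float  # duration of one syndrome-extraction round


def transfer_protocol(src: Module, dst: Module) -> tuple[str, float]:
    """Choose a transfer protocol and estimate its latency in microseconds.

    Toy model (assumption): the slower module's clock sets the pace.
    Transversal teleportation needs matching code family and distance and
    takes about one round; otherwise fall back to lattice surgery, which
    takes about d rounds for the larger distance d.
    """
    slow_cycle = max(src.cycle_time_us, dst.cycle_time_us)
    if src.code == dst.code and src.distance == dst.distance:
        return "transversal_teleportation", slow_cycle
    d = max(src.distance, dst.distance)
    return "lattice_surgery", d * slow_cycle


if __name__ == "__main__":
    qpu = Module("QPU", "surface", distance=15, cycle_time_us=1.0)
    stqm = Module("STQM", "surface", distance=15, cycle_time_us=1.0)
    raqm = Module("RAQM", "gross", distance=12, cycle_time_us=10.0)
    print(transfer_protocol(qpu, stqm))  # ('transversal_teleportation', 1.0)
    print(transfer_protocol(qpu, raqm))  # ('lattice_surgery', 150.0)
```

The example makes the asymmetry visible: a slow, code-mismatched memory pays both for the extra rounds and for its slower clock.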

Compiler Stack: Q-CHESS

Conventional quantum compilers are inadequate for heterogeneous systems because they assume uniform device graphs and operation costs. Q-CHESS is introduced as a control stack that performs error- and resource-aware compilation, mapping high-level circuits to machine instructions and incorporating:

  • Logical depth reduction and blockification,
  • Register allocation aware of module specialization and movement cost,
  • Dynamic routing and idle-error minimization with architectural cost function (deciding when/where to store/transfer logical states based on combined idling and transfer error),
  • Synchronization of module clocks and insertion of buffer windows to optimally mask memory access or transfer latency,
  • End-to-end resource accounting (qubits, couplers, operation counts, error decomposition).
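The store-or-keep decision in the routing bullet can be illustrated with a simple additive error model. The sketch below is a hypothetical cost function, not Q-CHESS's actual one: it assumes error accumulates linearly with idle time at a fixed rate per module, that an offload costs two transfers (out and back), and that all rates are placeholder values.

```python
def should_offload(idle_us: float, qpu_err_per_us: float, mem_err_per_us: float,
                   transfer_err: float, transfer_us: float) -> bool:
    """Return True if parking an idle logical qubit in memory is expected to
    accumulate less error than leaving it on the QPU (toy additive model).

    transfer_err is the total error of one transfer, including any idling
    during the transfer itself (assumption).
    """
    round_trip_us = 2 * transfer_us
    if round_trip_us >= idle_us:
        return False  # not enough time to move out and back before next use
    keep_err = qpu_err_per_us * idle_us
    offload_err = 2 * transfer_err + mem_err_per_us * (idle_us - round_trip_us)
    return offload_err < keep_err


if __name__ == "__main__":
    # Placeholder rates: QPU idling 1e-6/us, memory 1e-8/us, 1e-4 per transfer.
    print(should_offload(1000, 1e-6, 1e-8, 1e-4, 10))  # True: long idle window
    print(should_offload(100, 1e-6, 1e-8, 1e-4, 10))   # False: transfers dominate
```

Short idle windows stay on the QPU because the fixed transfer cost dominates; long windows favor memory, consistent with the idea that offloading pays off most when qubits idle for long stretches.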

    Figure 3: The Q-CHESS compilation pipeline: logical and physical transformation stages, module-aware code generation, and hardware-mapping for all major architectural modules.

Quantitative Resource Analysis: Mid-Scale Algorithms

Extensive simulations are conducted for circuits scaling up to 1,000 logical qubits, covering the approximate quantum Fourier transform (AQFT), quantum addition, and Fermi-Hubbard dynamics, among others. Key metrics are compared against a monolithic baseline: surface code on grid-coupled superconducting devices.

  • Error Rate: Heterogeneous architectures achieve 42–59x reductions in logical error (AQFT), reaching up to 551x for Adder, by offloading idle periods to error-suppressed storage modules. This suppresses runtime-proportional error accumulation by decoupling logical failure probability from clock time.
  • Physical Qubit Requirement: Memory offload and code specialization yield 60x (STQM) and 138x (RAQM) reductions in total physical qubits needed for a fixed logical qubit count.
  • Bandwidth, Latency, and Execution Time: Q-CHESS scheduling keeps total runtime near parity with the baseline (less than a 2x increase), despite data-movement and cycle-mismatch overheads, for memory-to-QPU cycle-time ratios up to 1,000.

    Figure 4: Fidelity improvement, circuit duration, and total qubit reduction for three algorithms as a function of logical qubit count, demonstrating superlinear improvements with scale.
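The buffer windows in the Q-CHESS pipeline and the cycle-time ratios above interact in a simple way: a fetch from a slow memory can only start on a memory-clock boundary, so the compiler must issue it early enough to land before the QPU needs the state, or else stall. The sketch below assumes integer clock ratios and a fixed fetch duration; `latest_fetch_issue` is a hypothetical helper, not part of Q-CHESS.

```python
def latest_fetch_issue(need_at: int, fetch_mem_cycles: int,
                       cycle_ratio: int) -> tuple[int, int]:
    """Latest issue time for a memory-to-QPU fetch, in QPU clock cycles.

    Assumptions (hypothetical model): one memory cycle lasts cycle_ratio QPU
    cycles, fetches start only on memory-cycle boundaries, and a fetch takes
    fetch_mem_cycles memory cycles. Returns (issue_cycle, stall_cycles).
    """
    latest_boundary = need_at // cycle_ratio - fetch_mem_cycles
    if latest_boundary >= 0:
        # Arrives at (latest_boundary + fetch_mem_cycles) * cycle_ratio <= need_at;
        # the state waits in a buffer for any remaining QPU cycles.
        return latest_boundary * cycle_ratio, 0
    # Latency cannot be hidden: issue at cycle 0 and stall the QPU.
    return 0, fetch_mem_cycles * cycle_ratio - need_at


if __name__ == "__main__":
    print(latest_fetch_issue(5000, 2, 1000))  # (3000, 0): latency fully masked
    print(latest_fetch_issue(1500, 2, 1000))  # (0, 500): QPU stalls 500 cycles
```

The slack between arrival and use is exactly the buffer window a scheduler would have to reserve; larger cycle ratios make that slack coarser.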

    Figure 5: Comparison of execution errors and durations for interface protocols. Even with aggressive clock mismatch, resource and fidelity gains persist above ∼20 logical qubits.

    Figure 6: Error budget breakdown for 1,000-LQ AQFT—idling dominates homogeneous baseline, replaced by state-transfer (moderate mismatch) and non-Clifford gate errors (low mismatch) in Q-NEXUS.

RSA-2048 Factoring: Scaling to Practical Cryptographic Relevance

Compiling, scheduling, and hardware-mapping the full RSA-2048 factoring circuit (factoring a 2,048-bit modulus) extends the mid-scale results to a cryptanalytic workload. Decomposing the circuit into operational subroutines (Adder, Lookup, Phaseup) and scheduling each module exactly yields:

  • Monolithic Surface Code Baseline (State-of-the-Art): 0.9 million physical qubits (fully active QEC), 1.8M couplers, 5 days runtime.
  • Q-NEXUS Hierarchical Architectures:
    • (B1) STQM only: 1.04M qubits, half static, 9.2 days,
    • (B2) STQM + Surface RAQM: 0.38M qubits, 9.2 days,
    • (B3) STQM + Gross qLDPC RAQM: 0.19M qubits, 9.2 days,
    • (B4–B6) Application-Specific Adder Acceleration: halves runtime to 4.9 days, with modest qubit increases.
  • Pareto Optimality: B6 (STQM + qLDPC RAQM + Adder accelerator) dominates the monolithic baseline on qubit count, coupler count, and total space-time cost.
  • Control Complexity: The number of highly connected, actively error-corrected qubits (the hardest engineering component) drops to 0.14M, roughly 6x fewer than the 0.9M actively corrected qubits of the baseline.

    Figure 7: Normalized performance spiderplots across architectures, axes show resource and time metrics; Q-NEXUS variants envelop the monolithic baseline.
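The headline RSA-2048 figures can be compared on a single crude metric, physical qubits multiplied by runtime. The sketch below uses only the numbers quoted in this summary and the abstract; qubit-days is an assumed proxy and is not necessarily how the paper defines space-time cost.

```python
# Figures quoted in this summary and the paper's abstract:
# (physical qubits, runtime in days).
CONFIGS = {
    "monolithic baseline": (900_000, 5.0),
    "grid coupling (B2)": (381_000, 9.2),
    "grid coupling + Adder accelerator": (439_000, 4.9),
    "qLDPC RAQM (B3)": (190_000, 9.2),
}


def qubit_days(physical_qubits: int, days: float) -> float:
    """Crude space-time volume: every qubit held for the full runtime."""
    return physical_qubits * days


def reduction_vs(baseline: str, configs: dict) -> dict:
    """Space-time reduction factor of each configuration relative to baseline."""
    base = qubit_days(*configs[baseline])
    return {name: base / qubit_days(q, d) for name, (q, d) in configs.items()}


if __name__ == "__main__":
    for name, factor in reduction_vs("monolithic baseline", CONFIGS).items():
        print(f"{name:36s} {factor:4.2f}x")
```

With these inputs the qLDPC configuration comes out about 2.6x smaller in qubit-days than the baseline, and the two grid-coupled configurations about 1.3x and 2.1x, even though the non-accelerated ones run almost twice as long.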

Implications and Theoretical Insights

  • Idling Errors Are Largely Suppressed: The dominant error source in monolithic platforms, idling during circuit serialization and parallelization bottlenecks, is suppressed by offloading idle qubits to memory tiers. The logical error budget is then dominated by state-transfer and non-Clifford gate errors (Figure 6).
  • Code Specialization and Functional Heterogeneity: The decoupling of code and qubit requirements across modules leverages the strengths of each technology: surface code for QPU, high-rate qLDPC or ULC for storage. This enables physical-constraint-tuned deployments unattainable with code-uniform architectures.
  • Bus-Enabled Architectural Optima: The quantum bus (QB) delivers a crucial all-to-all logical interconnect, allowing massive savings in local wiring and enabling nonlocal traffic, analogous to modern CPU-memory interconnects. The architecture’s viability is predicated on continued advances in photonic/microwave coupling, transduction, and Bell-pair production.
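Code specialization also means each module can choose its own code distance for its own workload. The sketch below uses the widely quoted surface-code heuristic p_L ≈ A·(p/p_th)^((d+1)/2) per logical qubit per round, with A = 0.1 and p_th = 1% as assumed constants; the workload sizes are hypothetical, and a real module would also budget for transfer and non-Clifford errors.

```python
def min_distance(p_phys: float, qubit_rounds: float, budget: float,
                 a: float = 0.1, p_th: float = 1e-2) -> int:
    """Smallest odd surface-code distance meeting a logical error budget.

    Uses the heuristic per-round logical error a * (p_phys / p_th) ** ((d + 1) / 2)
    (an assumption), summed over qubit_rounds = logical qubits x QEC rounds.
    """
    if not 0 < p_phys < p_th:
        raise ValueError("physical error rate must be positive and below threshold")
    if budget <= 0 or qubit_rounds <= 0:
        raise ValueError("budget and qubit_rounds must be positive")
    d = 3
    while qubit_rounds * a * (p_phys / p_th) ** ((d + 1) / 2) > budget:
        d += 2
    return d


if __name__ == "__main__":
    # Same physical error rate and budget, different workloads (hypothetical):
    print(min_distance(1e-3, qubit_rounds=1e6, budget=2e-3))  # 15: busy module
    print(min_distance(1e-3, qubit_rounds=1e3, budget=2e-3))  # 9: lightly used module
```

A module that sees fewer logical qubit-rounds can run at a smaller distance and therefore with fewer physical qubits, which a code-uniform architecture cannot exploit.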

Future Directions

  • Integration with Emerging Modalities: Improved qLDPC implementations, photonic logical state transfer, and hybrid classical/quantum bus development will further compress hardware resource requirements and reduce interface error rates.
  • Architectural/Algorithmic Co-Design: The Q-CHESS pipeline provides a platform for co-optimizing task scheduling, hardware organization, and circuit synthesis, opening new avenues for parallelization and cost reduction.
  • Module Interface Specification: The delineation of well-abstracted module boundaries may facilitate adoption of architectural/hardware standards, unlocking third-party device, code, and algorithmic specialization.

Conclusion

The paper demonstrates, via explicit module-resolved resource accounting and machine-level compilation, that quantum architectures embracing functional specialization and heterogeneity, as embodied in Q-NEXUS, achieve up to a 138x reduction in physical qubit requirements for scalable, fault-tolerant quantum computing. This is achieved under realistic device and interconnect constraints, with modest runtime overheads, by suppressing idling, employing code heterogeneity, and offloading long-lived quantum data to high-density, physically optimized memories. The result positions architectural design, rather than raw device scaling or QEC innovation alone, as a primary driver of practical large-scale quantum computation.


References:

  • "Heterogeneous architectures enable a 138x reduction in physical qubit requirements for fault-tolerant quantum computing under detailed accounting" (2604.06319)

Explain it Like I'm 14

Overview: What this paper is about

This paper is about building big, reliable quantum computers without needing an impossible number of parts. The authors show a new “heterogeneous” design—meaning different parts of the machine are built for different jobs—that can make quantum computers much more practical. Using this approach, they report up to 138 times fewer physical qubits needed and up to 551 times fewer errors for important tasks, compared to a traditional one-size-fits-all design.

Think of it like a modern computer: you don’t use the same chip to do everything. You have a CPU for general work, a GPU for graphics, memory to store stuff, and a fast connection between them. The authors bring this idea to quantum computers and show it really helps.

The big questions they asked

The paper asks:

  • Can we design a quantum computer where different parts specialize in different tasks (computing, storing, and moving quantum information) to save space and reduce errors?
  • Can we actually schedule and run large programs using such a design, and count every operation honestly, to see if it really works better?
  • How much could this help on hard problems like factoring big numbers (such as RSA-2048), which is a major test for quantum computing?

How they approached it (in everyday terms)

The authors designed a full system and built a detailed compiler to run it end-to-end. Here’s the idea with simple analogies:

  • Quantum bits (qubits) are extremely fragile. To keep them safe, we use “logical qubits” made from many “physical qubits,” protected by quantum error correction—like wrapping a delicate object in many layers of bubble wrap.
  • In many algorithms, most qubits spend most of their time waiting around (idling). Keeping all of them in an expensive, high-speed “work area” is wasteful.

So they split the machine into specialized parts:

  • Quantum Processing Units (QPU): The “workbench.” Small, fast, and designed to perform operations quickly and reliably.
  • Quantum State Factories (QSF), also called magic-state factories: Think of these like special ingredient kitchens that prepare rare “magic states.” These are needed to perform certain advanced quantum operations. Making them is tricky, so a dedicated unit does it efficiently.
  • Application-Specific Accelerators (ASQPU): Like a calculator app built into the system for a subroutine you use a lot (for example, a super-fast adder). It’s not a full general-purpose tool, but it does its one job extremely well.
  • Quantum Memory (QM): The “storage room.” A place to keep qubits safe when they’re not being used.
    • Static memory (STQM): A short-term, ultra-quiet “shelf” where you can leave a qubit briefly without constantly checking it, like a fast cache.
    • Random-access memory (RAQM): Long-term storage with error correction, like RAM that you can access uniformly from anywhere.
  • Quantum Bus (QB): The “hallway and conveyor belts” connecting everything. It uses teleportation-like methods (via shared entangled pairs called Bell pairs) to move quantum states safely between units without walking them step-by-step through the crowd.

To make this work in detail, they built:

  • Q-NEXUS: The overall hardware architecture (the blueprint of all the parts above).
  • Q-CHESS: A compiler that acts like the “project manager.” It schedules every instruction, handles different clock speeds (some parts are faster than others), coordinates where qubits move, and ensures all the tiny details are counted. This is key for “detailed accounting,” so the final numbers are realistic.

What they found and why it’s important

Using their architecture and compiler, they tested common building blocks and a full-size challenge. Here are the highlights:

  • Big reductions in resources and errors:
    • For the Quantum Fourier Transform (QFT), a key subroutine in many algorithms: 42–59× fewer logical errors and 60–138× fewer physical qubits compared to a traditional “one big block” design.
    • For other tasks like an Adder and simulating physics (the Fermi–Hubbard model): more than 100× fewer logical errors.
    • Across cases, up to 551× fewer algorithmic logical errors were observed.
  • Realistic path to factoring RSA-2048:
    • Using an experimentally demonstrated “grid” coupling (no fantasy long-range wires) and two QPUs:
      • About 381,000 physical qubits
      • About 9.2 days of runtime
    • Adding a small, specialized Adder accelerator (37 logical qubits) speeds it up:
      • About 439,000 physical qubits
      • About 4.9 days of runtime
    • If future hardware supports long-range connections and uses high-density memory codes (qLDPC) for storage:
      • About 190,000 physical qubits
      • Under 10 days

Why this matters:

  • These are “bottom-up” resource counts that include real-world costs: moving data, waiting times, different module speeds, and making magic states. It’s not just theory; it’s a blueprint that can fit actual hardware limits like wiring and control electronics.
  • It shows that separating “where you compute” from “where you store” works very well—especially because many qubits are idle most of the time (around 96–97% in key versions of Shor’s algorithm). Moving them to memory saves space and reduces errors.

What this could mean going forward

  • A smarter way to build quantum computers: Instead of trying to make one giant chip that does everything, we can build different modules—each optimized for its job—and connect them. This approach mirrors how modern classical computers evolved (CPU, GPU, RAM, caches, fast interconnects).
  • Practical scaling: By storing most qubits in specialized memory and keeping the fast QPU small, we avoid the “tyranny of numbers” problem where wiring, cooling, and control lines become overwhelming.
  • Faster progress: The architecture supports mixing different qubit types and different error-correcting codes in the same system. That means labs and companies can pick the best technology for each module and still make it all work together.
  • Better planning: Because the authors’ compiler (Q-CHESS) accounts for timing, routing, and all the hidden costs, the estimates can guide real hardware development and help decide which improvements matter most.

In short, the paper shows a practical, modular, and well-measured path to large-scale, fault-tolerant quantum computing—one that uses fewer resources, makes fewer mistakes, and fits better with how actual hardware can be built.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise list of unresolved issues and specific open questions that, if addressed, would strengthen or extend the paper’s contributions.

  • Experimental validation: No hardware prototype or end-to-end experimental demonstration of Q-NEXUS/Q-CHESS; identify a minimal viable subset (e.g., single QPU + STQM + bus link) for empirical validation.
  • Sensitivity to device parameters: Lack of systematic sensitivity analyses to key assumptions (gate/measurement error rates, cycle times, bus link loss/fidelity, memory T1/T2); quantify thresholds at which the reported 42–551× error and 60–138× qubit savings persist or break down.
  • Quantum bus realism: Unspecified bus topology, switching fabric, channel count, and scaling (e.g., optical cross-connect capacity, insertion loss, fanout); provide loss budgets and link budgets that reconcile required Bell-pair rates with realistic photonic/microwave hardware.
  • Bus throughput and congestion: No queueing or contention analysis for simultaneous transfers; develop arbitration, scheduling, buffering, and QoS policies and assess their impact on latency and logical error.
  • Fault-tolerant transfer protocols (FTTP) detail: Missing fully specified circuits, ancilla counts, and time/space costs for state transfer (including repeated purification and retries); provide failure modes and end-to-end error-budget allocation.
  • Code conversion across modules: No concrete, fault-tolerant code-switching protocols (e.g., surface code ↔ bivariate-bicycle qLDPC codes such as the gross code) with proven correctness, latency, and resource overhead; clarify required intermediate encodings and decoder handoffs.
  • Decoder integration and scaling: Absent analysis of decoder types, throughput, and latency across heterogeneous codes; quantify classical compute load and its effect on cycle-time alignment and program stalls.
  • Noise model realism: Assumes simplified error models; evaluate impact of leakage, non-Pauli, correlated, time-varying, and crosstalk errors on distillation, purification, code thresholds, and transfer protocols.
  • Runtime variance and worst-case bounds: No characterization of stochastic runtime due to probabilistic purification/teleportation; provide distributions (not just means) and tail bounds relevant to SLA-like guarantees.
  • Magic-state supply/demand engineering: Underspecified factory sizing, placement, and distribution policies under time-varying demand; analyze pipeline fill/flush times, backlog management, and the runtime impact of supply droughts.
  • Static memory (STQM) operating envelope: Unclear maximum safe dwell times without active QEC, error accumulation models (including drift/leakage), and transition protocols in/out of STQM; define calibration and refresh strategies.
  • Random-access memory (RAQM) layout and scaling: No quantitative study of SWAP-distance growth, hotspotting, and fragmentation as memory size scales; design placement/routing strategies that minimize average and worst-case access cost.
  • qLDPC RAQM feasibility: The “hypothetical long-range coupling” assumption is not supported by a concrete implementation roadmap; compare alternatives using only local connectivity and quantify overhead penalties.
  • qLDPC decoder practicality: Missing evaluation of specific qLDPC decoders (latency, accuracy, hardware cost) and their compatibility with real-time scheduling constraints.
  • Timing mismatches and buffering: While acknowledged, there is no explicit policy evaluation for buffering, synchronization, and back-pressure across modules with disparate clocks; quantify stall rates and their effect on logical error accumulation.
  • Compiler optimality and scalability: Q-CHESS scheduling heuristics and compile-time complexity are unspecified; provide approximation guarantees or empirical optimality gaps, and runtime scaling beyond ~1,000 logical qubits.
  • Cache/hierarchy policies: No formal heuristics for when to keep states on QPU vs STQM vs RAQM under uncertain future use and variable transfer latency; evaluate prefetching, eviction, and adaptive policies.
  • Multi-core QPU scaling: Limited guidance on load balancing, inter-core communication costs, and when multi-core provides net benefits given bus contention and factory sharing.
  • Application-specific accelerators (ASQPU) methodology: Lacks a general framework to identify high-ROI subroutines, co-design their micro-architectures, and validate portability across algorithms; define interface standards and verification practices.
  • Benchmark breadth and generality: Results focus on QFT, Adder, Fermi–Hubbard, and Shor; assess broader workloads (e.g., qubitization-based chemistry, error-corrected VQE, amplitude amplification, QAOA variants) to test architectural generality.
  • Baseline fairness: Clarify whether the monolithic baseline includes comparable micro-architectural optimizations (e.g., pipelining, buffering) and realistic routing; provide ablation studies isolating each heterogeneous feature’s contribution.
  • Error-budget decomposition: Provide per-component contributions (QPU ops, idling, bus transfers, purification, code conversion, RAQM routing, factory errors) to make bottlenecks and trade-offs transparent.
  • Physical integration constraints: Absent packaging and cryogenic/thermal analyses for heterogeneous modalities, including photonic feedthroughs, laser delivery, and microwave shielding in large systems.
  • Non-qubit resource accounting: No estimates for control/readout wiring, lasers, cryo load, optical components, and classical compute power; include these to validate “tyranny of numbers” mitigation claims.
  • Reliability and yield: No discussion of module-level fault isolation, redundancy, and reconfiguration in the presence of defective qubits, dead links, or failed decoders; propose yield models and re-mapping strategies.
  • Parallelization of Shor’s outer loop: The work fixes a sequential approach; analyze space–time trade-offs of parallelization within Q-NEXUS (impacts on memory footprint, bus throughput, and factory demand).
  • Measurement and feedback latencies: Quantify the effect of disparate measurement speeds and classical feedback paths on pipeline depth and synchronization across modules.
  • Scheduling under uncertainty: Develop strategies that adapt at runtime to fluctuating link quality, factory output rates, and decoder backlog while bounding logical error and total time-to-solution.
  • Standardized IR and machine ISA: Specify a formal intermediate representation and machine-level instruction set for heterogeneous modules to enable reproducibility and third-party toolchains.
  • Security and fault containment: Define how faults or misconfigurations in one module (e.g., a noisy bus segment) are detected, isolated, and prevented from corrupting other subsystems’ logical states.
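The STQM operating-envelope question above can be made concrete with a deliberately crude model. The sketch assumes each data qubit in a passively stored patch fails independently with probability 1 - exp(-t/T), and treats the patch as safe while the expected number of failures stays below a chosen threshold; both the model and the function name are hypothetical, and a real analysis would need the decoder, leakage, and drift effects listed above.

```python
import math


def max_dwell_time(t_coherence_s: float, n_data_qubits: int,
                   max_expected_errors: float) -> float:
    """Longest passive storage time before the expected number of physical
    errors in a stored patch reaches max_expected_errors.

    Crude model (assumption): each data qubit independently suffers an error
    with probability 1 - exp(-t / t_coherence_s).
    """
    if not 0 < max_expected_errors < n_data_qubits:
        raise ValueError("max_expected_errors must lie in (0, n_data_qubits)")
    # Solve n * (1 - exp(-t / T)) = k for t:  t = -T * ln(1 - k / n)
    return -t_coherence_s * math.log(1.0 - max_expected_errors / n_data_qubits)


if __name__ == "__main__":
    # Hypothetical: distance-11 patch (121 data qubits), T = 1 s, tolerate ~5 errors.
    t = max_dwell_time(1.0, 121, 5)
    print(f"max dwell time: {t * 1e3:.1f} ms")
```

Even with a one-second coherence time, this toy model allows only tens of milliseconds of unprotected storage for a mid-sized patch, which is why the safe-dwell-time question matters for STQM.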

Practical Applications

Overview

This paper introduces Q-NEXUS, a heterogeneous quantum-computing architecture that cleanly separates computation, communication, and storage, and Q-CHESS, a machine-level, microarchitecture-aware compiler that schedules fault-tolerant programs across modules operating at different clock rates. The work demonstrates large, verifiable gains from architectural heterogeneity and detailed scheduling: up to 551× reduction in logical error and up to 138× reduction in physical-qubit overhead on key subroutines (QFT, Adder, Fermi–Hubbard dynamics), and end-to-end resource estimates for factoring RSA-2048 (e.g., 381k physical qubits and 9.2 days with an experimentally demonstrated grid topology; 4.9 days with an Adder accelerator at 439k qubits; 190k qubits and <10 days assuming qLDPC memory with long-range coupling). Below are actionable applications derived from the architecture, compiler, and analysis.

Immediate Applications

The following can be pursued now with current tools and existing experimental capabilities, or applied directly to planning, procurement, and risk management.

  • Architecture-driven resource planning and roadmapping (Industry—hardware vendors, cloud providers)
    • Use Q-NEXUS’s modular blueprint (QPU, QSF, ASQPU, Quantum Bus, RAQM/STQM memory tiers) to guide chiplet design, interconnect R&D, and scaling plans; leverage the paper’s detailed accounting to set performance targets (e.g., bus fidelity, memory access latency, distillation throughput).
    • Dependencies: credible device parameters; access to interconnect prototypes; coordination between hardware, control, and compiler teams.
  • Machine-level heterogeneous compilation for realistic resourcing (Academia/Industry—software tooling, benchmarking)
    • Adopt or replicate Q-CHESS-style compilers to produce machine instructions and schedules that include routing, buffering, and timing mismatches across modules; use on QFT, arithmetic, and simulation benchmarks to quantify true costs and identify bottlenecks.
    • Dependencies: availability/integration of a Q-CHESS-like toolchain; mapping to target hardware’s native gates, cycle times, and connectivity.
  • Algorithm-specific accelerator design workflow (Industry/Academia—hardware–software co-design)
    • Prototype ASQPU modules (e.g., Adder, QFT) and measure space–time tradeoffs; integrate with compilers to place high-frequency subroutines on accelerators for performance gains (as shown for RSA’s Adder).
    • Dependencies: physical integration with interconnect; validated accelerator microcode; compatibility with the system’s error-correction stack.
  • Magic-state factory (QSF) sizing, placement, and runtime scheduling (Industry—superconducting, trapped-ion labs)
    • Plan dedicated distillation pipelines, buffer depths, and connectivity to QPUs to avoid T-state starvation; apply compiler-based rate matching of QSF output to algorithm demand.
    • Dependencies: measured distillation yields and cycle times; classical control latency; calibrated transfer paths to/from QPUs.
  • Interconnect and quantum bus protocol testing (Industry—photonics/microwave interconnects; Labs)
    • Implement and benchmark teleportation-based, fault-tolerant transfer microarchitectures and Bell-pair purification at small scale to validate quantum-bus design choices and error budgets.
    • Dependencies: demonstrated entanglement distribution/purification; synchronization electronics; error tracking of transfer protocols.
  • Standards-ready metrics and procurement specifications (Policy/Government/Industry consortia)
    • Translate detailed accounting into KPIs for RFPs: bus yield/fidelity, memory access latency, logical-cycle timings, distillation throughput, code-conversion costs; align funding calls and milestones to architectural metrics rather than qubit counts alone.
    • Dependencies: stakeholder consensus; participation in standards bodies; repeatable metrology.
  • Cryptographic risk assessment and PQC migration prioritization (Finance/Cybersecurity/Policy)
    • Use the RSA-2048 resource–time projections (9.2 days with 381k qubits under grid coupling assumptions; faster with accelerators) as scenario inputs for enterprise and national PQC timelines and for updating threat models and compliance roadmaps.
    • Dependencies: acceptance that timelines depend on engineering assumptions; continuous tracking of hardware progress and code advances.
  • Cloud and datacenter capacity planning for quantum workloads (Cloud providers/HPC centers)
    • Build scheduling and capacity models using the paper’s timing-aware orchestration (e.g., memory hierarchy, multi-core QPU, routing latencies) to plan module counts, queues, and service-level objectives.
    • Dependencies: integration with classical orchestration systems; realistic hardware availability and utilization data.
  • Benchmarking suites and open datasets for heterogeneous FT execution (Academia/Industry)
    • Package QFT, Adder, and simulation workloads with machine-level schedules to benchmark heterogeneous stacks and validate end-to-end improvements versus monolithic baselines.
    • Dependencies: public toolchains and reproducible parameter sets; agreement on benchmark rules and reporting.
  • Education and workforce training on stored-program quantum architectures (Academia/Training providers)
    • Use Q-NEXUS and Q-CHESS as concrete teaching tools for memory hierarchies, interconnect protocols, and microarchitectural scheduling in FT quantum systems.
    • Dependencies: course materials and lab exercises; access to simulators or small modular testbeds.
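The magic-state planning item above (buffer depths, rate matching of QSF output to algorithm demand) can be illustrated with a toy discrete-time model. This is a minimal sketch under stated assumptions, not the paper's Q-CHESS scheduler. One factory delivers a fixed batch of states every `period` logical cycles into a bounded buffer, and the algorithm stalls on any step whose states are not yet buffered. `run_schedule`, `RunResult`, and all parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    total_cycles: int   # logical cycles until the last algorithm step completes
    stall_cycles: int   # cycles spent waiting for magic states
    wasted_states: int  # states discarded because the buffer was full


def run_schedule(demand, states_per_batch, period, capacity):
    """Toy rate-matching model (hypothetical, not Q-CHESS).

    demand[i] is the number of magic states algorithm step i consumes. A step
    takes one logical cycle if its states are buffered; otherwise it waits.
    The factory delivers `states_per_batch` states at the end of every
    `period`-th cycle into a buffer holding at most `capacity` states.
    """
    if states_per_batch <= 0 or period <= 0:
        raise ValueError("factory must produce a positive number of states")
    if max(demand, default=0) > capacity:
        raise ValueError("a single step needs more states than the buffer holds")
    buffer = wasted = cycle = step = 0
    while step < len(demand):
        cycle += 1
        if cycle % period == 0:          # a factory batch lands this cycle
            buffer += states_per_batch
            if buffer > capacity:        # overflow is discarded
                wasted += buffer - capacity
                buffer = capacity
        if buffer >= demand[step]:       # enough states: the step executes
            buffer -= demand[step]
            step += 1
    return RunResult(cycle, cycle - len(demand), wasted)


# A bursty workload: 2 states every 4th step, i.e. 0.5 states/cycle on average.
demand = [2, 0, 0, 0] * 5
print(run_schedule(demand, states_per_batch=1, period=2, capacity=4))
print(run_schedule(demand, states_per_batch=2, period=2, capacity=4))
```

With this bursty demand, a factory matched to the average rate finishes in 23 cycles with 3 stall cycles and no waste, while doubling factory output finishes in 21 cycles but discards 6 states at the full buffer. That gap is the trade-off buffer sizing and rate matching are meant to balance.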

Long-Term Applications

These require further research, scaling, or development of hardware, codes, interconnects, and control systems.

  • Heterogeneous quantum data centers implementing Q-NEXUS (Industry—hardware vendors, cloud providers)
    • Deploy multi-module systems with fixed-size QPUs, high-throughput QSFs, ASQPUs, a fault-tolerant quantum bus, and hierarchical RAQM/STQM memory tiers for large-scale workloads.
    • Dependencies: robust high-fidelity interconnects; synchronized control stacks; validated error-correction pipelines at scale.
  • RSA-2048 (and beyond) factoring capability (Policy/Defense/Finance)
    • Realize the paper’s end-to-end factoring scenarios (grid-coupled modules, or qLDPC memory under hypothetical long-range coupling at 190k qubits and under 10 days); such a capability would force updates to national cryptographic policies, incident response plans, and deprecation timelines for legacy crypto.
    • Dependencies: achieving target physical error rates and logical cycle times; scalable magic-state supply; reliable bus and code conversion.
  • Quantum memory products as modular subsystems (Industry—quantum memory vendors)
    • Commercialize RAQM (active QEC, random access) and STQM (static cache-like stores) modules with standardized interfaces to QPUs and buses; offer capacity-driven scaling of quantum data retention.
    • Dependencies: memory modalities with ultra-long coherence, practical access latencies, and high-rate codes; efficient decoders; thermal and control integration.
  • Application-specific quantum accelerators (ASQPU portfolio) (Industry—vertical solutions)
    • Build accelerator lines for QFT/phase estimation, arithmetic (adders/multipliers), chemistry oracles, and T-factory-dense subroutines to reduce runtime and qubit overhead on target verticals (e.g., chemistry, cryptography).
    • Dependencies: tight compiler integration; validation of speedups under FT constraints; interoperability over the bus.
  • Fault-tolerant quantum bus networks with all-to-all logical connectivity (Industry—photonics/microwave vendors)
    • Create rack-scale interconnect fabrics with entanglement distribution, purification, and switching to connect many modules at high fidelity and throughput.
    • Dependencies: scalable sources/detectors/transducers; low-loss channels; automated calibration; networked error tracking and recovery.
  • Multi-code execution with code conversion (Academia/Industry—codes and decoding)
    • Operate surface/bicycle codes in QPUs and high-rate qLDPC codes in memory, with efficient logical code conversion over the bus to maximize density and performance.
    • Dependencies: low-overhead, high-fidelity logical code conversion; fast decoders for qLDPC at scale; error-model validation across codes.
  • Quantum operating systems for heterogeneous FT orchestration (Software/Cloud)
    • Develop runtime services for scheduling, buffering, memory placement (RAQM vs STQM vs QPU), distillation flow control, and transfer optimization across modules with disparate clocks.
    • Dependencies: standardized machine-level instruction sets; telemetry and health monitoring; formal verification of safety/liveness properties.
  • Industrial-scale quantum simulation for materials and energy (Energy/Chemistry/Manufacturing)
    • Build on the compiler-estimated reductions in logical error and qubit counts for core subroutines (e.g., QFT, Trotterization) to reach practically relevant simulations (catalysts, battery materials, superconductors).
    • Dependencies: FT hardware scale; domain-specific algorithm/compiler co-design; validated Hamiltonian models and error mitigation within FT.
  • Finance and logistics optimization under FT constraints (Finance/Transportation)
    • Execute resource-intensive quantum algorithms (phase estimation, amplitude estimation, oracles) more efficiently via accelerators and memory hierarchies to improve solution quality or runtime.
    • Dependencies: mature FT stacks; proven advantage for specific instances; integration with classical data pipelines.
  • Standards and regulatory frameworks for modular FT systems (Policy/Standards bodies)
    • Establish inter-module protocol standards (interfaces, code-conversion semantics, instruction encodings), performance certification, and safety/compliance baselines for quantum data centers.
    • Dependencies: industry alignment; interoperable reference implementations; certification authorities and test suites.
  • Cross-modality supply chains and manufacturing (Industry—semiconductor/photonics/control)
    • Build coordinated ecosystems for QPU chips, memory substrates, photonic/microwave interconnects, cryo/control electronics, and calibration software, aligned to heterogeneous architectures.
    • Dependencies: stable vendor interfaces, IP/licensing frameworks, and long-term component reliability.
  • Hybrid quantum–classical HPC integration (HPC centers)
    • Co-locate modular quantum systems with exascale HPC to manage pre/post-processing, decoding, and classical feedback loops with low latency and high throughput.
    • Dependencies: high-bandwidth classical links, decoder acceleration (e.g., GPUs/FPGAs), co-scheduling with HPC job managers.
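The density argument behind placing high-rate qLDPC codes in memory (and surface or bicycle codes in QPUs) can be checked with simple qubit counting. This sketch assumes a rotated surface code with 2d² - 1 physical qubits per logical qubit and the [[144,12,12]] 'Gross' bivariate bicycle code with 144 data plus 144 check qubits per 12-logical-qubit block; the distance-13 surface code is an assumed comparison point. It deliberately omits routing, transfer patches, decoders, and code conversion, which the paper's detailed accounting includes, so the ratio is not the paper's figure.

```python
import math


def surface_code_qubits(d: int, logical: int) -> int:
    """Rotated surface code: d*d data qubits plus d*d - 1 measure qubits per
    patch, one logical qubit per patch. Ignores routing space and factories."""
    return logical * (2 * d * d - 1)


def block_code_qubits(n_data: int, n_check: int, k: int, logical: int) -> int:
    """An [[n, k, d]] block code storing k logical qubits per block with n_check
    syndrome qubits per block; a partially filled block still costs a full one."""
    blocks = math.ceil(logical / k)
    return blocks * (n_data + n_check)


LOGICAL = 1000                                               # scale of the paper's compiler runs
surface = surface_code_qubits(d=13, logical=LOGICAL)        # 337,000
gross = block_code_qubits(144, 144, k=12, logical=LOGICAL)  # 84 blocks -> 24,192
print(surface, gross, round(surface / gross, 1))
```

Storing 1,000 logical qubits this way takes 337,000 physical qubits in distance-13 surface-code patches versus 24,192 in Gross-code blocks, about 13.9× fewer. The Gross code's nonlocal couplings are consistent with the paper using qLDPC memory only under a long-range-coupling assumption.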

Key Assumptions and Dependencies Across Applications

  • Physical-layer performance: target gate/measurement error rates and logical cycle times as assumed in the paper’s tables; stability over long runs.
  • Interconnects: availability of high-quality Bell-pair generation and purification; reliable photonic/microwave links; low-latency synchronization.
  • Codes and decoding: practical, high-rate qLDPC codes (for RAQM) with fast decoders; efficient code-conversion protocols with bounded overhead.
  • Control stack: deterministic, low-jitter orchestration across modules with disparate clocks; telemetry for fault detection and recovery.
  • Compiler/tooling: access to Q-CHESS-like machine-level schedulers; correctness and performance validation on heterogeneous targets.
  • Engineering scale-up: cryogenic capacity, wiring, calibration automation, and yield management compatible with bounded-size QPUs and large memory tiers.
  • Ecosystem alignment: standards for module interfaces and instruction sets; supply-chain maturity for cross-modality components.
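The physical-layer assumption above ties error rates to code distance and hence to qubit count. A common surface-code rule of thumb, p_L ≈ a (p/p_th)^((d+1)/2) per patch per logical cycle, makes that dependence concrete. The constants a = 0.1 and p_th = 10⁻² are assumptions, not values from the paper's tables, and `min_distance` is a hypothetical helper that uses a simple union bound rather than a simulation.

```python
def logical_error_per_cycle(p: float, d: int, p_th: float = 1e-2, a: float = 0.1) -> float:
    """Rule-of-thumb surface-code logical error per patch per logical cycle:
    a * (p / p_th) ** ((d + 1) / 2). a and p_th are assumed constants."""
    return a * (p / p_th) ** ((d + 1) / 2)


def min_distance(p, patches, cycles, budget, p_th=1e-2, a=0.1, d_max=101):
    """Smallest odd d whose expected logical failures over patches * cycles
    patch-cycles stay within `budget` (union bound over all patch-cycles)."""
    if not 0 < p < p_th:
        raise ValueError("the heuristic only applies below threshold")
    for d in range(3, d_max + 1, 2):
        if logical_error_per_cycle(p, d, p_th, a) * patches * cycles <= budget:
            return d
    raise ValueError("no odd distance up to d_max meets the budget")


# Hypothetical workload: 1,000 logical patches for 10**6 logical cycles at
# p = 1e-3, with a 5% total failure budget.
print(min_distance(1e-3, patches=1000, cycles=10**6, budget=0.05))  # 19
```

In this example each increase of d by 2 multiplies p_L by p/p_th = 0.1, so d = 17 leaves an expected 0.1 failures while d = 19 brings it to 0.01, inside the 5% budget.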

Glossary

  • Adder: An arithmetic subroutine that adds two numbers within a quantum algorithm, often a performance bottleneck in Shor-like circuits. "via addition of an algorithm-specific accelerator for the Adder subroutine (requiring 439k qubits)."
  • algorithmic logical error: The effective error rate of a quantum algorithm at the logical level after error correction and scheduling are considered. "up to 551× reduction in algorithmic logical error"
  • all-to-all logical connectivity: A connectivity model where any logical qubit can interact with any other via the interconnect, reducing routing overhead. "The interconnect bus mediates all-to-all logical connectivity via optical connections (black lines)"
  • application-specific quantum processing unit (ASQPU): A specialized processor that efficiently implements a restricted, frequently used set of quantum operations to accelerate a target algorithm. "The architecture should support APpLication-specific quantum processing units (ASQPU) that apply specialized logic operations to quantum data -- those which occur in a target algorithm with high frequency -- when they reduce execution time or resource overhead."
  • Bell-pair generation: The creation of entangled qubit pairs used as a resource for teleportation-based communication. "where Bell-pair generation enables teleportation-based state transfer."
  • Bell-pair purification: A protocol that boosts the fidelity of entangled pairs by combining multiple noisy pairs to distill higher-quality entanglement. "Using an experimentally demonstrated Bell-pair purification scheme"
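  • bivariate bicycle codes: A family of high-rate quantum LDPC codes whose checks are defined by polynomials in two commuting cyclic-shift matrices; the [[144,12,12]] 'Gross' code, for example, stores 12 logical qubits in 144 data qubits but needs some nonlocal couplings. "surface-codes \cite{Litinski2019} and bivariate bicycle codes \cite{Yoder2025}"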
  • CCZ-state: A multi-qubit magic resource state enabling implementation of the non-Clifford controlled-controlled-Z (CCZ) gate in fault-tolerant computation. "delays in CCZ-state supply"
  • code conversion: The process of reliably transforming a logical state from one quantum error-correcting code to another. "the quantum bus should therefore support code conversion"
  • code distance: A parameter of a quantum error-correcting code that determines how many physical errors can be detected/corrected and influences logical error rates. "reporting outcomes in terms of physical qubit numbers, code distances, and asymptotic error scaling."
  • code-first: An approach where architectural choices are primarily driven by properties and assumptions of a single error-correcting code. "the key structural and analytic 'code-first' approaches have been maintained"
  • dilution refrigerator: An ultra-low-temperature cryostat used to operate certain qubit technologies (e.g., superconducting qubits). "a single dilution refrigerator"
  • fault-tolerant transfer protocols: Procedures that move quantum information between modules while preserving encoded fault-tolerant properties. "with a MiCRo-architecture that includes fault-tolerant transfer protocols and resource generation."
  • Fermi–Hubbard model: A fundamental model in condensed matter physics used to study interacting electrons on a lattice, often targeted by quantum simulation algorithms. "For dynamic simulations of the Fermi–Hubbard model and for arithmetic (Adder)"
  • grid-coupling topology: A hardware connectivity layout where qubits are arranged on a grid with local couplings. "using an experimentally demonstrated grid-coupling topology"
  • grid topology: A regular two-dimensional lattice connectivity constraint used to model realistic nearest-neighbor device couplings. "Further limiting physical connectivity to grid topology accounts for otherwise challenging growth in coupler counts, frequency collisions, and calibration burden"
  • heterogeneous quantum computing architecture: A system design that combines different types of hardware modules (and possibly qubit modalities/codes) specialized for distinct roles. "presenting a complete heterogeneous quantum computing architecture incorporating task-specific hardware selection and QEC encoding"
  • logical clock cycles: Discrete time steps at the logical layer during which encoded operations or error-correction rounds are scheduled. "on average each qubit is inactive for ∼96-97% of logical clock cycles."
  • logical patches: Encoded logical-qubit regions in a lattice-based code representation that may need routing for operations or transfers. "logical patches must be routed to the nearest transfer patch"
  • logical SWAP: A logical operation that exchanges the positions (or identities) of two encoded qubits, often used for routing on constrained topologies. "LoNG-range routing of quantum data should be handled by the QB when it reduces logical (SWAP) operations."
  • magic state: A specially prepared non-stabilizer state used to enable non-Clifford gates in fault-tolerant schemes. "Magic states for non-Clifford operations on quantum data shall be generated by a specialized quantum state factory (QSF)."
  • magic state distillation: A process that converts many noisy magic states into fewer, higher-fidelity ones necessary for reliable non-Clifford computation. "Magic state distillation is a dominant overhead in fault-tolerant resource estimates"
  • micro-architecture: The detailed, low-level organization and protocol design that governs how components implement instructions and transfers. "with a MiCRo-architecture that includes fault-tolerant transfer protocols and resource generation."
  • non-Clifford operations: Quantum gates outside the Clifford group (e.g., T, CCZ) required for universal quantum computation and typically more costly fault-tolerantly. "Magic states for non-Clifford operations on quantum data shall be generated by a specialized quantum state factory (QSF)."
  • nonlocal physical connectivity: Hardware capability allowing interactions between qubits that are not adjacent in physical layout. "the nonlocal physical connectivity, required in \cite{Yoder2025} using qLDPC codes, is achievable in memory"
  • photonic interconnects: Optical channels and components used to connect quantum modules over distance, enabling entanglement distribution and state transfer. "photonic \cite{Monroe2014} or transduction-based \cite{Heya2025} interconnects applied to enable connectivity between modules."
  • qLDPC codes: Quantum low-density parity-check codes that offer high rate and favorable scaling for storage, sometimes at the expense of gate simplicity. "implementing quantum memory using qLDPC codes reduces the resources required for factoring"
  • quantum bus (QB): A communication fabric that moves quantum states between modules, often via teleportation over entangled links. "The architecture shall include a quantum bus (QB) for communication of quantum information between modules"
  • quantum compiler (Q-CHESS): A micro-architecture-aware toolchain that schedules, routes, and synthesizes machine-level instructions for heterogeneous systems. "Control of quantum data shall be performed by Q-CHESS: a Quantum Compiler for Heterogeneous Execution Scheduling and Synthesis, which is micro-architecture aware and outputs Machine-level instructions."
  • quantum error correction (QEC): Techniques using redundancy and measurement to protect quantum information from noise and enable scalable computation. "quantum error correction (QEC)"
  • quantum Fourier Transform (QFT): A key subroutine that performs the discrete Fourier transform on quantum amplitudes, central to algorithms like Shor’s. "For the Quantum Fourier Transform (QFT), a subroutine critical to many implementations of Shor’s algorithm \cite{Kutin2006}, our heterogeneous framework achieves a 42-59× reduction"
  • quantum memory (QM): A storage module designed for long-lived preservation of quantum states, decoupled from compute-intensive hardware. "The architecture shall include a dedicated quantum memory (QM) tier for storing IDLe quantum data."
  • quantum processing unit (QPU): A fixed-size compute module that executes universal fault-tolerant logic on a bounded number of logical qubits. "Computation of universal fault-tolerant quantum logic on quantum data shall be performed within fixed-size quantum processing unit(s) (QPU)."
  • quantum state factory (QSF): A dedicated module for producing and distilling resource states (e.g., magic states) needed for non-Clifford gates. "Magic states for non-Clifford operations on quantum data shall be generated by a specialized quantum state factory (QSF)."
  • random-access quantum memory (RAQM): A memory tier with approximately uniform access latency regardless of where a state is stored, supporting active QEC during storage. "The architecture should include a random-access quantum memory (RAQM) tier capable of storing quantum data with uniform access latency."
  • routing: The process of moving logical qubits within or across modules to satisfy gate connectivity constraints, typically incurring SWAP overheads. "Routing is a dominant contributor to fault-tolerant overhead"
  • Shor’s algorithm: A quantum algorithm for integer factorization that leverages period finding, often used to assess resource requirements. "For the Quantum Fourier Transform (QFT), a subroutine critical to many implementations of Shor’s algorithm \cite{Kutin2006},"
  • static transversal quantum memory (STQM): A short-term storage tier relying on ultra-long coherence without active error correction, deferring QEC to the QPU. "a distinct 'Static' transversal quantum memory (STQM) tier that defers active QEC to the QPU"
  • stored-program architecture: A computing paradigm where instructions and data are stored in memory and fetched for execution, inspiring the quantum design analogy. "establishing the stored-program architecture."
  • syndrome extraction: The process of measuring stabilizers to detect errors in an error-correcting code without collapsing logical information. "By eliminating the need for continuous syndrome extraction and feedback"
  • teleportation-based state transfer: Moving quantum states by consuming entanglement and classical communication, avoiding direct qubit movement. "Bell-pair generation enables teleportation-based state transfer."
  • T-cultivation: Magic-state cultivation, which grows a high-fidelity T-type magic state inside a small code patch through repeated checking and post-selection, as a lower-overhead alternative to multi-round distillation. "fully account for the physical qubits required for fault tolerant state-transfer and high quality T-cultivation,"
  • transduction-based interconnects: Interfaces that convert quantum information between different carrier types (e.g., microwave-to-optical) for long-range links. "photonic \cite{Monroe2014} or transduction-based \cite{Heya2025} interconnects applied to enable connectivity between modules."
  • transversal teleportation: A code-level operation where teleportation is applied transversally across encoded blocks, preserving fault tolerance. "enabling transversal teleportation between the dedicated transfer patches"
  • transversality: A fault-tolerance property allowing certain logical gates to be implemented by applying gate operations independently across code blocks. "transversality is not relevant in memory where logical operations between encoded qubits are not anticipated"
  • transfer patch: A designated region or patch used to interface memory code patches with the interconnect for state transfers. "logical patches must be routed to the nearest transfer patch"
  • toroidal manifolds: Donut-shaped topological structures used to illustrate or implement certain code layouts (e.g., bicycle codes). "Toroidal manifolds (donuts) illustrate a module of 12 qubits in the 'Gross' bivariate bicycle code."
  • tyranny of numbers: A scaling challenge where system complexity (e.g., wiring/interconnects) grows rapidly with component count, hindering straightforward scaling. "the quantum industry is now confronting its own 'tyranny of numbers'"
  • ultra-long coherence (ULC): Exceptionally long qubit coherence times that allow storage without frequent error correction. "ultra-long coherence times (ULC)"
  • von Neumann: Refers to the foundational stored-program computing architecture and its design principles, used here as an analogy for quantum systems. "and traces back to the original proposal by von Neumann et al."
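The glossary's Bell-pair purification entry can be illustrated with the textbook BBPSSW recurrence for Werner states. The paper relies on an experimentally demonstrated purification scheme that this summary does not name, so BBPSSW is a swapped-in example rather than the paper's protocol. The sketch assumes perfect local gates and measurements, and `bbpssw_round` and `purify_to` are hypothetical names.

```python
from typing import Tuple


def bbpssw_round(f: float) -> Tuple[float, float]:
    """One BBPSSW round: two Werner pairs of fidelity f -> one pair on success.
    Assumes perfect local operations and re-twirling to Werner form.
    Returns (output fidelity, success probability)."""
    e = (1.0 - f) / 3.0                      # weight of each of the three error Bell states
    p_success = f * f + 2.0 * f * e + 5.0 * e * e
    f_out = (f * f + e * e) / p_success
    return f_out, p_success


def purify_to(f0: float, target: float, max_rounds: int = 20):
    """Apply rounds until fidelity >= target. Returns (rounds, fidelity,
    expected raw pairs consumed per output pair)."""
    if f0 <= 0.5:
        raise ValueError("BBPSSW only improves Werner pairs with f > 1/2")
    f, cost, rounds = f0, 1.0, 0
    while f < target:
        if rounds == max_rounds:
            raise ValueError("target fidelity not reached within max_rounds")
        f, p = bbpssw_round(f)
        cost = 2.0 * cost / p                # two inputs per attempt, 1/p attempts on average
        rounds += 1
    return rounds, f, cost


rounds, f, cost = purify_to(0.90, 0.95)
print(rounds, round(f, 4), round(cost, 1))
```

Starting from fidelity 0.90, three rounds reach about 0.963 at an expected cost of roughly 10.8 raw pairs per output pair, which illustrates why purified-pair throughput, and not just link fidelity, enters a bus error budget.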

Open Problems

We found no open problems mentioned in this paper.
