
Fully Distributed Surface Code

Updated 14 January 2026
  • Fully distributed surface codes are error-correcting frameworks that partition quantum operations across spatially separated nodes with local control, enhancing modular scalability.
  • This paradigm employs patch-based, node-per-qubit, and lattice surgery strategies to facilitate local syndrome extraction and global error correction via classical interconnects.
  • Distributed decoding methods, including union-find, machine-learning, and hierarchical techniques, achieve real-time error correction with modest resource overhead at high code distances.

A fully distributed surface code denotes a quantum error-correcting architecture or algorithmic workflow in which error detection, syndrome processing, and physical qubit management are partitioned across spatially distinct nodes, modules, or processing elements, each with strictly local control and communication. In this paradigm, global error correction emerges from local or regionally scoped protocols, sometimes coordinated through classical interconnects or fast custom communication networks. The fully distributed approach encompasses architectural (hardware), logical (code layout), and algorithmic (decoding) advances; it is a foundational strategy for scaling fault-tolerant quantum computation, both in modular hardware and in highly parallel classical control (Li et al., 2015, Bone et al., 2024, Liyanage et al., 2023, Liyanage et al., 2024, Varsamopoulos et al., 2019, Horsman et al., 2011).

1. Distributed Architectures: Patch-Based and Node-Based Implementations

The distributed surface code paradigm can be physically realized in several ways:

  • Patch-based modular networks: Each module contains a small planar or rotated patch of the surface code (logical or sub-logical qubits), physically isolated except for boundary ancillae ("broker" qubits) used for entangling links to adjacent modules. Surface code error correction proceeds within each patch, while logical-level operations and logical stabilizer measurement exploit inter-module entanglement and boundary interaction (Li et al., 2015).
  • Node-per-qubit networks: Each data qubit of the code (e.g., in a toric topology) resides in a distinct node (physical device, e.g., NV center in diamond). Nodes are connected by photonic or solid-state links. Multi-qubit stabilizer measurement is implemented via remote GHZ state generation shared between nodes, with classical or quantum communication orchestrating syndrome extraction (Bone et al., 2024).
  • 2DNN patch-surgery architectures: Many small planar surface code patches are placed in arrays. Logical operations (CNOT, GHZ construction) are achieved via lattice surgery—merging and splitting patch boundaries via only nearest-neighbor (2DNN) quantum operations—without requiring large, monolithic code surfaces or defect braiding (Horsman et al., 2011).

This architectural separation isolates error sources (enabling inhomogeneous fault models), supports modular scaling, and permits local, tiered error correction strategies.

2. Fully Distributed Decoding: Local and Parallel Algorithms

Distributed decoders for the surface code are those that eschew global classical processing in favor of localized, parallelized, or regionally reducible computation. Three notable models are:

  • Distributed Union-Find (UF) Decoders: The Helios architecture (Liyanage et al., 2023, Liyanage et al., 2024) instantiates one processing element (PE) per vertex of the decoding graph (a 3D space-time lattice over QEC rounds), arranged in a 3D grid with a hierarchical controller tree. The distributed UF algorithm proceeds in tightly coordinated global stages: growing (cluster expansion), merging (cluster-ID minimization), parity convergecast (XOR over syndrome flags), and termination checking. Each PE communicates only with its immediate neighbors and upstream aggregators. Helios achieves sublinear average time complexity in code distance $d$, with empirical decoding time per round decreasing as $d$ increases, provided $O(d^3)$ parallel computing resources. This enables backlog-free, real-time decoding at code distances up to $d = 51$ (via time-multiplexing).
  • Tile-based Distributed Machine-Learning Decoders: Overlapping-tile decoders partition the code lattice into local patches (e.g., distance-3 tiles), each hosting a lightweight neural net or lookup-based decoder. Local outputs are aggregated by a shallow global neural network which estimates the logical correction. The full decoding pipeline is parallelizable in O(1) wall time per QEC round under suitable hardware, and only requires local training datasets (Varsamopoulos et al., 2019).
  • Hierarchical RG and Two-Tier Decoding: Distributed or hierarchical surface-code decoders first perform physical error correction within each patch/module boundary, then correct logical qubit errors using a higher-level code. For inter-module links, error thresholds and resource overheads are set by both tiers and by fault rates of boundary operations (Li et al., 2015).
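The staged cycle of the distributed UF decoder can be illustrated with a toy software sketch: each vertex of a 1-D decoding graph acts as a processing element that reads only its immediate neighbors' state, and the controller merely checks whether any odd interior cluster remains. This is an illustrative skeleton under simplifying assumptions (a line graph, synchronous stages, no final peeling step) and is not the Helios hardware design; the function name and data layout are hypothetical.

```python
def distributed_uf_skeleton(syndrome):
    """Run grow / merge / parity-convergecast / termination stages until every
    cluster is neutral. One 'PE' per vertex of a 1-D decoding graph; returns
    the final cluster id per vertex (-1 = never clustered)."""
    n = len(syndrome)
    cid = [i if syndrome[i] else -1 for i in range(n)]  # cluster id per PE

    while True:
        # Stage 1: grow -- every unclustered vertex adjacent to a cluster joins it.
        new_cid = cid[:]
        for v in range(n):
            if cid[v] == -1:
                for u in (v - 1, v + 1):
                    if 0 <= u < n and cid[u] != -1:
                        new_cid[v] = cid[u]
        cid = new_cid

        # Stage 2: merge -- touching clusters flood the minimum cluster id
        # between neighbors until no id changes (cluster-ID minimization).
        changed = True
        while changed:
            changed = False
            for v in range(n):
                for u in (v - 1, v + 1):
                    if 0 <= u < n and cid[v] != -1 and -1 < cid[u] < cid[v]:
                        cid[v] = cid[u]
                        changed = True

        # Stage 3: parity convergecast -- XOR the syndrome flags per cluster.
        parity = {}
        for v in range(n):
            if cid[v] != -1:
                parity[cid[v]] = parity.get(cid[v], 0) ^ syndrome[v]

        # Stage 4: termination check -- a cluster touching a lattice boundary
        # is neutral; stop once no odd interior cluster remains.
        odd = [c for c, p in parity.items()
               if p and cid[0] != c and cid[n - 1] != c]
        if not odd:
            return cid
```

In hardware, stages 2–4 are realized as neighbor-to-neighbor messages plus an aggregation up the controller tree rather than the sequential loops used here.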

A plausible implication is that fully distributed decoding architectures are critical for matching or exceeding the syndrome extraction and physical measurement rates of large-scale surface codes.

3. Protocols and Operations: Lattice Surgery, GHZ Distribution, and Logic Gates

Lattice surgery provides fault-tolerant, fully local (2DNN) primitives for connecting planar surface code patches into a distributed logical fabric:

  • Merge and Split Operations: By performing rounds of stabilizer measurements on the joint or split boundary between two planar code patches, one can projectively measure joint logical operators (e.g., $X_L \otimes X_L$, $Z_L \otimes Z_L$) and thereby create entanglement or implement CNOT gates (Horsman et al., 2011).
  • GHZ State Distribution: Smooth split operations recursively applied to a planar patch in $|+\rangle_L$ yield $n$-qubit logical GHZ states across $n$ distributed patches, using only $n-1$ rounds of measurement and classical parity tracking. These primitives also appear in both modular and node-per-qubit proposals for distributed quantum networks.
  • Hierarchical/network logic operations: In the architectural generalizations, logical qubit operations between nodes or modules are mediated via entanglement swapping, broker-based Bell pair generation (potentially with purification), and subsequent local error correction (Li et al., 2015, Bone et al., 2024).
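The merge-and-split CNOT can be checked directly on bare state vectors (logical qubits standing in for code patches): with an ancilla prepared in $|+\rangle$, measuring $Z_C Z_A$ (rough merge/split), then $X_A X_T$ (smooth merge/split), then reading the ancilla out reproduces CNOT up to Pauli corrections. The NumPy sketch below post-selects the $+1$ outcome of each measurement, so no corrections are needed; in a real lattice-surgery run the random outcomes are absorbed into classically tracked Pauli frames. The helper names are illustrative, not from the cited papers.

```python
import numpy as np

# Qubit order: control C, ancilla A (in |+>), target T.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron(*ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

def project_plus1(pauli, state):
    """Project onto the +1 eigenspace of a Pauli product and renormalize
    (i.e., post-select the +1 measurement outcome)."""
    state = (np.eye(len(state), dtype=complex) + pauli) @ state / 2
    return state / np.linalg.norm(state)

rng = np.random.default_rng(7)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)   # arbitrary control state
phi = rng.normal(size=2) + 1j * rng.normal(size=2)   # arbitrary target state
psi /= np.linalg.norm(psi)
phi /= np.linalg.norm(phi)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # ancilla patch |+>

state = np.kron(np.kron(psi, plus), phi)
state = project_plus1(kron(Z, Z, I2), state)   # rough merge/split: Z_C Z_A
state = project_plus1(kron(I2, X, X), state)   # smooth merge/split: X_A X_T
state = project_plus1(kron(I2, Z, I2), state)  # ancilla readout in Z basis

out = state.reshape(2, 2, 2)[:, 0, :].reshape(4)  # discard the |0> ancilla
out /= np.linalg.norm(out)

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
expected = CNOT @ np.kron(psi, phi)
print(np.allclose(out, expected))  # True: the measurement sequence enacts CNOT
```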

Resource analyses for CNOT via lattice surgery confirm that for $d = 3$ codes, a logical gate can be enacted with only 53 qubits, significantly fewer than required by traditional defect-braiding or transversal operations.

4. Noise Models, Fault Tolerance, and Thresholds in Distributed Regimes

In fully distributed surface code realizations, physical and logical noise models acquire additional structure:

  • Inhomogeneous Error Rates: Local gates/measurements within patches are typically higher fidelity than inter-node/module operations, which dominate logical fault rates. Modular architectures require threshold calculations in the presence of highly concentrated link errors (Li et al., 2015).
  • Link, Memory, and Operation-Timing Thresholds: In node-per-qubit or photonic networked codes, performance depends critically on GHZ-state generation time, memory-qubit coherence times ($T_1$, $T_2$ during entanglement generation), and physical link fidelity. For nitrogen-vacancy color center networks, thresholds for gate/measurement errors are reduced by at least a factor of three (to $p_g, p_m \lesssim 2 \times 10^{-3}$) when realistic memory decoherence is modeled; a link efficiency $\eta_\mathrm{link}^* \gtrsim 4 \times 10^2$ is required for a nonzero threshold (Bone et al., 2024).
  • Fault-Tolerant Regimes: Sufficiently large intra-module patches ($D \gg 1$) allow distributed codes to approach the monolithic surface code threshold ($\sim 1\%$–$10\%$ for various error models), while extremely fine-grained networks (small $D$) suffer lower thresholds due to error accumulation at boundaries and on purified links (Li et al., 2015).
  • Decoder Backlog: Helios-class distributed decoders avoid backlog accumulation for arbitrarily large $d$ by ensuring that per-round classical decoding latency strictly decreases with $d$ (Liyanage et al., 2024, Liyanage et al., 2023).
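The regime distinctions above can be made concrete with the standard heuristic for logical error suppression below threshold, $p_L \approx A\,(p/p_\mathrm{th})^{(d+1)/2}$. This scaling law and the prefactor $A$ are generic illustrations (an assumption, not a result from the cited works); only the two threshold values correspond to the fine-grained ($\sim 0.9\%$) and coarse-grained ($\sim 9.2\%$) regimes quoted in this article.

```python
# Illustrative below-threshold scaling of the surface-code logical error rate:
# p_L ~ A * (p / p_th)^((d+1)/2). A = 0.1 is an arbitrary prefactor chosen
# for illustration; the thresholds are the fine-grained (~0.9%) and
# coarse-grained (~9.2%) regime values quoted in the text.

def logical_error_rate(p, p_th, d, A=0.1):
    return A * (p / p_th) ** ((d + 1) / 2)

p = 5e-3  # physical error rate, below both thresholds
for label, p_th in [("coarse-grained", 9.2e-2), ("fine-grained", 0.9e-2)]:
    for d in (11, 21):
        print(f"{label:14s} d={d}: p_L ~ {logical_error_rate(p, p_th, d):.2e}")
```

At the same physical error rate, the fine-grained regime operates much closer to its threshold, so increasing $d$ buys far less suppression than in the coarse-grained regime.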

5. Scaling, Resource Overhead, and Trade-offs

Distributed surface-code scaling introduces new considerations relative to monolithic codes:

| Regime | Threshold | Physical Overhead | Link Overhead |
|--------|-----------|-------------------|---------------|
| Fine-grained ($D = 1$) | $\sim 0.9\%$ | $\sim 15\times$ | $N_\mathrm{raw} \approx$ 4–20 CNOT$^{-1}$ |
| Coarse-grained ($D \gg 1$) | $\sim 9.2\%$ | $\sim 9\times$ | $N_\mathrm{raw} \approx$ 1–4 CNOT$^{-1}$ |

Modular scaling saturates: increasing patch size from 8 to 1000 qubits only halves the total qubit cost per logical qubit. Time-multiplexing techniques in distributed decoding architectures allow a given FPGA to support code distances $d > 21$ at a moderate increase in per-round latency (e.g., $d = 51$ at 543.9 ns per round versus 11.5 ns for $d = 21$) (Liyanage et al., 2024).

A plausible implication is that hardware-efficient, vertex-level classical parallelism and patch-level quantum modularity are jointly the enabling factors for exascale fault-tolerant quantum computation.

6. Practical Implementations and Future Prospects

Distributed surface code concepts have seen realization in FPGA-based decoders, hardware modular proposals (trapped ions, superconducting patches), and network testbed experiments. The Helios UF architecture demonstrates the feasibility of backlog-free, high-throughput, resource-efficient decoding with empirical benchmarks for both phenomenological and circuit-level noise. Ongoing work pushes threshold estimates downward as memory decoherence and physical operation times are incorporated via numerical simulation (Bone et al., 2024), guiding design targets for future devices.

Current bottlenecks identified include classical interconnect latency, controller-tree aggregation time, and FPGA routing complexity at extremely high $d$. Proposed mitigations encompass controller-tree branching, custom-silicon data-plane design, and time/context-multiplexing for large codes.

Continued optimization of distributed protocols (e.g., GHZ generation, entanglement purification), classical decoding acceleration, and hardware controller synchronization is poised to determine the trajectory for fault-tolerant, distributed quantum computation at the logical scale (Li et al., 2015, Liyanage et al., 2023, Liyanage et al., 2024, Bone et al., 2024).
