Efficient Parallel List Decoding
- The paper introduces a recursive algorithm that exploits automorphism-based splitting to achieve efficient parallel list decoding with bounded candidate growth.
- The methodology decomposes the problem into parallel subproblems and employs a divide-and-conquer strategy to maintain poly-logarithmic runtime.
- The approach generalizes unique decoding techniques for Barnes–Wall lattices and extends to polar and Reed–Solomon codes, enhancing high-speed communication systems.
Efficient parallel list decoding algorithms are a critical class of techniques in coding theory and lattice decoding, enabling the recovery of all codewords or lattice points within a certain distance from a received vector, with high throughput and low latency. Such algorithms are indispensable in communication systems aiming to correct bursts of errors or to operate at error rates close to fundamental coding limits. The design challenge lies in containing the exponential growth of candidate solutions (the "list") and exploiting parallel architectures without deteriorating error-correcting performance. For lattice codes, as well as for polar and algebraic codes, substantial progress has been made in algorithmic parallelization and complexity control.
1. Formal Statement of the Parallel List Decoding Problem
Consider a code C (over a finite field, in Euclidean space, or on a lattice) with specified minimum distance, and a received word or vector r. The list decoding task is, for a given radius parameter η, to output the set L(r, η) = { c ∈ C : δ(r, c) ≤ η }, where δ is a code- or lattice-appropriate distance metric (e.g., Hamming, Euclidean). The central quantities of interest are the worst-case list size ℓ = max_r |L(r, η)|, and the parallel runtime (ideally polylogarithmic in the blocklength or dimension N) when using a polynomial number of processors. For lattices, this instantiates as: given a lattice Λ ⊂ ℝ^N (or ℂ^N) and a target r, find all x ∈ Λ with δ(r, x) ≤ η (Grigorescu et al., 2011).
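As a concrete toy instance of this definition, the sketch below brute-forces the set L(r, η) for a small code under the Hamming metric; the four-word `code`, the received word `r`, and the radius are illustrative stand-ins, not parameters from the paper:

```python
def hamming(a, b):
    """Hamming distance between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def list_decode(code, r, radius):
    """Return L(r, radius): every codeword within `radius` of r."""
    return [c for c in code if hamming(c, r) <= radius]

# Hypothetical length-4 binary code for illustration.
code = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0)]
r = (0, 0, 0, 1)

print(list_decode(code, r, 1))  # [(0, 0, 0, 0), (0, 0, 1, 1)]
```

Brute force is exponential in the blocklength; the point of the algorithms below is to reach the same output set without ever enumerating the whole code.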
2. Combinatorial List Size Bounds and Johnson-Type Barriers
Classical bounds relate attainable error radius to worst-case list size. The Johnson bound, for codes with minimum relative distance δ, guarantees a polynomial list size only up to radius J(δ) = 1 − √(1 − δ). For lattices this generic radius falls well short of the minimum distance; at δ = 1/2, as in the Barnes–Wall setting, it gives J ≈ 0.29 (Grigorescu et al., 2011). However, through combinatorial analysis specific to the lattice structure, one can prove much tighter list-size bounds up to nearly the minimum distance: for Barnes–Wall lattices, the list size ℓ(η) remains polynomial in the dimension N for every fixed relative distance η < 1. This demonstrates that list decoding to radii approaching 1 remains tractable, polynomially bounding the explosion of candidates, a crucial property enabling efficient parallel decoding even beyond the classical Johnson radius.
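The Johnson radius is easy to evaluate numerically; the sketch below uses the form J(δ) = 1 − √(1 − δ) quoted above (the lattice analogue differs in constants, so treat the number as indicative):

```python
import math

def johnson_radius(delta):
    """Johnson radius J(δ) = 1 − sqrt(1 − δ) for relative minimum distance δ."""
    return 1 - math.sqrt(1 - delta)

# Relative distance δ = 1/2, as in the Barnes–Wall discussion above:
print(round(johnson_radius(0.5), 4))  # 0.2929
```

So generic arguments stop at roughly 29% of the way to the minimum distance, whereas the lattice-specific bounds push η all the way toward 1.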
3. Divide-and-Conquer Parallel List-Decoding Algorithms
Central to efficient parallelization is a recursive, automorphism-exploiting decomposition of the code or lattice. For Barnes–Wall lattices over the Gaussian integers ℤ[i], the point set is recursively decomposed as BW_N = { (u, u + φ·v) : u, v ∈ BW_{N/2} } (with φ = 1 + i), facilitating a recursive algorithm:
- Splitting: Decompose the N-dimensional input r into halves r0, r1, along with automorphically transformed combinations of those halves.
- Recursive Decoding: Independently list-decode four subproblems in dimension N/2, corresponding to the natural and transformed halves.
- Combination: For each pair of candidate points from appropriate sublists, solve a small linear system to reconstruct a parent candidate, retaining it if it remains within the decoding radius (Grigorescu et al., 2011).
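The linear system in the combination step can be sketched over ℤ[i] using Python's complex arithmetic. The identities below assume the decomposition w = (u, u + φ·v) with φ = 1 + i; the particular pairing of sublists shown is an illustrative reconstruction, not necessarily the paper's exact enumeration:

```python
# For a true lattice point w = (w0, w1) = (u, u + φ·v):
#   y_plus  = (φ/2)·(w0 + w1) = φ·u + i·v
#   y_minus = (φ/2)·(w0 - w1) = -i·v
# so (u, v) — and hence the parent candidate — follow from a 2x2 solve.
PHI = 1 + 1j

def lift(y_plus, y_minus):
    """Reconstruct the parent candidate from transformed-half candidates."""
    v = 1j * y_minus             # y_minus = -i·v  =>  v = i·y_minus
    u = (y_plus - 1j * v) / PHI  # y_plus = φ·u + i·v
    return u, u + PHI * v

# Round trip on a sample Gaussian-integer pair:
u, v = 2 + 1j, -1 + 3j
w0, w1 = u, u + PHI * v
y_plus, y_minus = (PHI / 2) * (w0 + w1), (PHI / 2) * (w0 - w1)
u2, w1_2 = lift(y_plus, y_minus)
print(abs(u2 - w0) < 1e-9 and abs(w1_2 - w1) < 1e-9)  # True
```

In the actual decoder this lift is applied to every admissible pair drawn from the recursive sublists, and the result is kept only if it lies within the decoding radius.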
Pseudocode Structure
```
def ListDecodeBW(r, η):
    if N == 1:
        return {x ∈ ℤ[i] : |r − x|² ≤ η}
    r0, r1  = partition(r)
    r_plus  = (φ/2) · (r0 + r1)
    r_minus = (φ/2) · (r0 − r1)
    L0      = ListDecodeBW(r0, η)
    L1      = ListDecodeBW(r1, η)
    L_plus  = ListDecodeBW(r_plus, η)
    L_minus = ListDecodeBW(r_minus, η)
    L = combine sublists as per the automorphism constraints
    filter L for δ(r, w) ≤ η
    return L
```
This recursion naturally maps to parallel computation, as each subproblem and sublist can be assigned disjoint processor sets.
4. Parallel Complexity and Processor Requirements
In the CREW PRAM model (unit-cost complex arithmetic), at each level of the recursion all four subcalls are fully parallelizable. The combining step (list size ℓ) involves up to O(ℓ²) candidate checks with independent computation of Euclidean distances, each of arithmetic cost O(N), reducible to O(log N) parallel time via prefix sums or tree reductions.
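The tree-reduction pattern behind that O(log N) claim can be sketched as follows; `tree_sum` is a hypothetical helper that sums the N squared-distance terms by pairwise halving and also reports the number of levels, i.e. the parallel depth:

```python
def tree_sum(vals):
    """Sum by pairwise combination: ceil(log2 N) levels instead of N-1 steps."""
    vals = list(vals)
    depth = 0
    while len(vals) > 1:
        # Combine adjacent pairs; carry a leftover element on odd lengths.
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)] + \
               ([vals[-1]] if len(vals) % 2 else [])
        depth += 1
    return vals[0], depth

# Squared-distance terms between two length-4 vectors:
terms = [abs(a - b) ** 2 for a, b in zip([3, 0, 4, 0], [0, 0, 0, 0])]
print(tree_sum(terms))  # (25, 2)
```

On a PRAM each level's pairwise additions run concurrently, so the wall-clock depth is the second return value rather than the term count.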
The recurrences are:
- Sequential: T(N) = 4·T(N/2) + O(ℓ²·N), which solves to T(N) = O(ℓ²·N²).
- Parallel: With poly(N, ℓ) processors, D(N) = D(N/2) + O(log N), for total depth O(log² N). Thus, the algorithm achieves poly-logarithmic parallel runtime (depth) with polynomially many processors, provided the list size ℓ is bounded by a polynomial (Grigorescu et al., 2011).
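These recurrences can be checked by direct unrolling; the sketch below sets the hidden constants to 1 for illustration:

```python
import math

def T(N, l):
    """Sequential work: T(N) = 4·T(N/2) + l²·N, with T(1) = l²."""
    return l * l if N == 1 else 4 * T(N // 2, l) + l * l * N

def D(N):
    """Parallel depth: D(N) = D(N/2) + log2(N), with D(1) = 1."""
    return 1 if N == 1 else D(N // 2) + math.log2(N)

# With l = 1: T(N) = 2N² − N, e.g. T(4) = 28; D(2^k) = 1 + k(k+1)/2.
print(T(4, 1), D(1024))  # 28 56.0
```

The quadratic growth of T against the quadratic-in-log growth of D is exactly the processor-for-depth trade the table below summarizes.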
| Metric | Asymptotic Bound | Applies To |
|---|---|---|
| List size ℓ(η) | poly(N) for fixed η < 1 | Barnes–Wall, up to radius η → 1 |
| Parallel time (depth) | O(log² N) (with sufficient processors) | Barnes–Wall lattice |
| Processors required | poly(N, ℓ) | Barnes–Wall lattice |
The tight bound on ℓ is what allows depth-efficient parallel list decoding, surpassing the Johnson-bound limit achievable generically for codes and lattices.
5. Algorithmic Innovations and Generalizations
The algorithm generalizes unique decoding algorithms (e.g., Micciancio–Nicolosi for Barnes–Wall lattices) to full list decoding at arbitrary radii, while maintaining polynomial complexity in both the code/lattice dimension and the worst-case list size. The approach blends several key innovations:
- Recursive Automorphism Exploitation: Automorphism-based splitting not only preserves metric structure (distance-preserving transformations) but aligns with the lattice's recursive construction.
- Efficient List Combination: The combination step leverages the structure so that candidate filtering does not become a bottleneck, and each candidate can be verified independently and in parallel.
- Processor-Time Tradeoff: The analysis is explicit about the tradeoff between available parallelism (processor count) and achievable decoding depth, making the method amenable to practical high-throughput hardware or large-scale parallel computation.
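The independent-verification point can be illustrated with a thread pool; `dist2`, the candidate set, and the radius below are stand-ins for the real metric and parameters, not the paper's notation:

```python
from concurrent.futures import ThreadPoolExecutor

def dist2(r, w):
    """Squared Euclidean distance between vectors r and w."""
    return sum(abs(a - b) ** 2 for a, b in zip(r, w))

def parallel_filter(r, candidates, eta, workers=4):
    """Keep candidates within squared distance eta of r; checks run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keep = list(pool.map(lambda w: dist2(r, w) <= eta, candidates))
    return [w for w, k in zip(candidates, keep) if k]

r = [0 + 0j, 1 + 0j]
cands = [[0, 1], [1, 0], [0, 0]]
print(parallel_filter(r, cands, 0.5))  # [[0, 1]]
```

Because no candidate's check depends on any other's, the filter's depth is governed by the single-distance computation, not the list size.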
While the described algorithm is specific to the Barnes–Wall lattice, the same divide-and-conquer ideas appear in parallel list decoding for codes with recursive or algebraic structure, such as polar and Reed–Solomon codes (Lu et al., 2023, Cohn et al., 2010).
6. Comparison to Broader List Decoding and Practical Significance
Generic Johnson bounds severely limit the radius for which polynomial list-size (and hence efficient decoding) is achievable. In contrast, the Barnes–Wall lattice algorithm achieves polynomial list decoding up to error radii approaching the minimum distance—a fundamental improvement. Not only does the method yield the first efficient parallel list decoder for a natural infinite family of lattices beyond Johnson-type bounds, but it underpins the design of decoders with both strong error-correction and high throughput in critical applications, e.g., high-speed wireless systems, storage, and cryptography.
The general scheme of:
- Recursion via code/lattice automorphisms,
- Independent parallel subproblem processing,
- Bounded-size candidate merging and filtering,

is now a paradigm for achieving latency-optimal decoders in both classical and modern coding architectures.
7. References and Broader Impact
The foundational algorithm and analysis for efficient parallel list decoding of Barnes–Wall lattices were introduced by Grigorescu and Peikert (Grigorescu et al., 2011). Their approach synthesizes combinatorial list-size bounds, algorithmic recursion, and parallel complexity analysis to achieve practical and theoretical decoding advances. These results motivate further exploration of structure-driven parallelization in coding and lattice problems, as well as provoke new questions on the ultimate possibilities and limits of parallel error-correcting decoding.
A plausible implication is that similar techniques can be adapted to other highly structured code families, provided one can prove polynomial bounds on combinatorial list size and devise efficient recursion/algebraic decomposition strategies.