GlobalFoundries 22FDX LDPC Decoder ASIC
- GlobalFoundries 22FDX LDPC Decoder ASIC is a fully parallel, multi-rate binary LDPC decoder implemented in 22nm FD-SOI for ultra-reliable low latency communications.
- It employs edge-adaptive min-sum message passing, pipeline interleaving, and early termination logic to achieve a record 14 ns decoding latency and throughput up to 9 Gb/s.
- The design balances competitive area efficiency, energy performance, and robust error correction, making it ideal for 5G URLLC and short-packet wireless applications.
The GlobalFoundries 22FDX LDPC Decoder ASIC is a fully parallel, short-blocklength, multi-rate binary LDPC decoder implemented in 22 nm FD-SOI technology. Developed for ultra-reliable low latency communication (URLLC) applications such as 5G, the design features a custom co-optimized QC-LDPC code and ASIC architecture that achieves record-low decoding latency of 14 ns, information throughput of 9 Gb/s, and an active area of 0.44 mm² at 62 pJ/b energy for a 128-bit, rate-1/2 codeword. The design incorporates pipeline interleaving, edge-adaptive min-sum message passing, and early termination logic to enable efficient, high-throughput operation with minimal energy overhead (Nonaca et al., 19 Dec 2025).
1. Algorithmic Engine and Datapath Architecture
The ASIC implements a fully parallel message-passing (MP) decoder with a flooding schedule for binary LDPC codes. The top-level datapath comprises 288 variable-node (VN) processing blocks (enough for the largest blocklength) and 96 check-node (CN) blocks, one per coded bit and one per parity check, respectively. Decoding proceeds in iterations, each divided into two phases: all CNs update in parallel using an edge-adaptive normalized min-sum algorithm, and then all VNs update simultaneously.
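The CN update for the message from check node $c$ to variable node $v$ follows the normalized min-sum rule:

$$m_{c \to v} = \alpha_{c,v} \prod_{v' \in \mathcal{N}(c) \setminus \{v\}} \operatorname{sign}\left(m_{v' \to c}\right) \cdot \min_{v' \in \mathcal{N}(c) \setminus \{v\}} \left| m_{v' \to c} \right|$$

with $\alpha_{c,v}$ as edge-specific normalization constants and $\mathcal{N}(c)$ the set of VNs connected to CN $c$. The VN update aggregates the sum of all incoming messages with the intrinsic LLR, $L_v$; the extrinsic message returned to CN $c$ leaves out that CN's own contribution:

$$m_{v \to c} = L_v + \sum_{c' \in \mathcal{N}(v) \setminus \{c\}} m_{c' \to v}$$

Each VN→CN processing unit (PU) employs two pipeline registers (R1, R2) and performs sign-magnitude conversion, accumulation of minima, and extrinsic message computation and propagation across iterations. Hard-decision outputs and early-termination (ET) logic halt decoding as soon as a valid codeword is found, avoiding unnecessary iterations.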
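The check-node and variable-node updates described above can be sketched in a few lines of Python. This is a minimal floating-point model, not the chip's fixed-point datapath: the function names, the dictionary-based message storage, the convention that a positive LLR means bit 0, and the syndrome check used for early termination are all assumptions made for illustration. Every check node is assumed to have degree at least 2.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (check-node index, variable-node index)


def sign(x: float) -> int:
    """Sign of a message, with zero treated as positive."""
    return -1 if x < 0 else 1


def cn_update(v2c: Dict[Edge, float], checks: List[List[int]],
              alpha: Dict[Edge, float]) -> Dict[Edge, float]:
    """Normalized min-sum check-node update with a per-edge scaling factor."""
    c2v: Dict[Edge, float] = {}
    for c, nbrs in enumerate(checks):
        mags = [abs(v2c[(c, v)]) for v in nbrs]
        # Only the two smallest magnitudes are needed for all outgoing edges.
        order = sorted(range(len(nbrs)), key=mags.__getitem__)
        argmin, min1, min2 = order[0], mags[order[0]], mags[order[1]]
        total_sign = 1
        for v in nbrs:
            total_sign *= sign(v2c[(c, v)])
        for i, v in enumerate(nbrs):
            mag = min2 if i == argmin else min1
            # Multiplying by the edge's own sign removes it from the product.
            ext_sign = total_sign * sign(v2c[(c, v)])
            c2v[(c, v)] = alpha[(c, v)] * ext_sign * mag
    return c2v


def vn_update(llr: List[float], c2v: Dict[Edge, float]):
    """Return extrinsic VN-to-CN messages and hard decisions."""
    total = list(llr)
    for (c, v), msg in c2v.items():
        total[v] += msg
    v2c = {(c, v): total[v] - msg for (c, v), msg in c2v.items()}
    hard = [1 if t < 0 else 0 for t in total]
    return v2c, hard


def syndrome_ok(hard: List[int], checks: List[List[int]]) -> bool:
    """True when every parity check is satisfied."""
    return all(sum(hard[v] for v in nbrs) % 2 == 0 for nbrs in checks)


def decode(llr: List[float], checks: List[List[int]],
           alpha: Dict[Edge, float], max_iters: int):
    """Flooding-schedule decoding that stops once the syndrome is zero."""
    v2c = {(c, v): llr[v] for c, nbrs in enumerate(checks) for v in nbrs}
    hard = [1 if x < 0 else 0 for x in llr]
    for it in range(1, max_iters + 1):
        c2v = cn_update(v2c, checks, alpha)
        v2c, hard = vn_update(llr, c2v)
        if syndrome_ok(hard, checks):
            return hard, it
    return hard, max_iters


# Toy code with two parity checks over four bits (illustrative only).
checks = [[0, 1, 2], [1, 2, 3]]
alpha = {(c, v): 0.75 for c, nbrs in enumerate(checks) for v in nbrs}
hard, iters_used = decode([2.0, -0.5, 3.0, 1.5], checks, alpha, max_iters=5)
```

In this toy run the weakly negative LLR on bit 1 is corrected in the first iteration, so the loop exits early instead of running all five iterations.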
Pipeline interleaving is realized by overlapping two independent codewords through the same pipeline registers, effectively doubling throughput without increasing decode latency or critical path.
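A toy occupancy model illustrates why interleaving raises throughput without changing latency. It assumes, purely for illustration, that each iteration consists of two one-cycle stages and that a second codeword enters one cycle after the first; the function names and the stage timing are hypothetical and not taken from the source.

```python
from typing import List, Optional


def stage_occupancy(num_codewords: int, iterations: int,
                    stages: int = 2) -> List[List[Optional[int]]]:
    """Return grid[cycle][stage] = index of the codeword using that stage."""
    if num_codewords > stages:
        raise ValueError("more codewords than pipeline stages would collide")
    total_cycles = (num_codewords - 1) + iterations * stages
    grid: List[List[Optional[int]]] = [[None] * stages for _ in range(total_cycles)]
    for k in range(num_codewords):          # codeword k starts k cycles late
        for i in range(iterations):
            for s in range(stages):
                cycle = k + i * stages + s
                assert grid[cycle][s] is None  # no two codewords share a slot
                grid[cycle][s] = k
    return grid


def utilization(grid: List[List[Optional[int]]]) -> float:
    """Fraction of stage slots that are doing useful work."""
    slots = [slot for row in grid for slot in row]
    return sum(slot is not None for slot in slots) / len(slots)


single = stage_occupancy(num_codewords=1, iterations=5)
interleaved = stage_occupancy(num_codewords=2, iterations=5)
```

With one codeword, each stage idles every other cycle and utilization is 50 %. With two codewords the idle slots are filled, while each codeword still needs the same `iterations * stages` cycles from start to finish.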
2. Code Construction and Parameters
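The code is based on a rate-compatible AR4A protograph (3 × 9 base matrix), which is first expanded with the progressive edge-growth (PEG) algorithm to increase girth and then lifted into quasi-cyclic (QC) form, using the approximate cycle extrinsic message degree (ACE) metric to control cycle connectivity. Each “1” in the expanded base graph is mapped to a $Z \times Z$ cyclically shifted identity matrix, where $Z$ is the lifting factor. Together, the two steps scale the protograph by a factor of 32, giving the 96 × 288 parity-check matrix of the largest mode.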
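The QC lifting step can be sketched as follows. The base matrix, the shift values, and the lifting factor in this example are made up for illustration and are not the shifts used in the design; `-1` is used here as a conventional marker for an all-zero block.

```python
from typing import List


def circulant(z: int, shift: int) -> List[List[int]]:
    """z x z identity matrix with its ones cyclically shifted right by `shift`."""
    return [[1 if (r + shift) % z == c else 0 for c in range(z)]
            for r in range(z)]


def lift(base: List[List[int]], z: int) -> List[List[int]]:
    """Expand a matrix of shift exponents into a binary parity-check matrix.

    base[i][j] >= 0 selects a shifted identity block; -1 selects a zero block.
    """
    rows, cols = len(base), len(base[0])
    H = [[0] * (cols * z) for _ in range(rows * z)]
    for i in range(rows):
        for j in range(cols):
            if base[i][j] < 0:
                continue
            block = circulant(z, base[i][j])
            for r in range(z):
                for c in range(z):
                    H[i * z + r][j * z + c] = block[r][c]
    return H


# Hypothetical 2 x 3 exponent matrix lifted by Z = 4 -> 8 x 12 matrix.
example_base = [[0, 1, -1],
                [2, -1, 3]]
H = lift(example_base, 4)
```

Because every block is a permutation matrix, each row and column of `H` inherits the degree of the corresponding base-graph node, which is why lifting preserves the protograph's degree profile.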
The design supports three rates via column removal and bit puncturing:
| Code Rate | Blocklength (before puncturing) | Information Bits | Transmitted Bits (after puncturing) |
|---|---|---|---|
| 3/4 | 288 | 192 | 256 |
| 2/3 | 224 | 128 | 192 |
| 1/2 | 160 | 64 | 128 |
Lower rates are obtained by removing columns, which reduces both blocklength and dimension while keeping the 96 parity checks. Each mode punctures 32 bits, so the transmitted length equals the blocklength minus 32. Each VN has degree 2 or 3, while CNs reach degree 9. Decoding runs for up to a fixed maximum number of iterations; ET typically reduces the average count.
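The rate table can be checked with a few lines of arithmetic. The sketch below takes the blocklength and information-bit counts from the table, subtracts the 32 punctured bits, and confirms that the resulting ratio matches the nominal rate; the function name is an illustrative choice.

```python
from fractions import Fraction

PUNCTURED_BITS = 32  # punctured in every mode, per the text


def transmitted_and_rate(blocklength: int, info_bits: int,
                         punctured: int = PUNCTURED_BITS):
    """Return (transmitted bits, code rate seen on the channel)."""
    transmitted = blocklength - punctured
    return transmitted, Fraction(info_bits, transmitted)


# (blocklength before puncturing, information bits) for each mode
modes = {"3/4": (288, 192), "2/3": (224, 128), "1/2": (160, 64)}
summary = {name: transmitted_and_rate(n, k) for name, (n, k) in modes.items()}

# Parity bits per mode: blocklength minus information bits.
parity = {name: n - k for name, (n, k) in modes.items()}
```

All three modes leave 96 parity bits, matching the 96 CN blocks, which is consistent with changing the rate by removing information columns only.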
3. Performance and Efficiency Metrics
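The decoder is clocked at a frequency $f_{\text{clk}}$ (quoted in GHz), with a pipeline depth of $D$ cycles per iteration. With $I$ decoding iterations, $K$ information bits per codeword, and average power $P$, the performance formulas are as follows:
- Latency per codeword:
$$T_{\text{lat}} = \frac{I \cdot D}{f_{\text{clk}}}$$
- Throughput ($\Theta$), with two codewords interleaved in the pipeline:
$$\Theta = \frac{2K}{T_{\text{lat}}}$$
yielding 9.29 Gb/s for the 128-bit, rate-1/2 mode ($K = 64$, $T_{\text{lat}} \approx 13.8$ ns), and up to 21.92 Gb/s across the supported modes.
- Energy per bit ($E_b$):
$$E_b = \frac{P}{\Theta}$$
At $\Theta = 9.29$ Gb/s, $E_b \approx 62$ pJ/b. ET reduces average power and energy by approximately 40–60 %.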
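The headline figures can be cross-checked numerically. The sketch below assumes that information throughput counts two interleaved codewords per latency window and that energy per bit is average power divided by information throughput; both are assumptions made here for illustration. The inputs (64 information bits, 13.8 ns, 61.9 pJ/b, 9.29 Gb/s) are the rate-1/2 values quoted in this article.

```python
def info_throughput_gbps(info_bits: int, latency_ns: float,
                         codewords_in_flight: int = 2) -> float:
    """Information throughput; bits per nanosecond equals Gb/s."""
    return codewords_in_flight * info_bits / latency_ns


def implied_power_mw(energy_pj_per_bit: float, throughput_gbps: float) -> float:
    """Average power implied by E_b and throughput; pJ/b times Gb/s equals mW."""
    return energy_pj_per_bit * throughput_gbps


throughput = info_throughput_gbps(info_bits=64, latency_ns=13.8)
power = implied_power_mw(energy_pj_per_bit=61.9, throughput_gbps=9.29)
```

Under these assumptions the throughput comes out at about 9.3 Gb/s, in line with the quoted 9.29 Gb/s once the rounding of the latency is taken into account, and the implied average power is roughly 575 mW.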
Area is partitioned as follows:
| Block | Area (% of total) | Area [mm²] |
|---|---|---|
| VN/CN logic + PUs | 65% | 0.29 |
| I/O LLR SRAMs | 25% | 0.11 |
| ET / I/O | 10% | 0.04 |
4. Physical Implementation in 22FDX
Manufactured on GlobalFoundries 22FDX (22 nm planar FD-SOI), the chip leverages body-bias tuning for leakage and performance optimization. The physical floorplan centralizes the VN/CN array, placing memories and ET logic peripherally. Clocking is globally synchronous, featuring fine-grained gating in ET logic to freeze idle pipeline registers, thus enhancing dynamic power efficiency.
Power is delivered via a custom mesh; the compute core operates at the core supply voltage $V_{\text{DD}}$, with I/O at 1.2 V.
5. Comparative Analysis with Related Decoder ASICs
Table: Representative Decoders—Performance Comparison (all latencies for unfurled iterations)
| Design | Rate | (K, N) | Latency [ns] | Throughput [Gb/s] | Area [mm²] | Energy [pJ/b] |
|---|---|---|---|---|---|---|
| 22FDX LDPC ASIC (22 nm) | 1/2 | 64,128 | 13.8 | 9.29 | 0.44 | 61.9 |
| RG-Mahmood ’18 (28 nm) | 0.84 | 1723,2048 | 69.6 | 494.7 | 16.2 | 27.0 |
| ZZ ’10 (65 nm) | 0.84 | 1723,2048 | 137 | 40.1 | 5.05 | 69.8 |
| MM ’18 (28 nm) | 1/2 | 336,672 | 793 | 3.39 | 1.99 | 120 |
| AV ’24 (110 nm) | 2/3 | 352,528 | 120 | 1.11 | 1.96 | 135 |
| CT ’21 Polar (40 nm) | — | 128,256 | 310 | 0.41 | 0.18 | 31.1 |
| PG ’17 Polar (28 nm) | — | 512,1024 | 7820 | 0.06 | 0.44 | 356 |
| DK ’24 BOSS (28 nm) | 0.12 | 15,128 | 21.9 | 0.68 | 0.37 | 48.7 |
Key outcomes are the lowest latency in the comparison (13.8 ns at a short blocklength), competitive throughput and area efficiency, and moderate energy consumption (with ET reducing it below 40 pJ/b on average). Block error rate (BLER) performance is within 0.5 dB of 5G polar SCL (list 8) and approximately 1.5 dB from the normal-approximation bound at 128 bits.
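The comparison can be made concrete by deriving area efficiency (throughput divided by area) from the table and ranking the designs by latency. The sketch below uses the table values as given and applies no technology-node scaling, so it is a rough reading of the table and not a normalized benchmark; the variable names are illustrative.

```python
# design: (latency [ns], throughput [Gb/s], area [mm^2], energy [pJ/b])
designs = {
    "22FDX LDPC ASIC (22 nm)": (13.8, 9.29, 0.44, 61.9),
    "RG-Mahmood '18 (28 nm)": (69.6, 494.7, 16.2, 27.0),
    "ZZ '10 (65 nm)": (137.0, 40.1, 5.05, 69.8),
    "MM '18 (28 nm)": (793.0, 3.39, 1.99, 120.0),
    "AV '24 (110 nm)": (120.0, 1.11, 1.96, 135.0),
    "CT '21 Polar (40 nm)": (310.0, 0.41, 0.18, 31.1),
    "PG '17 Polar (28 nm)": (7820.0, 0.06, 0.44, 356.0),
    "DK '24 BOSS (28 nm)": (21.9, 0.68, 0.37, 48.7),
}

# Area efficiency in Gb/s per mm^2, without any node normalization.
area_efficiency = {name: thr / area
                   for name, (_, thr, area, _) in designs.items()}

lowest_latency = min(designs, key=lambda name: designs[name][0])
by_efficiency = sorted(area_efficiency, key=area_efficiency.get, reverse=True)

for name in by_efficiency:
    print(f"{name:26s} {area_efficiency[name]:7.2f} Gb/s/mm^2")
```

On these numbers the 22FDX design has the lowest latency and an area efficiency of about 21 Gb/s/mm², second only to the much larger long-block decoder of RG-Mahmood '18.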
6. Context and Significance
The integration of short-blocklength QC-LDPC code construction, edge-adaptive normalized min-sum MP decoding, and a fully parallel 22FDX ASIC datapath directly addresses URLLC requirements for minimal latency and robust throughput. The record-low 14 ns latency is attributable to architectural co-design, including pipeline interleaving and fast ET logic. The design demonstrates a trade-off: while energy efficiency trails that of very large-scale long-block decoders, the latency and area characteristics represent a favorable compromise for short-packet wireless applications. This approach establishes a distinct solution space between high-latency SCL/polar decoders and high-throughput, high-energy, large-area long-block LDPC ASICs (Nonaca et al., 19 Dec 2025).