GlobalFoundries 22FDX LDPC Decoder ASIC

Updated 26 December 2025

The GlobalFoundries 22FDX LDPC Decoder ASIC is a fully parallel, multi-rate binary LDPC decoder implemented in 22 nm FD-SOI for ultra-reliable low-latency communications.
It employs edge-adaptive min-sum message passing, pipeline interleaving, and early termination logic to achieve a record 14 ns decoding latency and throughput up to 9 Gb/s.
The design balances competitive area efficiency, energy performance, and robust error correction, making it ideal for 5G URLLC and short-packet wireless applications.

The GlobalFoundries 22FDX LDPC Decoder ASIC is a fully parallel, short-blocklength, multi-rate binary LDPC decoder implemented in 22 nm FD-SOI technology. Developed for ultra-reliable low-latency communication (URLLC) applications such as 5G, the design features a custom co-optimized QC-LDPC code and ASIC architecture that achieves a record-low decoding latency of 14 ns, an information throughput of 9 Gb/s, and an active area of 0.44 mm$^2$ at 62 pJ/b for a 128-bit, rate-1/2 codeword. The design incorporates pipeline interleaving, edge-adaptive min-sum message passing, and early termination logic to enable efficient, high-throughput operation with minimal energy overhead (Nonaca et al., 19 Dec 2025).

1. Algorithmic Engine and Datapath Architecture

The ASIC implements a fully parallel message-passing (MP) decoder with a flooding schedule for binary LDPC codes. The top-level datapath comprises 288 variable-node (VN) processing blocks, enough for the largest blocklength, and 96 check-node (CN) blocks; each VN corresponds to one coded bit and each CN to one parity check. Decoding proceeds in iterations, each divided into two phases: all CNs update in parallel using an edge-adaptive normalized min-sum algorithm, and then all VNs update simultaneously.

The CN update for message $\ell_{i\rightarrow j}$ follows:

$\ell_{i \rightarrow j} \approx \alpha_{ij} \left(\prod_{k\in N(i)\setminus j} \operatorname{sign}(\ell_{k\rightarrow i})\right)\min_{k\in N(i)\setminus j}|\ell_{k\rightarrow i}|$

with $\alpha_{ij}$ as per-edge normalization constants. The VN update adds all incoming messages to the intrinsic LLR $\ell_i$. Each VN→CN processing unit (PU) employs two pipeline registers (R1, R2) and performs sign-magnitude conversion, accumulates minima, and supports extrinsic message computation and propagation across iterations. Hard-decision outputs and early-termination (ET) logic allow decoding to halt early, avoiding unnecessary iterations.
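The CN and VN updates above can be sketched in software as follows. This is a minimal floating-point illustration, not the chip's fixed-point datapath: the function names are hypothetical, the two-minimum formulation is a common way to realize the min over all other edges, and the values of the normalization constants are assumed for the example.

```python
from typing import List


def check_node_update(incoming: List[float], alpha: List[float]) -> List[float]:
    """Normalized min-sum update for one check node.

    incoming[k] is the message arriving on edge k; alpha[k] is the
    (assumed) normalization constant applied to the message sent back
    along edge k. Returns one extrinsic message per edge.
    """
    # The overall sign product and the two smallest magnitudes suffice
    # to form every extrinsic output without a per-edge search.
    sign_product = 1
    min1 = min2 = float("inf")
    min1_index = -1
    for k, msg in enumerate(incoming):
        if msg < 0:
            sign_product = -sign_product
        mag = abs(msg)
        if mag < min1:
            min1, min2, min1_index = mag, min1, k
        elif mag < min2:
            min2 = mag

    outgoing = []
    for k, msg in enumerate(incoming):
        # Exclude edge k: cancel its own sign and, if it held the
        # smallest magnitude, use the second smallest instead.
        own_sign = -1 if msg < 0 else 1
        magnitude = min2 if k == min1_index else min1
        outgoing.append(alpha[k] * sign_product * own_sign * magnitude)
    return outgoing


def variable_node_update(intrinsic_llr: float, incoming: List[float]) -> List[float]:
    """Extrinsic VN outputs: intrinsic LLR plus every incoming message
    except the one that arrived on the edge being written."""
    total = intrinsic_llr + sum(incoming)
    return [total - msg for msg in incoming]


print(check_node_update([2.0, -1.0, 3.0], [0.75, 0.75, 0.75]))
```

With the inputs shown, the edge carrying the smallest magnitude receives the second-smallest magnitude, and each output sign is the product of the signs on the other two edges.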

Pipeline interleaving is realized by overlapping two independent codewords through the same pipeline registers, effectively doubling throughput without increasing decode latency or critical path.
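The effect of interleaving on pipeline utilization can be illustrated with a small model. This is an idealized steady-state sketch that assumes a two-stage pipeline (one stage per register R1, R2) and ignores fill and drain cycles; the codeword labels "A" and "B" and the function names are hypothetical.

```python
I_MAX = 10   # iterations per codeword (from the text)
N_PIPE = 2   # pipeline stages per iteration (from the text)


def stage_occupancy(cycle: int) -> tuple:
    """Which codeword occupies (stage 0, stage 1) in a given cycle.

    Codeword A sits in stage 0 on even cycles and stage 1 on odd
    cycles; codeword B fills whichever stage A leaves empty.
    """
    return ("A", "B") if cycle % N_PIPE == 0 else ("B", "A")


def utilization(interleaved: bool, cycles: int = I_MAX * N_PIPE) -> float:
    """Fraction of stage-cycles doing useful work in one decode window."""
    busy = 0
    for cycle in range(cycles):
        occupants = stage_occupancy(cycle)
        # Without interleaving only codeword A is present, so the
        # slots that B would occupy stay idle.
        busy += sum(1 for cw in occupants if interleaved or cw == "A")
    return busy / (cycles * N_PIPE)


print(utilization(interleaved=False), utilization(interleaved=True))
```

In this model a single codeword keeps each stage busy only half the time, while two interleaved codewords fill every slot. Both still need the same 20 cycles to complete 10 iterations, which is why throughput doubles and latency is unchanged.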

2. Code Construction and Parameters

The code is based on a rate-compatible AR4A protograph (3 × 9 matrix). The protograph is first expanded by a factor $Z=4$ using progressive edge growth (PEG) to increase girth, and then lifted quasi-cyclically (QC) by a factor $Z=8$ using the approximate cycle EMD (ACE) criterion for suitable cycle connectivity. In the QC step, each “1” in the expanded graph is mapped to an $8\times 8$ cyclically shifted identity matrix.
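The quasi-cyclic lifting step can be sketched as follows. The shift values in `EXAMPLE_SHIFTS` are hypothetical and chosen only for illustration; they are not the shifts used in the design, and the function names are likewise assumptions.

```python
from typing import List, Optional


def circulant(shift: int, size: int) -> List[List[int]]:
    """size x size identity matrix with every row cyclically shifted
    right by `shift` positions."""
    return [[1 if col == (row + shift) % size else 0 for col in range(size)]
            for row in range(size)]


def lift(shift_matrix: List[List[Optional[int]]], size: int) -> List[List[int]]:
    """Expand a matrix of shift values into a binary parity-check matrix.

    An integer entry becomes a shifted identity block; None becomes an
    all-zero block.
    """
    lifted = []
    for block_row in shift_matrix:
        blocks = [circulant(s, size) if s is not None
                  else [[0] * size for _ in range(size)]
                  for s in block_row]
        # Concatenate the r-th row of every block in this block row.
        for r in range(size):
            lifted.append([bit for block in blocks for bit in block[r]])
    return lifted


# Hypothetical 2 x 3 shift matrix, for illustration only.
EXAMPLE_SHIFTS = [[0, 3, None],
                  [5, None, 1]]
H = lift(EXAMPLE_SHIFTS, size=8)
print(len(H), len(H[0]))
```

Applied to the dimensions in the text, a 3 × 9 protograph expanded by 4 gives a 12 × 36 graph, and lifting that by 8 gives a 96 × 288 parity-check matrix, which matches the 96 CN and 288 VN blocks of the datapath.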

The design supports three rates via column removal and bit puncturing:

Code Rate	Blocklength $n$	Information Bits $k$	Punctured Length $n'$
3/4	288	192	256
2/3	224	128	192
1/2	160	64	128

Blocklength and dimension are reduced through column truncation and uniform bit puncturing (32 bits in every mode). Each VN has degree 2 or 3, while every CN reaches degree 9. Decoding employs up to $I_{\max}=10$ iterations; ET typically lowers the average iteration count.
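The relationship between the table columns can be checked with a few lines of arithmetic. The sketch below uses only the values given above; the function and variable names are hypothetical.

```python
from fractions import Fraction

PUNCTURED_BITS = 32  # bits punctured in every mode (from the text)

# (blocklength n, information bits k) per mode, taken from the table.
MODES = {"3/4": (288, 192), "2/3": (224, 128), "1/2": (160, 64)}


def transmitted_length(n: int) -> int:
    """Number of coded bits actually sent after puncturing (n')."""
    return n - PUNCTURED_BITS


def effective_rate(n: int, k: int) -> Fraction:
    """Code rate seen on the channel, k / n'."""
    return Fraction(k, transmitted_length(n))


for label, (n, k) in MODES.items():
    print(label, transmitted_length(n), effective_rate(n, k))
```

The nominal rates in the table are therefore $k/n'$, the rates after puncturing, rather than $k/n$.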

3. Performance and Efficiency Metrics

The decoder achieves a clock rate of $f_\mathrm{clk}=1.452$ GHz (for $R=1/2$), with a pipeline depth of $N_\mathrm{pipe}=2$ cycles per iteration. The performance metrics are computed as follows:

Latency per codeword:

$\mathrm{Latency}\approx I_{\max}\cdot N_\mathrm{pipe}/f_\mathrm{clk} = 10\,\mathrm{iterations}\times 2\,\mathrm{cycles}/1.452\,\mathrm{GHz} \approx 13.77\,\mathrm{ns}$

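Throughput ($\theta$):

$\theta = k\cdot f_\mathrm{clk}/I_{\max}$

yielding 9.29 Gb/s for $R=1/2$ ($k=64$), and up to 21.92 Gb/s for $R=3/4$ ($k=192$). The 1.452 GHz clock is quoted for $R=1/2$ only; the $R=3/4$ figure does not follow from it and presumably reflects a different clock rate in that mode.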
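The throughput expression can be read as two interleaved codewords completing per latency window. This is an interpretation that assumes the pipeline interleaving described in Section 1 together with $N_\mathrm{pipe}=2$; it is a sketch, not a derivation taken from the cited paper:

$\theta = \frac{2k}{\mathrm{Latency}} = \frac{2k\cdot f_\mathrm{clk}}{I_{\max}\cdot N_\mathrm{pipe}} = \frac{k\cdot f_\mathrm{clk}}{I_{\max}}$

For $R=1/2$ this gives $\theta = 64\times 1.452\,\mathrm{GHz}/10 \approx 9.29\,\mathrm{Gb/s}$.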

Energy per bit ( $E_b$ ):

$E_b = P_\mathrm{active}/\theta$

At $R=1/2$ and $P_\mathrm{active}=575$ mW, $E_b\approx 61.9$ pJ/b. ET reduces average power and energy by approximately 40–60%.
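The three formulas can be evaluated together for the $R=1/2$ mode. The sketch below uses only the parameter values stated in this section; the constant and function names are hypothetical.

```python
I_MAX = 10            # maximum iterations (from the text)
N_PIPE = 2            # clock cycles per iteration (from the text)
F_CLK_HZ = 1.452e9    # clock rate for R = 1/2 (from the text)
K_BITS = 64           # information bits for R = 1/2 (from the text)
P_ACTIVE_W = 0.575    # active power for R = 1/2 (from the text)


def latency_s() -> float:
    """Worst-case decode latency: I_max iterations of N_pipe cycles each."""
    return I_MAX * N_PIPE / F_CLK_HZ


def throughput_bps() -> float:
    """Information throughput, theta = k * f_clk / I_max."""
    return K_BITS * F_CLK_HZ / I_MAX


def energy_per_bit_j() -> float:
    """Energy per information bit, E_b = P_active / theta."""
    return P_ACTIVE_W / throughput_bps()


print(f"latency    = {latency_s() * 1e9:.2f} ns")
print(f"throughput = {throughput_bps() / 1e9:.2f} Gb/s")
print(f"energy/bit = {energy_per_bit_j() * 1e12:.1f} pJ/b")
```

The script prints 13.77 ns, 9.29 Gb/s, and 61.9 pJ/b. These are worst-case figures at $I_{\max}$ iterations, before any savings from early termination.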

Area is partitioned as follows:

Block	Area (% of total)	Area [mm$^2$]
VN/CN logic + PUs	65%	0.29
I/O LLR SRAMs	25%	0.11
ET / I/O	10%	0.04

4. Physical Implementation in 22FDX

Manufactured on GlobalFoundries 22FDX (a 22 nm planar FD-SOI process, not a FinFET process), the chip leverages body-bias tuning for leakage and performance optimization. The physical floorplan centralizes the VN/CN array, placing memories and ET logic peripherally. Clocking is globally synchronous, with fine-grained gating in the ET logic that freezes idle pipeline registers and thereby reduces dynamic power.

Power is delivered via a custom mesh; the compute core operates at $V_{dd}=0.8$ V, with I/O at 1.2 V.

5. Comparative Analysis with Related Decoder ASICs

Table: Performance comparison of representative decoders (all latencies for unfurled iterations)

Design	Rate	$(k, n')$	Latency [ns]	Throughput [Gb/s]	Area [mm$^2$]	$E_b$ [pJ/b]
22FDX LDPC ASIC (22 nm)	1/2	64,128	13.8	9.29	0.44	61.9
RG-Mahmood ’18 (28 nm)	0.84	1723,2048	69.6	494.7	16.2	27.0
ZZ ’10 (65 nm)	0.84	1723,2048	137	40.1	5.05	69.8
MM ’18 (28 nm)	1/2	336,672	793	3.39	1.99	120
AV ’24 (110 nm)	2/3	352,528	120	1.11	1.96	135
CT ’21 Polar (40 nm)	—	128,256	310	0.41	0.18	31.1
PG ’17 Polar (28 nm)	—	512,1024	7820	0.06	0.44	356
DK ’24 BOSS (28 nm)	0.12	15,128	21.9	0.68	0.37	48.7

The key outcomes are the lowest latency in its class (13.8 ns for short blocklengths), competitive throughput and area efficiency, and moderate energy consumption (with ET reducing $E_b$ below 40 pJ/b on average). Block error rate (BLER) performance is within 0.5 dB of 5G polar SCL decoding (list size 8) and approximately 1.5 dB from the normal-approximation bound at 128 bits.
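The latency and area-efficiency claims can be checked directly against the table. The sketch below copies the throughput, area, and latency columns and ranks the designs by throughput per unit area. It applies no technology-node scaling, which is a simplifying assumption, and the names are hypothetical.

```python
# (design, throughput [Gb/s], area [mm^2], latency [ns]), from the table.
DESIGNS = [
    ("22FDX LDPC ASIC", 9.29, 0.44, 13.8),
    ("RG-Mahmood '18", 494.7, 16.2, 69.6),
    ("ZZ '10", 40.1, 5.05, 137),
    ("MM '18", 3.39, 1.99, 793),
    ("AV '24", 1.11, 1.96, 120),
    ("CT '21 Polar", 0.41, 0.18, 310),
    ("PG '17 Polar", 0.06, 0.44, 7820),
    ("DK '24 BOSS", 0.68, 0.37, 21.9),
]


def area_efficiency(throughput_gbps: float, area_mm2: float) -> float:
    """Throughput per unit area in Gb/s/mm^2 (no node normalization)."""
    return throughput_gbps / area_mm2


ranked = sorted(DESIGNS, key=lambda d: area_efficiency(d[1], d[2]), reverse=True)
lowest_latency = min(DESIGNS, key=lambda d: d[3])

for name, thr, area, _ in ranked:
    print(f"{name:18s} {area_efficiency(thr, area):7.2f} Gb/s/mm^2")
print("lowest latency:", lowest_latency[0])
```

On these raw numbers the 22FDX design has the lowest latency and ranks second in area efficiency, at about 21 Gb/s/mm$^2$ against about 31 Gb/s/mm$^2$ for the long-block RG-Mahmood ’18 decoder.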

6. Context and Significance

The integration of short-blocklength QC-LDPC code construction, edge-adaptive normalized min-sum MP decoding, and a fully parallel 22FDX ASIC datapath directly addresses URLLC requirements for minimal latency and robust throughput. The record-low 14 ns latency is attributable to architectural co-design, including pipeline interleaving and fast ET logic. The design demonstrates a trade-off: while energy efficiency trails that of very large-scale long-block decoders, the latency and area characteristics represent a favorable compromise for short-packet wireless applications. This approach establishes a distinct solution space between high-latency SCL/polar decoders and high-throughput, high-energy, large-area long-block LDPC ASICs (Nonaca et al., 19 Dec 2025).

References (1)

A 14ns-Latency 9Gb/s 0.44mm$^2$ 62pJ/b Short-Blocklength LDPC Decoder ASIC in 22FDX (2025)
