Finite-Precision AC Coding: FPA-CCDM

Updated 6 March 2026

FPA-CCDM is a framework that maps binary data into fixed-composition sequences using arithmetic coding under finite-precision constraints.
It employs quantized interval arithmetic with model rounding and renormalization to ensure rate-optimality and invertibility despite limited precision.
The technique balances precision, computational complexity, and resource usage, making it suitable for high-throughput systems like 5G and optical communications.

Finite-Precision Arithmetic Coding-based Constant-Composition Distribution Matching (FPA-CCDM) is a framework for lossless distribution matching employing arithmetic coding under finite-precision constraints. It targets the mapping of binary data into sequences with exactly prescribed empirical symbol distributions. FPA-CCDM forms the core of signal shaping architectures for contemporary communication systems, such as probabilistically shaped modulation in optical fiber and 5G. The primary technical focus is on achieving rate-optimality and invertibility despite practical limitations on arithmetic precision, integer word-length, and circuit complexity.

1. Fundamental Concepts and the CCDM Framework

CCDM transforms input sequences of $k$ bits (typically uniform, i.i.d. Bernoulli(½)) into $n$-symbol output sequences of fixed empirical composition (“type”), chosen to approximate a target distribution $P_A$ over an alphabet $\mathcal{A}$ of size $m$. Formally, for a composition vector $\vec{n} = (n_1, \dots, n_m)$ with $\sum_i n_i = n$, the constant-composition set is

$\mathcal{T}_{\vec{n}} = \left\{ a^n \in \mathcal{A}^n : \,\#\{j : a_j = \alpha_i\} = n_i \text{ for all } i = 1, \dots, m \right\}$

where $\alpha_i$ denotes the $i$-th symbol of $\mathcal{A}$. The encoder is constructed so that every output block has exactly this composition. The maximal number of input bits is $k = \lfloor \log_2 |\mathcal{T}_{\vec{n}}| \rfloor$, and the corresponding mapping is invertible and fixed-to-fixed length (Schulte et al., 2015).
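As a small illustration of this bookkeeping, the following Python sketch computes the type-class size as a multinomial coefficient and derives the number of input bits from it. The function names are illustrative and do not come from the cited works.

```python
from math import comb

def type_class_size(counts):
    """Multinomial coefficient n! / (n_1! ... n_m!), built from binomials."""
    size, total = 1, 0
    for c in counts:
        total += c
        size *= comb(total, c)
    return size

def num_input_bits(counts):
    """floor(log2 |T|), computed exactly with integer arithmetic."""
    return type_class_size(counts).bit_length() - 1

counts = (2, 1, 1)                      # n = 4 symbols over a ternary alphabet
print(type_class_size(counts))          # 12 sequences share this composition
print(num_input_bits(counts))           # 3 input bits per block
```

For this toy composition the matcher carries 3 bits in 4 symbols, so part of the 12-sequence type class is left unused.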

The composition $\vec{n}$ is chosen, subject to the integer constraints, to minimize the informational divergence $D(\vec{n}/n \| P_A)$, so that the empirical distribution $\vec{n}/n$ approximates $P_A$ as closely as possible (Schulte et al., 2015).
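One simple way to obtain a valid composition is largest-remainder rounding of $n P_A$. The sketch below uses that heuristic as a stand-in: it always returns integers summing to $n$, but it is not guaranteed to be the divergence-minimizing choice described in the cited work. The function names are assumptions made for this example.

```python
from math import floor, log2

def quantize_composition(p, n):
    """Round n*p to integers that sum to n (largest-remainder heuristic)."""
    scaled = [n * pi for pi in p]
    counts = [floor(s) for s in scaled]
    # Give the leftover symbols to the entries with the largest fractional parts.
    order = sorted(range(len(p)), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[: n - sum(counts)]:
        counts[i] += 1
    return counts

def divergence_bits(counts, p):
    """Informational divergence D(counts/n || p) in bits."""
    n = sum(counts)
    return sum(c / n * log2((c / n) / pi) for c, pi in zip(counts, p) if c > 0)

p_target = [0.4, 0.3, 0.2, 0.1]
counts = quantize_composition(p_target, 7)
print(counts, divergence_bits(counts, p_target))
```

Comparing `divergence_bits` across candidate compositions is a direct way to check how far the heuristic is from the best integer choice for a given $n$.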

2. Arithmetic Coding with Finite Precision

In practical settings, the interval arithmetic operations central to CCDM are performed on integers of bounded word-length rather than real numbers. The FPA-CCDM algorithm extends Ramabadran’s binary finite-precision AC scheme to the general m-ary case (Pikus et al., 2019). Key features include:

Model rounding: At each symbol emission step, cumulative and branching probabilities are quantized to integer counts using a scale parameter $\Theta = n - i$ at the $i$-th symbol:

$\hat{F}_{i+1|i}(a|s) = \left\lfloor \Theta \cdot F_{i+1|i}(a|s) + \frac{1}{2} \right\rfloor$

$\hat{P}_{i+1|i}(a|s) = \hat{F}_{i+1|i}(a|s) - \hat{F}_{i+1|i}(a^{-}|s)$

where $F_{i+1|i}$ and $P_{i+1|i}$ denote the cumulative and branching probabilities, respectively, and $a^{-}$ is the symbol immediately preceding $a$ in lexicographic order.
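The rounding rule can be traced with exact rationals. The sketch below assumes the usual constant-composition model, in which the branching probability of a symbol is its remaining count divided by the number of symbols still to be emitted. Under that assumption the scaled cumulative values $\Theta \cdot F$ are already integers, so the rounding step reproduces the remaining counts exactly. The name `rounded_model` is illustrative.

```python
from fractions import Fraction
from math import floor

def rounded_model(remaining):
    """Return (F_hat, P_hat) for the next symbol, given the remaining counts."""
    theta = sum(remaining)                        # Theta = n - i
    F_hat, P_hat = [], []
    cum, prev = Fraction(0), 0
    for c in remaining:
        cum += Fraction(c, theta)                 # exact cumulative probability F(a|s)
        f = floor(theta * cum + Fraction(1, 2))   # floor(Theta * F + 1/2)
        F_hat.append(f)
        P_hat.append(f - prev)                    # F_hat(a|s) - F_hat(a^-|s)
        prev = f
    return F_hat, P_hat

print(rounded_model([2, 1, 1]))   # quantized cumulative counts and branch counts
```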

Interval representation: Each subinterval $I(s)$ is stored as three integers $\left(\hat{x}(s), \hat{y}(s), L(s)\right)$ with

$x(s) = \frac{\hat{x}(s)}{2^{L(s)+w}}, \quad y(s) = \frac{\hat{y}(s)}{2^{L(s)+w}}$

Here $w$ is the number of precision ("mantissa") bits.

Renormalization and output: After processing all symbols, the output codeword is the binary integer $\left\lfloor \hat{x}/2^{L} \right\rfloor$.

Decoding is the exact reverse process, deterministically extracting the input bit sequence from the interval evolution, ensuring invertibility as long as interval partitioning and renormalization invariants are satisfied (Pikus et al., 2019, Schulte et al., 2015).
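For orientation, here is an infinite-precision reference for the interval evolution, written with exact rationals. It is not the finite-precision scheme of Pikus et al.; there is no mantissa, rounding, or renormalization. It assumes the constant-composition branching model (remaining count over remaining length) and the rule of selecting the output sequence whose lower interval border falls inside the input interval. All function names are illustrative.

```python
from fractions import Fraction
from math import comb, floor

def type_class_size(counts):
    size, total = 1, 0
    for c in counts:
        total += c
        size *= comb(total, c)
    return size

def ccdm_encode(bits, counts):
    """Map k = floor(log2 |T|) bits to a sequence with the given composition."""
    size = type_class_size(counts)
    k = size.bit_length() - 1
    assert len(bits) == k
    u = 0
    for b in bits:
        u = 2 * u + b
    # Lower border of the first output interval that starts inside the input interval.
    p = Fraction(-(-u * size // 2**k), size)      # ceil(u * |T| / 2^k) / |T|
    remaining = list(counts)
    x, y = Fraction(0), Fraction(1)               # current interval [x, x + y)
    out = []
    for _ in range(sum(counts)):
        theta = sum(remaining)
        cum = 0
        for a, c in enumerate(remaining):
            lo = x + y * Fraction(cum, theta)
            hi = x + y * Fraction(cum + c, theta)
            if lo <= p < hi:                      # subinterval of symbol a contains p
                out.append(a)
                remaining[a] -= 1
                x, y = lo, hi - lo
                break
            cum += c
    return out

def ccdm_decode(seq, counts):
    """Recover the input bits from the lower border of the sequence's interval."""
    k = type_class_size(counts).bit_length() - 1
    remaining = list(counts)
    x, y = Fraction(0), Fraction(1)
    for a in seq:
        theta = sum(remaining)
        x += y * Fraction(sum(remaining[:a]), theta)
        y *= Fraction(remaining[a], theta)
        remaining[a] -= 1
    u = floor(x * 2**k)
    return [(u >> (k - 1 - b)) & 1 for b in range(k)]

counts = (2, 1, 1)
bits = [1, 0, 1]
seq = ccdm_encode(bits, counts)
print(seq, ccdm_decode(seq, counts) == bits)
```

A finite-precision variant has to reproduce the same partitioning with bounded integers, which is where the rounding and renormalization rules above come in.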

3. Rate-Loss Analysis Under Finite Precision

A key analytical result for FPA-CCDM is that finite-precision effects (the rounding of model statistics and interval endpoints) produce a provable rate loss, but one that diminishes exponentially in the number $w$ of precision bits:

$\Delta R = R_{\rm{IPA}} - R_{\rm{FPA}}(w) \leq C \cdot 2^{-w}$

where $R_{\rm{IPA}}$ and $R_{\rm{FPA}}$ are the rates under infinite-precision and finite-precision arithmetic, and $C$ depends on the composition and alphabet (Pikus et al., 2019). This is established via a "peeling-off" argument (Eq. (11) of (Pikus et al., 2019)) showing that each step's rounding dilates the ideal interval by a factor of at most $(1 + \delta)$, with $\delta \approx 2^{-w}$ for CCDM. The per-step losses add up over all $n$ symbols, giving a worst-case loss of $\Delta k$ input bits, which for the worst-case codeword is exactly

$\Delta k = \sum_{i=0}^{n-1} \log_2 \left( 1 + 2^{-w} P_{C_{i+1}|C_1^i}(z_{i+1}|z_1^i) \right)$

where $z$ is the codeword in which identical symbols are grouped together, and $P_C$ is the conditional probability under the type-class model (Pikus et al., 2019).
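The expression can be evaluated numerically as stated. The sketch below assumes that the conditional probability of the next symbol is its remaining count divided by the remaining length, and that the grouped codeword lists symbols in alphabet order. The function name and these modelling choices are assumptions made for the example.

```python
from math import log, log1p

def worst_case_bit_loss(counts, w):
    """Sum of log2(1 + 2^-w * P(z_{i+1} | z_1^i)) over the grouped codeword z."""
    remaining = list(counts)
    left = sum(counts)                        # symbols still to be emitted
    loss = 0.0
    for a, c in enumerate(counts):            # emit every a_1, then every a_2, ...
        for _ in range(c):
            p = remaining[a] / left           # type-class conditional probability
            loss += log1p(2.0 ** -w * p) / log(2)
            remaining[a] -= 1
            left -= 1
    return loss

for w in (8, 12, 16):
    print(w, worst_case_bit_loss((128, 64, 32, 32), w))
```

Because every conditional probability is at most 1, the result never exceeds $n \log_2(1 + 2^{-w})$, and each additional precision bit roughly halves it.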

Numerical results indicate that practical choices such as $w = 12 \ldots 18$ suffice for $n \leq 10^4$ and $|\mathcal{A}| \leq 16$, yielding negligible rate loss (e.g., $< 0.01$ bits/symbol) (Pikus et al., 2019).

4. Implementation Complexity and Precision vs. Resource Trade-Offs

FPA-CCDM achieves a trade-off between arithmetic word-length, implementation complexity, and achievable rate. Per-symbol costs include a small and bounded number of integer multiplications, shifts, and a division by $\Theta = n-i$ (requiring $\sim \log n$ bits). Interval updating and model statistics are maintained with $O(m \log n)$ bits (Pikus et al., 2019, Schulte et al., 2015).

Table: Resource scaling for FPA-CCDM across representative block lengths and precision

Block length $n$	Precision bits $w$ required for $\Delta k<0.1$	Typical hardware word size
64–256 (5G)	8–12	16 bits
1,000–5,000 (Optical)	12–16	24–32 bits
10,000–1,000,000	$\geq 18$	32–64 bits

For short blocks (e.g., $n \leq 256$), even $w \approx 8$ precision bits are sufficient for negligible rate loss, with low hardware overhead. For longer blocks, increasing $w$ in proportion to $\log n$ maintains a target rate gap (Pikus et al., 2019, Schulte et al., 2015).
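The logarithmic scaling can be read off the worst-case expression for $\Delta k$ in Section 3, where the number of precision bits is denoted $w$. Since every conditional probability is at most 1 and $\log_2(1+x) \leq x/\ln 2$,

$\Delta k \leq n \log_2\left(1 + 2^{-w}\right) \leq \frac{n \, 2^{-w}}{\ln 2}$

so a target $\Delta k \leq \varepsilon$ is met whenever

$w \geq \log_2 \frac{n}{\varepsilon \ln 2}$

This is a sufficient condition derived only from the bound as stated in this article, not a figure quoted from the cited works. It may be conservative compared with the precision that suffices in practice.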

5. Extensions: Log-CCDM and Multiplication-Free Approaches

Recent work has addressed the computational overhead from high-precision multiplications and divisions inherent in FPA-CCDM. The "Log-CCDM" construction implements distribution matching based on lookup tables and purely additive log-domain arithmetic, replacing every multiplication/division by LUT indexing and addition/subtraction (Gültekin et al., 2022).

Log-CCDM employs three LUTs: the first stores exponentially spaced intervals, while the others realize approximate log-times and log-divide operations. The required arithmetic precision grows only logarithmically with $n$, not linearly, and storage requirements are reduced to a few kilobytes (e.g., under 4 kB for $n=1024$), while achieving sub-$0.01$ bit/symbol rate loss (Gültekin et al., 2022).
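The general idea of trading multiplications for additions can be shown with a generic log-domain toy. This is not the LUT layout of Gültekin et al.: the grid resolution `F`, the single exponential table, and all function names are assumptions chosen for the example.

```python
import math

F = 8                                               # fractional bits of the log grid (assumed)
EXP_LUT = [2.0 ** (f / 2**F) for f in range(2**F)]  # 2^(f / 2^F), one entry per fraction

def to_log(v):
    """Quantized log2 of a positive value, stored as an integer multiple of 2^-F."""
    return round(math.log2(v) * 2**F)

def from_log(l):
    """Back to the linear domain: table lookup for the fraction, power of two for the rest."""
    q, f = divmod(l, 2**F)
    return EXP_LUT[f] * 2.0 ** q

def log_mul(la, lb):
    return la + lb                                  # multiplication becomes addition

def log_div(la, lb):
    return la - lb                                  # division becomes subtraction

a, b = to_log(37), to_log(1000)
print(from_log(log_mul(a, b)), from_log(log_div(a, b)))
```

With `F = 8` each operand carries a log-quantization error of at most $2^{-9}$, so products and quotients land within a fraction of a percent of the exact values. Refining the grid trades table size for accuracy.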

6. Numerical Performance, Rate Recovery, and Invertibility

Empirical results across both standard and log-domain FPA-CCDM show that, for moderate to large $w$, rate loss and normalized divergence both decay rapidly as $n$ or $w$ increases. For $|\mathcal{A}|=16$, $w=12,18$ bits, and $n \lesssim 10^4$, the observed rate tracks the full-precision limit (Shannon entropy of $P_A$) within $\sim 0.01$ bits/symbol (Pikus et al., 2019, Gültekin et al., 2022).

Invertibility is guaranteed due to the precise interval rounding (always on endpoints, not widths), and because the partitioning property of intervals is preserved at every step in both encoding and decoding. This condition is satisfied provided the mantissa $\hat{y}(s)$ remains above $2^{w}$ and interval underflow conditions do not occur (Pikus et al., 2019, Schulte et al., 2015).

7. Practical Deployment and System Considerations

FPA-CCDM’s modularity and rigorously bounded rate loss have made it foundational for modern communication system components requiring precise distribution control, including high-throughput probabilistic shaping engines for both block and streaming applications. FPGA and ASIC implementations leverage moderate arithmetic precision and exploit the $O(m \log n)$ state storage to enable scalable, hardware-efficient deployment (Pikus et al., 2019, Gültekin et al., 2022).

In summary, FPA-CCDM enables fully invertible, near-optimal fixed-to-fixed length shaping via arithmetic coding under finite-precision constraints, with mathematically bounded rate loss and manageable computational and memory requirements over a broad regime of signal shaping scenarios.