Constant Composition Distribution Matcher (CCDM)
- Constant Composition Distribution Matcher (CCDM) is a fixed-length, invertible mapping that converts uniformly distributed bits into sequences with a fixed empirical distribution.
- It uses combinatorial methods based on multinomial coefficients and efficient encoding techniques like arithmetic and enumerative coding to achieve near-optimal asymptotic rate with quantifiable finite-length losses.
- CCDM is a key component in probabilistic amplitude shaping for modern communication systems, enhancing SNR performance and enabling efficient coded modulation over AWGN channels.
A constant composition distribution matcher (CCDM) is a fixed-length, invertible mapping from a uniformly distributed input bit sequence to an output sequence whose empirical distribution matches a prescribed target distribution as closely as possible. CCDM is foundational in probabilistic amplitude shaping (PAS) for modern coded modulation over AWGN channels and has been rigorously analyzed in terms of rate, divergence, implementation complexity, and performance at finite blocklengths. The defining property of CCDM is that all output sequences share the same empirical distribution—i.e., the output composition is fixed—which enables precise combinatorial analysis and asymptotic optimality up to vanishing rate and divergence loss.
1. Mathematical Formulation and Combinatorics
Given a finite alphabet $\mathcal{A} = \{a_1, \ldots, a_M\}$ and target PMF $P_A$, CCDM constructs blocklength-$n$ output sequences so that the empirical distribution matches $P_A$ as closely as integer constraints permit. A composition vector $\mathbf{n} = (n_1, \ldots, n_M)$ with $\sum_i n_i = n$ is chosen such that $n_i \approx n P_A(a_i)$. The set of all sequences with this composition forms the type class
$$\mathcal{T}_{\mathbf{n}} = \{x^n \in \mathcal{A}^n : a_i \text{ occurs exactly } n_i \text{ times in } x^n\}.$$
The cardinality is given by the multinomial coefficient
$$|\mathcal{T}_{\mathbf{n}}| = \binom{n}{n_1, n_2, \ldots, n_M} = \frac{n!}{n_1!\, n_2! \cdots n_M!}.$$
CCDM implements a bijection
$$f : \{0,1\}^k \to \mathcal{C} \subseteq \mathcal{T}_{\mathbf{n}},$$
where $k = \lfloor \log_2 |\mathcal{T}_{\mathbf{n}}| \rfloor$. The achieved rate is $R = k/n$ bits per output symbol (Schulte et al., 2015, Fehenberger et al., 2018).
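As a concrete numerical sketch (our own illustration, not from the cited papers; the naive rounding used to build the composition is a stand-in for a divergence-minimizing quantization), the composition, type-class size, and rate $k/n$ can be computed directly:

```python
from math import factorial

def multinomial(counts):
    """Type-class cardinality n! / (n_1! * n_2! * ... * n_M!)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def ccdm_parameters(pmf, n):
    """Quantize a target PMF to a length-n composition; return (composition, k, rate)."""
    comp = [round(p * n) for p in pmf]       # naive rounding (illustrative only)
    comp[-1] += n - sum(comp)                # force the counts to sum to n
    k = multinomial(comp).bit_length() - 1   # exact floor(log2 |T_n|), no float overflow
    return comp, k, k / n

comp, k, rate = ccdm_parameters([0.5, 0.25, 0.125, 0.125], n=64)
# The rate k/n necessarily falls below the entropy H(P_A) = 1.75 bits/symbol.
```

Using `int.bit_length()` rather than a floating-point `log2` keeps the floor exact even when the multinomial has thousands of bits.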
2. Asymptotic Rate Optimality and Finite-Length Loss
The optimal mapping would generate i.i.d. sequences according to $P_A$, achieving the per-symbol entropy rate $H(P_A)$. CCDM, at finite $n$, incurs a rate loss
$$R_{\mathrm{loss}}(n) = H(P_A) - \frac{k}{n} = H(P_A) - \frac{\lfloor \log_2 |\mathcal{T}_{\mathbf{n}}| \rfloor}{n},$$
which vanishes as $n \to \infty$ and can be tightly bounded using Stirling's formula (equivalently, via the standard type-class bound $|\mathcal{T}_{\mathbf{n}}| \geq (n+1)^{-(M-1)} 2^{n H(\mathbf{n}/n)}$), yielding
$$R_{\mathrm{loss}}(n) = O\!\left(\frac{\log n}{n}\right)$$
(Schulte et al., 2015, Fehenberger et al., 2018, Pikus et al., 2019). For binary CCDM, $M = 2$, and similar bounds hold with the type-class size reduced to the binomial coefficient $\binom{n}{n_1}$ (Schulte et al., 2017).
The normalized informational divergence (per symbol) between the CCDM output and the i.i.d. target,
$$\frac{1}{n} D\!\left(P_{X^n} \,\big\|\, P_A^n\right),$$
also vanishes as $n$ increases. However, the absolute (unnormalized) divergence grows as $\tfrac{1}{2}\log_2 n$ to leading order for binary output (Schulte et al., 2017).
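A quick numerical check of these asymptotics (our own sketch for a binary target, using exact binomial coefficients): the per-symbol rate loss shrinks toward zero as $n$ grows, even though the type class never captures the full i.i.d. entropy at finite blocklength:

```python
from math import comb, log2

def binary_rate_loss(p, n):
    """H(p) - k/n for a binary CCDM with composition (n - n1, n1)."""
    n1 = round(p * n)                    # count of the shaped symbol
    k = comb(n, n1).bit_length() - 1     # exact floor(log2 of the type-class size)
    entropy = -(p * log2(p) + (1 - p) * log2(1 - p))
    return entropy - k / n

# Rate loss for p = 0.25 at growing blocklengths: decays roughly like (log n)/n.
losses = [binary_rate_loss(0.25, n) for n in (64, 256, 1024, 4096)]
```

The losses are strictly positive and monotonically decreasing, consistent with the $O((\log n)/n)$ bound above.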
3. Enumerative and Arithmetic Coding Implementations
CCDM mapping is traditionally realized via enumerative coding or arithmetic coding. The arithmetic coding approach maintains a shrinking interval, at each step partitioned according to the remaining symbol counts. Encoding and decoding—picking symbols so that the output interval corresponds to the input bit-string’s fractional value, and vice versa—require only symbol counts and iterative updates of multinomial coefficients:
```python
# Enumerative unranking: map index L to the output sequence x, given the
# remaining symbol counts S[1..M] over the alphabet {a_1, ..., a_M}.
for pos in range(1, N + 1):
    for i in range(1, M + 1):
        if S[i] == 0:
            continue
        S[i] -= 1
        # number of completions if symbol a_i occupies position pos
        Wi = factorial(remaining_length - 1) // prod(factorial(S[j]) for j in range(1, M + 1))
        if L < Wi:
            x[pos] = a_i        # emit a_i and keep the reduced counts
            break
        else:
            L -= Wi             # skip past all sequences starting with a_i here
            S[i] += 1
    remaining_length -= 1
```
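The loop above can be fleshed out into a complete, runnable ranking/unranking pair (a minimal sketch with our own naming; zero-indexed, with integer arithmetic so the multinomials stay exact), which makes the bijection and its inverse explicit:

```python
from math import factorial

def n_completions(counts):
    """Number of sequences realizing the remaining composition `counts`."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def unrank(L, counts):
    """Map index L to the L-th (lexicographic) sequence of the type class."""
    counts = list(counts)
    seq = []
    for _ in range(sum(counts)):
        for i, c in enumerate(counts):
            if c == 0:
                continue
            counts[i] -= 1
            Wi = n_completions(counts)   # completions if symbol i is placed here
            if L < Wi:
                seq.append(i)
                break
            L -= Wi                      # skip the subtree rooted at symbol i
            counts[i] += 1
    return seq

def rank(seq, counts):
    """Inverse mapping: sequence back to its lexicographic index."""
    counts = list(counts)
    L = 0
    for s in seq:
        for i in range(s):               # count sequences branching below s
            if counts[i] == 0:
                continue
            counts[i] -= 1
            L += n_completions(counts)
            counts[i] += 1
        counts[s] -= 1
    return L

# Round trip over the entire type class of composition (2, 1, 1):
comp = (2, 1, 1)
assert all(rank(unrank(L, comp), comp) == L for L in range(n_completions(comp)))
```

In a real matcher the index L is the integer value of the k input bits; the inverse direction (`rank`) is the dematcher at the receiver.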
Finite-precision arithmetic coding (FPA-CCDM) has been analyzed rigorously: with $b$ bits of precision, the additional rate loss diminishes exponentially in $b$ (Pikus et al., 2019). Multiplication-free variants such as Log-CCDM further reduce hardware costs by replacing multiplications/divisions with table lookups and integer additions, achieving negligible rate loss at practical blocklengths (Gültekin et al., 2022).
4. Complexity, Parallelization, and Alternative Algorithms
Arithmetic-coding-based CCDM is fundamentally serial: for input length $k$ and output length $n$ it requires $n$ sequential steps, with per-symbol computational effort proportional to the alphabet size $M$. Storage requirements are modest (registers of $O(k)$ bits for counters and interval endpoints), and the overall memory and computational complexity per block is linear in $n$ (Fehenberger et al., 2018, Gültekin et al., 2019).
Alternative algorithms include:
- Multiset ranking (MR-CCDM) and subset ranking (SR-CCDM): These perform direct ranking/unranking on multisets for composition-aware mapping, reducing serial depth and facilitating hardware parallelization, especially in binary-output or parallel-amplitude architectures (Fehenberger et al., 2020, Fehenberger et al., 2019).
- Parallel-amplitude CCDM (PA-DM): Decomposes $M$-ary composition matching into a set of parallel binary CCDMs with negligible rate penalty, dramatically reducing serial bottlenecks (Fehenberger et al., 2019).
- Log-CCDM: Employs log-domain updates and lookup tables to avoid multiplications/divisions, achieving near-ideal performance and minimal rate loss with sub-kilobyte memory at practical blocklengths (Gültekin et al., 2022).
5. Performance at Finite Blocklength and Motivations for Generalizations
At small to moderate blocklengths $n$, CCDM's rate loss remains nontrivial, which directly translates into SNR loss and degradation in achievable information rates in PAS frameworks:
- For short blocks, the rate loss amounts to a nonnegligible fraction of a bit per symbol.
- Driving the rate loss below a small target requires blocklengths of hundreds to thousands of symbols (Fehenberger et al., 2018, Gültekin et al., 2019).
- The resulting power loss in dB for PAS over the AWGN channel grows in proportion to the rate loss (Gültekin et al., 2019).
Practical limitations thus include latency and strict sequentiality at long blocklengths, motivating alternatives:
- Multiset-Partition DM (MPDM) unifies multiple compositions, substantially reducing rate loss relative to CCDM at the same block length; for a fixed SNR gap to capacity, MPDM achieves 2.5–5× block-length savings (Fehenberger et al., 2018, Gültekin et al., 2019).
- Multi-composition DM (MCDM) incorporates several compositions within arithmetic coding, further closing the gap to the theoretical maximum rate at short blocklengths (Pikus et al., 2019).
- Sphere shaping (ESS/SM), which maximizes energy efficiency rather than fixing a composition, can outperform CCDM in the finite-blocklength regime (Gültekin et al., 2019).
6. Role in Probabilistic Amplitude Shaping and Practical Systems
CCDM is the standard distribution matcher in PAS frameworks for spectrally efficient coded modulation (e.g., 64-QAM, 256-QAM):
- PAS architectures use CCDM to shape amplitude sequences with prescribed empirical PMFs, while the FEC code supplies uniformly distributed sign bits (Schulte et al., 2015, Fehenberger et al., 2018).
- Achievable information rates with bit-metric decoding are reduced by CCDM's rate loss; practical SNR gains approaching the ultimate shaping gain of 1.53 dB are realizable versus uniform signaling at moderate blocklengths (Gültekin et al., 2019, Fehenberger et al., 2020).
- CCDM can operate as a drop-in module; backward compatibility and low storage footprint facilitate adoption (Fehenberger et al., 2018, Fehenberger et al., 2020).
- Specialized architectures, such as list-encoding CCDM with energy-dispersion-based candidate selection, have demonstrated measurable SNR and achievable rate gains in nonlinear optical systems (effective SNR improvement of 0.35 dB, AIR gain of 0.22 bit/4D-symbol, and 8% reach extension) (Wu et al., 2021).
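The PAS division of labor described above can be caricatured in a few lines (a deliberately minimal sketch of our own; a real system derives the signs from systematic FEC output rather than fixing them by hand):

```python
# 8-ASK PAS sketch: shaped amplitude indices + uniform sign bits -> channel symbols.
amplitudes = [1, 3, 5, 7]                  # 8-ASK magnitude levels
shaped = [0, 0, 1, 0, 2, 0, 1, 3]          # matcher output, biased toward low energy
signs = [+1, -1, +1, +1, -1, -1, +1, -1]   # sign bits, e.g. from FEC parity (stub)
symbols = [s * amplitudes[a] for a, s in zip(shaped, signs)]
```

Because the matcher concentrates probability on low-magnitude amplitudes, the average symbol energy falls below that of uniform signaling over the same constellation.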
7. Limits, Optimality, and Open Directions
CCDM achieves the entropy rate $H(P_A)$ asymptotically, and the normalized divergence to the target vanishes with increasing $n$. However, for fixed-length, one-to-one DMs, it is proven that the divergence cannot be made exactly zero at finite $n$ due to combinatorial cardinality constraints; the unnormalized divergence grows as $\tfrac{1}{2}\log_2 n$ to leading order (Schulte et al., 2017).
To approach the theoretical rate-divergence limits at short blocklengths, research now focuses on multi-composition schemes, sphere/shell shaping, and parallel architectures that address the inherent limitations of constant compositions (Fehenberger et al., 2018, Gültekin et al., 2019, Pikus et al., 2019).
Major open areas include further reducing computational latency, optimizing rate/divergence trade-offs for ultra-short packets, and integrating energy-dispersion-aware selection in non-linear channels without compromising PAS compatibility (Wu et al., 2021).