Rejection Sampling is Optimal for Relative Entropy Coding

Published 25 Apr 2026 in cs.IT | (2604.23076v1)

Abstract: In relative entropy coding, a sender aims to design a stochastic code such that, on input $X \sim P_X$, the receiver can generate a sample $Y \sim P_{Y \mid X}$. It is a standard result that (1) this requires at least $I(X; Y)$ bits, (2) the lower bound is achievable within a logarithmic gap, and (3) this gap cannot be reduced in general. The necessity of the gap suggests that the mutual information is not the correct information measure to quantify the rate of relative entropy coding. A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of $I_F(X \to Y)$, a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most $I_F(X \to Y) + \log e$ bits. This demonstrates that rejection sampling is optimal for relative entropy coding. Our result implies that the classical mutual information lower bound is achievable within $\log(I(X; Y) + 1) + 2.45$ bits in general and within $1.45$ bits for singular channels, which are both the tightest bounds of their kind to date. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.

Abstract PDF Upgrade to Chat

Authors (4)

Summary

The paper introduces the ring toss code, which adapts rejection sampling to reach the lower bound in one-shot relative entropy coding.
It proves that under bounded density conditions, the expected codeword length is F(X;Y) + log e bits, effectively closing the gap with mutual information bounds.
The work establishes both non-asymptotic and asymptotic tight redundancy bounds, highlighting significant implications for stochastic channel simulation and data compression.

Optimality of Rejection Sampling for Relative Entropy Coding

Overview

This paper formalizes and resolves a longstanding question in coding theory and information theory related to the simulation of stochastic channels, specifically the fundamental rate limits of relative entropy coding. It advances the theory by demonstrating that classical rejection sampling, when coupled with their novel “ring toss code,” is optimal for single-use (one-shot) channel simulation in terms of a newly characterized measure, the “functional information.” Key contributions include closing the gap between classical mutual information bounds and what is fundamentally achievable, characterizing conditions in which redundancy vanishes or remains constant, and providing the tightest non-asymptotic and asymptotic bounds to date.

Relative Entropy Coding and Rate Measures

Relative entropy coding addresses the problem of simulating a conditional distribution $P_{Y|X}$ , given a source $X \sim P_X$ , via a code that allows a receiver to generate a correct $Y$ with potentially shared randomness. The classical lower bound for the expected codeword length is given by the mutual information $I(X;Y)$ . Poisson functional representations achieve this bound up to an additive logarithmic term, and prior work has shown that this gap is unavoidable for general channels.

Flamich et al. [flamich2025redundancy] introduced a refined lower bound termed the functional information $F(X;Y)$ , based on the channel simulation divergence, which is always at least as tight as the mutual information lower bound but was previously unachievable except for special cases.

Ring Toss Code and Main Result

The authors introduce the ring toss code, an encoding method adapted from rejection sampling, which utilizes an alternative encoding of the sample index to achieve an expected rate of $F(X;Y) + \log e$ bits for all channels satisfying a bounded density ratio condition. This result establishes that rejection sampling is rate-optimal for relative entropy coding in the sense of achieving the exact lower bound on expected codeword length up to an additive constant.

The central theorem states that:

For random variables $(X, Y) \sim P_{X,Y}$ and proposal distribution $P_Y$ , with $dP_{Y|X}/dP_Y$ bounded by $M$ , the ring toss code achieves $X \sim P_X$ 0 for common randomness $X \sim P_X$ 1.
For singular channels (where $X \sim P_X$ 2 is a function only of $X \sim P_X$ 3 almost everywhere), $X \sim P_X$ 4, so constant redundancy is achieved over the mutual information bound.

Notably, the ring toss code does not rely on Poisson or greedy Poisson representations, but rather directly encodes the conditional distribution derived from the shared randomness and proposal samples.

Implications and Tight Numerical Bounds

The established rate immediately recovers and strengthens previously known asymptotic results:

The classical lower bound gap of $X \sim P_X$ 5 bits is improved to $X \sim P_X$ 6 bits for general channels, and $X \sim P_X$ 7 bits (i.e., $X \sim P_X$ 8) for singular channels.
For blocklength $X \sim P_X$ 9, the asymptotic redundancy per symbol is characterized, with the logarithmic gap coefficient being $Y$ 0 for singular channels and $Y$ 1 for nonsingular channels.

These provide the tightest known achievable bounds for both one-shot and asymptotic regimes and refine the connection between entropy coding rates, mutual information, and the newly formalized functional information.

Structural and Theoretical Insights

The paper identifies the channel simulation divergence (also known as the functional information divergence) as a fundamental quantity for measuring the cost of simulating stochastic channels in the presence of side or shared randomness. The width function, which underpins this divergence, links the geometry of probability densities with achievable code rates.

For singular channels, which include classical examples such as the binary erasure and continuous uniform additive noise channels, the ring toss code demonstrates constant redundancy. Lemma III.1 provides a full characterization of when redundancy vanishes ( $Y$ 2) in terms of channel singularity.

Importantly, the ring toss code’s search-based interpretation—searching over $Y$ 3 instead of $Y$ 4—distinguishes it algorithmically from existing entropy codes and elucidates the relationship between the code structure and the underlying stochastic process.

Future Directions

Practical and theoretical extensions suggested include:

Removing or relaxing the boundedness condition on the density ratio $Y$ 5 to handle unbounded or heavy-tailed distributions.
Determining whether there exist broader channel families, beyond singular channels, for which $Y$ 6 is sub-logarithmic or negligible.
Developing efficient computational methods for the exact evaluation or approximation of $Y$ 7 in high-dimensional or structured probabilistic settings.

Given its relation to channel simulation, lossless and lossy compression, and the strong functional representation lemma, this work lays groundwork for further advances in the optimality of stochastic codes across data compression, machine learning, and information theory.

Conclusion

This paper provides a fundamental advance in channel simulation and coding theory, proving that optimized rejection sampling—via the ring toss code—achieves the best possible rate for relative entropy coding in terms of the newly established functional information. The work resolves an important open problem, delivers the tightest non-asymptotic and asymptotic bounds for channel simulation redundancy, and sets a new theoretical standard for both the analysis and practical construction of entropy codes based on stochastic channel simulation (2604.23076).

Markdown Report Issue