Reverse-Pinsker Type Inequalities
- Reverse-Pinsker type inequalities are a set of optimal upper bounds on f-divergences in terms of total variation distance, defined under specific constraints.
- They characterize extremal distributions—often with three-point support—to derive sharp bounds for classical divergences such as the KL and Rényi divergences.
- These results have practical impact in statistical estimation, hypothesis testing, large deviations, and privacy amplification, providing refined counterparts to the classical Pinsker bound.
Reverse-Pinsker type inequalities provide the optimal or sharpest possible upper bounds on $f$-divergences—including the Kullback–Leibler (KL), Rényi, $\chi^2$, and Hellinger divergences—in terms of the total variation (TV) distance under specified constraints. While the classical Pinsker’s inequality gives a lower bound on KL divergence in terms of TV, the reverse direction is, in general, impossible without imposing additional restrictions on the probability measures of interest. The modern theory of reverse-Pinsker inequalities encompasses precise characterizations for a wide class of $f$-divergences, parametric regimes, and structural constraints, with exact optimality results and explicit extremal distributions.
1. Definitions and Fundamental Limitations
For two probability measures $P$ and $Q$ defined on a common measurable space, the total variation distance is
$$\mathrm{TV}(P,Q)\;=\;\sup_{A}\,|P(A)-Q(A)|\;=\;\tfrac{1}{2}\int |dP-dQ|.$$
The $f$-divergence associated to a convex generator $f:(0,\infty)\to\mathbb{R}$ with $f(1)=0$ is
$$D_f(P\|Q)\;=\;\int f\!\Big(\frac{dP}{dQ}\Big)\,dQ.$$
Special cases include the Kullback–Leibler ($f(t)=t\log t$), Rényi (of order $\alpha$, obtained as a monotone transform of the $f$-divergence with generator $f_\alpha(t)=\frac{t^\alpha-1}{\alpha-1}$), and squared Hellinger ($f(t)=(\sqrt{t}-1)^2$) divergences.
While Pinsker’s inequality guarantees that small KL divergence enforces small TV ($D(P\|Q)\ge 2\,\mathrm{TV}^2(P,Q)$ in nats), no universal upper bound on $D(P\|Q)$ in terms of TV holds for arbitrary $(P,Q)$: for fixed $\delta\in(0,1)$ one can find pairs with $\mathrm{TV}(P,Q)\le\delta$ but $D(P\|Q)$ arbitrarily large, unless further constraints—typically on the Radon–Nikodym derivative $dP/dQ$ or the underlying space—are imposed (Sason, 2015).
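To make the definitions above concrete, here is a minimal numerical sketch (an illustration, not code from the cited works) that evaluates TV and several $f$-divergences on a small alphabet, checks Pinsker's inequality, and shows how the KL divergence grows without bound at essentially fixed TV once a $Q$-atom shrinks, which is exactly why unconditional reverse bounds fail.

```python
import numpy as np

def tv(P, Q):
    """Total variation distance: sup_A |P(A) - Q(A)| = 0.5 * ||P - Q||_1."""
    return 0.5 * np.abs(P - Q).sum()

def f_divergence(P, Q, f):
    """D_f(P||Q) = sum_x Q(x) f(P(x)/Q(x)); assumes P << Q on the support."""
    mask = Q > 0
    return float(np.sum(Q[mask] * f(P[mask] / Q[mask])))

# Standard generators (natural logarithm, so divergences are in nats).
kl_gen   = lambda t: np.where(t > 0, t * np.log(np.where(t > 0, t, 1.0)), 0.0)  # t log t, 0 log 0 := 0
chi2_gen = lambda t: (t - 1.0) ** 2
hel2_gen = lambda t: (np.sqrt(t) - 1.0) ** 2

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])
delta, D = tv(P, Q), f_divergence(P, Q, kl_gen)
print(f"TV={delta:.3f}  KL={D:.4f}  chi2={f_divergence(P, Q, chi2_gen):.4f}"
      f"  Hel2={f_divergence(P, Q, hel2_gen):.4f}")
print("Pinsker D >= 2 TV^2 :", D >= 2 * delta**2)

# No unconditional reverse bound: TV stays near 0.1 while KL grows without bound.
for eps in (1e-2, 1e-4, 1e-8):
    P2, Q2 = np.array([0.8, 0.1, 0.1]), np.array([0.9 - eps, 0.1, eps])
    print(f"eps={eps:.0e}:  TV={tv(P2, Q2):.4f}  KL={f_divergence(P2, Q2, kl_gen):.3f}")
```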
2. Sharp Reverse-Pinsker Inequalities: General Theory and Main Results
Let $m=\essinf_x dP/dQ(x)$ and $M=\esssup_x dP/dQ(x)$. For fixed TV distance $\delta$, the extremal set
$\mathcal{A}(\delta, m, M) = \left\{ (P, Q): P\ll Q,\, \essinf dP/dQ = m,\, \esssup dP/dQ = M,\, \mathrm{TV}(P,Q)=\delta \right\}$
collects all pairs with the prescribed mass-ratio range and TV distance.
Binette’s theorem (Binette, 2018): for convex $f$ with $f(1)=0$, if $0\le m\le 1\le M$ and $(P,Q)\in\mathcal{A}(\delta,m,M)$, then
$$D_f(P\|Q)\;\le\;\delta\left(\frac{f(m)}{1-m}+\frac{f(M)}{M-1}\right),$$
with the convention that a term vanishes when its denominator does, and the bound is attained.
The supremum is achieved by a three-point distribution with mass ratios exactly $m$ and $M$: placing $Q$-mass $\delta/(1-m)$ at ratio $m$, $Q$-mass $\delta/(M-1)$ at ratio $M$, and ratio $1$ on the remaining mass gives TV distance exactly $\delta$ and attains the right-hand side. For $f(t)=t\log t$, this yields explicit KL upper bounds in terms of TV, $m$, and $M$.
For a given $\delta$ with $m$ and $M$ unconstrained (letting $m\to 0$ and $M\to\infty$), the bound becomes $\delta\,\bigl(f(0)+f'(\infty)\bigr)$ with $f'(\infty):=\lim_{t\to\infty}f(t)/t$, which is infinite whenever $f'(\infty)=\infty$ (as for KL).
This recovers classical results (Vajda) for the limiting range of $f$-divergences at given TV.
The method applies to all $f$-divergences and provides tight constants, with extremizers always supported on three points.
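The three-point extremal structure can be checked directly. The sketch below (my own verification, using the closed form reconstructed above, with arbitrary values of $\delta$, $m$, $M$) builds a pair whose likelihood ratio takes only the values $m$, $1$, $M$, confirms that its TV distance equals $\delta$, and checks that its $f$-divergence matches $\delta\bigl(\frac{f(m)}{1-m}+\frac{f(M)}{M-1}\bigr)$.

```python
import numpy as np

def tv(P, Q):
    return 0.5 * np.abs(P - Q).sum()

def f_divergence(P, Q, f):
    return float(np.sum(Q * f(P / Q)))

def three_point_pair(delta, m, M):
    """Extremal pair with dP/dQ in {m, 1, M}: Q puts mass delta/(1-m) on the
    ratio-m atom and delta/(M-1) on the ratio-M atom (feasibility assumed)."""
    qa, qb = delta / (1.0 - m), delta / (M - 1.0)
    qc = 1.0 - qa - qb
    assert qc >= 0, "delta too large for this (m, M) range"
    Q = np.array([qa, qb, qc])
    P = np.array([m * qa, M * qb, qc])
    return P, Q

kl = lambda t: t * np.log(t)   # KL generator on t > 0

delta, m, M = 0.1, 0.25, 4.0
P, Q = three_point_pair(delta, m, M)
print("P sums to", P.sum(), " TV =", tv(P, Q))        # TV should equal delta
lhs = f_divergence(P, Q, kl)
rhs = delta * (kl(m) / (1 - m) + kl(M) / (M - 1))
print(f"D_f at extremizer = {lhs:.6f},  closed form = {rhs:.6f}")
```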
3. Reverse Bounds for KL and Rényi Divergences: Refinements and Extensions
For general measures with constraints on the likelihood ratio, Sason (Sason, 2015) proves a refinement of Verdú’s earlier reverse-Pinsker bound, expressed in terms of the TV distance and the essential infimum and supremum of $dP/dQ$ (the quantities $m$ and $M$ above). The refined bound recovers and strictly improves Verdú’s result in the “bounded likelihood-ratio” regime.
On a finite alphabet, for $Q_{\min}:=\min_x Q(x)>0$ and $\varepsilon:=\|P-Q\|_1$,
$$D(P\|Q)\;\le\;\log\!\Big(1+\frac{\varepsilon^2}{2Q_{\min}}\Big),$$
an improvement over the Csiszár–Talata bound, with a halved quadratic prefactor and further corrections (Sason, 2015).
Reverse Pinsker bounds for Rényi divergences of arbitrary order are also established:
$$D_\alpha(P\|Q)\leq \begin{cases} \log\Big(1+\frac{\varepsilon}{Q_{\min}}\Big), & \alpha>2,\\[1ex] \log\Big(1+\frac{\varepsilon^2}{2Q_{\min}}\Big), & 1\le\alpha\le2,\\[1ex] \frac{\alpha}{1-\alpha}\,\log\Big(1+\frac{\varepsilon^2}{2P_{\min}}\Big), & 0<\alpha<1,\\[1ex] -2\log\big(1-\tfrac{\varepsilon}{2}\big), & \alpha=1/2, \end{cases}$$
with $\varepsilon:=\|P-Q\|_1$, $Q_{\min}:=\min_x Q(x)$, and $P_{\min}:=\min_x P(x)$ (Sason, 2015). Variants using the joint-range framework and tightness analysis appear in (Grosse et al., 20 Jan 2025).
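As a numerical spot-check of the displayed cases (and of the finite-alphabet KL bound above, which is the $\alpha=1$ case), the following sketch evaluates the Rényi divergence and the corresponding order-dependent bound on random strictly positive distributions. Interpreting $\varepsilon$ as $\|P-Q\|_1$ is an assumption here, since the original normalization is not reproduced in this excerpt; the $\alpha=1/2$ special case is not checked separately.

```python
import numpy as np

def renyi(P, Q, alpha):
    """Renyi divergence D_alpha(P||Q) in nats; alpha = 1 falls back to KL."""
    if np.isclose(alpha, 1.0):
        return float(np.sum(P * np.log(P / Q)))
    return float(np.log(np.sum(P**alpha * Q**(1.0 - alpha))) / (alpha - 1.0))

def bound(P, Q, alpha):
    """Order-dependent reverse-Pinsker bounds quoted above (eps = ||P-Q||_1)."""
    eps, Qmin, Pmin = np.abs(P - Q).sum(), Q.min(), P.min()
    if alpha > 2:
        return np.log(1 + eps / Qmin)
    if alpha >= 1:
        return np.log(1 + eps**2 / (2 * Qmin))
    return alpha / (1 - alpha) * np.log(1 + eps**2 / (2 * Pmin))

rng = np.random.default_rng(0)
for trial in range(3):
    # Random strictly positive distributions on a 5-letter alphabet.
    P = rng.dirichlet(np.ones(5)) + 0.02; P /= P.sum()
    Q = rng.dirichlet(np.ones(5)) + 0.02; Q /= Q.sum()
    for alpha in (0.5, 1.0, 1.5, 3.0):
        d, b = renyi(P, Q, alpha), bound(P, Q, alpha)
        print(f"trial {trial} alpha={alpha}: D={d:.4f} <= bound={b:.4f}  {d <= b + 1e-12}")
```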
4. Reverse Pinsker Inequalities via Variational and Bayes Risk Methods
Reid and Williamson (0906.1244) develop the most general form, establishing that any $f$-divergence is bounded by an explicit function of a finite set of “primitive” $f$-divergences (generalized variational divergences) evaluated at priors $\pi_1,\dots,\pi_n\in(0,1)$,
where the coefficients are determined by the slopes of the generator $f$ and the sampled values of the variational divergences. For KL divergence and TV (the primitive divergence at the balanced prior $\pi=1/2$), the bound specializes to a one-dimensional minimization that provides the tightest possible functional relationship between KL and TV.
This synthesizes and sharpens all prior (Vajda, Fedotov–Topsøe, Pinsker, Bretagnolle–Huber) inequalities, confirming optimality via Bayes risk duality and tightness on connected measurable spaces.
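The one-dimensional specialization for KL versus TV can be illustrated with the standard binary reduction: by the data-processing inequality, the minimum KL at a given TV distance is attained by two-point (binary) pairs, so the tight lower envelope is a one-parameter minimization. The sketch below is an illustration of that reduction, not the exact Reid–Williamson formula; it computes the envelope on a grid and compares it with Pinsker's bound $2\,\mathrm{TV}^2$.

```python
import numpy as np

def tight_kl_lower_envelope(delta, grid=20001):
    """min over binary pairs (Ber(q+delta), Ber(q)) at TV distance delta of
    their KL divergence (nats) -- a one-dimensional minimization in q."""
    q = np.linspace(1e-9, 1.0 - delta - 1e-9, grid)
    p = q + delta
    kl = p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))
    return float(kl.min())

for delta in (0.05, 0.1, 0.2, 0.4):
    tight = tight_kl_lower_envelope(delta)
    print(f"TV={delta:.2f}:  tight min KL = {tight:.5f}   Pinsker 2*TV^2 = {2*delta**2:.5f}")
```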
5. Reverse Pinsker Inequalities for Small and Large Deviations: Asymptotics and Applications
When $P$ and $Q$ have bounded likelihood ratios or are supported on finite alphabets, and the TV distance tends to zero, all sharp reverse Pinsker bounds agree in the small-deviation regime: the extremal divergence scales quadratically in the distance, with the quadratic constant attaining Pinsker’s value of $1/2$ (in nats, with distance measured by $\|P-Q\|_1$) in symmetric (“balanced”) cases, as shown notably in (Berend et al., 2012). For “balanced” $Q$ (some event carries $Q$-probability exactly $1/2$), the bound matches Pinsker’s constant.
In testing, large deviations, and concentration inequalities, reverse Pinsker bounds provide the small-ball exponents under TV constraints, and thus precisely capture the rate exponents in Sanov’s theorem for empirical measures deviating by at least $\delta$ in TV (Berend et al., 2012).
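On a finite alphabet, the constrained minimum $\min\{D(P\Vert Q):\mathrm{TV}(P,Q)\ge\delta\}$, which plays the role of the rate exponent here, reduces (again via data processing plus a proportional mass shift) to a minimum of binary KL terms over subsets of the alphabet. The sketch below computes it for a balanced and an unbalanced reference measure; the specific alphabets and the value $\delta=0.1$ are arbitrary choices for illustration.

```python
import numpy as np
from itertools import combinations

def kl_binary(p, q):
    """KL (nats) between Bernoulli(p) and Bernoulli(q), with 0 log 0 := 0."""
    out = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            out += a * np.log(a / b)
    return out

def sanov_exponent(Q, delta):
    """min { D(P||Q) : TV(P,Q) >= delta } on a finite alphabet, via the
    reduction to binary pairs: minimize d(Q(A)+delta || Q(A)) over subsets A."""
    n, best = len(Q), np.inf
    for r in range(1, n):
        for A in combinations(range(n), r):
            qA = Q[list(A)].sum()
            if qA + delta <= 1:
                best = min(best, kl_binary(qA + delta, qA))
    return best

delta = 0.1                                   # TV threshold (= ||P-Q||_1 / 2)
Q_balanced   = np.array([0.5, 0.25, 0.25])    # some event has Q-probability 1/2
Q_unbalanced = np.array([0.85, 0.10, 0.05])
for name, Q in (("balanced", Q_balanced), ("unbalanced", Q_unbalanced)):
    e = sanov_exponent(Q, delta)
    print(f"{name:10s}: exponent = {e:.5f}   Pinsker floor 2*delta^2 = {2*delta**2:.5f}")
```

The balanced reference essentially meets the Pinsker floor, while the unbalanced one sits strictly above it, matching the small-deviation picture described above.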
6. Special Cases: Binary Support, -Divergence, and Information Density Bounds
For binary observations, the tight bounds reduce to explicit expressions: the extremal value is the infimum of $D(P\|Q)$ over binary $P$ at TV distance $\delta$ from $Q$, and expansions in $\delta$ yield the precise asymptotics.
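For instance, with a Bernoulli reference measure the infimum over binary $P$ at TV distance $\delta$ is the smaller of the two one-sided KL terms, and its small-$\delta$ expansion is quadratic. The sketch below (an illustration with an arbitrarily chosen reference parameter, not the source's exact expressions) checks the quadratic asymptotics numerically.

```python
import numpy as np

def kl_binary(p, q):
    """KL divergence (nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

q = 0.3                                    # reference Bernoulli parameter
for delta in (0.1, 0.03, 0.01, 0.003):
    exact  = min(kl_binary(q + delta, q), kl_binary(q - delta, q))
    approx = delta**2 / (2 * q * (1 - q))  # second-order (chi-square) expansion
    print(f"delta={delta:5.3f}:  min KL = {exact:.6f}   delta^2/(2q(1-q)) = {approx:.6f}")
```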
Analogous closed-form results hold for the $\chi^2$-divergence and for symmetric $f$-divergences, with explicit expressions given in (0906.1244). For the information density, Sason (Sason, 2015) exhibits optimal lower bounds on TV in terms of the distribution of $\log\frac{dP}{dQ}$, with binary extremizers.
7. Summary Table: Sharp Reverse Pinsker Bounds in Key Settings
| Setting | Reverse Pinsker Upper Bound | Reference |
|---|---|---|
| General, bounded likelihood ratio | Refinement of Verdú’s bound in terms of TV and the extremes of $dP/dQ$ (Section 3) | (Sason, 2015) |
| Finite alphabet, $Q_{\min}>0$ | $D(P\Vert Q)\le\log\big(1+\tfrac{\varepsilon^2}{2Q_{\min}}\big)$, $\varepsilon=\Vert P-Q\Vert_1$ | (Sason, 2015) |
| General $f$-divergence, $m\le dP/dQ\le M$ | $D_f(P\Vert Q)\le\delta\big(\tfrac{f(m)}{1-m}+\tfrac{f(M)}{M-1}\big)$ | (Binette, 2018) |
| KL vs TV, any $(P,Q)$ | One-dimensional minimization over the prior $\pi$ (Section 4) | (0906.1244) |
| Rényi of order $\alpha$, finite $\esssup dP/dQ$ | Order-dependent bounds displayed in Section 3 | (Grosse et al., 20 Jan 2025) |
8. Impact and Applications
These optimal bounds under structured constraints unify and resolve longstanding questions concerning the achievable relationships between $f$-divergences and variational divergences. In statistical estimation, information theory, large deviations, robust statistics, and privacy amplification (e.g., Rényi local differential privacy), reverse-Pinsker type inequalities enable precise quantification of the trade-offs between distinguishability and distance metrics (Grosse et al., 20 Jan 2025, Sason, 2015).
Recent extensions exploit convexity, duality, and variational characterizations, allowing the translation of these inequalities to settings involving conditional distributions, post-processing through channels, and the characterization of strong data-processing inequalities for Rényi divergences (Grosse et al., 20 Jan 2025). The reverse-Pinsker arsenal is thus central to both foundational theoretical developments and practical bounds in high-dimensional inference, privacy, and concentration of measure.