
Reverse-Pinsker Type Inequalities

Updated 25 December 2025
  • Reverse-Pinsker type inequalities are optimal upper bounds on f-divergences in terms of the total variation distance, valid under specific structural constraints.
  • They are obtained by characterizing extremal distributions—often supported on three points—and yield sharp bounds for classical divergences such as the KL and Rényi divergences.
  • These results have practical impact in statistical estimation, hypothesis testing, large deviations, and privacy amplification, complementing the classical Pinsker lower bound with sharp upper estimates.

Reverse-Pinsker type inequalities provide the optimal, or sharpest possible, upper bounds on $f$-divergences—including the Kullback–Leibler (KL), Rényi, $\chi^2$, and Hellinger divergences—in terms of the total variation (TV) distance under specified constraints. While the classical Pinsker inequality gives a lower bound on the KL divergence in terms of TV, the reverse direction is, in general, impossible without imposing additional restrictions on the probability measures of interest. The modern theory of reverse-Pinsker inequalities encompasses precise characterizations for a wide class of $f$-divergences, parametric regimes, and structural constraints, with exact optimality results and explicit extremal distributions.

1. Definitions and Fundamental Limitations

For two probability measures $P$ and $Q$ defined on a common measurable space, the total variation distance is

$$\mathrm{TV}(P,Q) = \sup_A |P(A) - Q(A)| = \frac{1}{2}\,\mathbb{E}_Q\left| \frac{dP}{dQ} - 1 \right|.$$

The $f$-divergence associated with a convex generator $f: [0,\infty) \to (-\infty,\infty]$ satisfying $f(1)=0$ is

$$D_f(P\|Q) = \mathbb{E}_Q\left[ f\!\left( \frac{dP}{dQ} \right) \right].$$

Special cases include the Kullback–Leibler divergence ($f(t)=t\ln t$), the Rényi family of order $\alpha \ne 1$ (via the generators $f_\alpha(t) = (t^\alpha -1)/(\alpha-1)$), and the squared Hellinger distance ($f(t)=(\sqrt{t}-1)^2$).

While Pinsker’s inequality guarantees that a small KL divergence forces a small TV distance ($D(P\|Q) \geq 2\,\mathrm{TV}(P,Q)^2$), no universal upper bound on $D_f(P\|Q)$ in terms of TV holds for arbitrary $P,Q$: for any fixed $\varepsilon>0$ one can construct $P,Q$ with $\mathrm{TV}(P,Q)=\varepsilon$ and $D(P\|Q)$ arbitrarily large, unless further constraints—typically on the Radon–Nikodym derivative or on the underlying space—are imposed (Sason, 2015).
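
These definitions are straightforward to evaluate on finite alphabets. The following Python sketch (illustrative, not taken from any of the cited papers; natural logarithms throughout) implements TV and $D_f$ for the generators above and checks, on random pairs, Pinsker's inequality together with two standard companions ($D \le \ln(1+\chi^2)$ and $H^2 \le \lVert P-Q\rVert_1$):

```python
import numpy as np

rng = np.random.default_rng(0)

def tv(p, q):
    """Total variation distance: half the L1 distance between the pmfs."""
    return 0.5 * np.abs(p - q).sum()

def f_divergence(p, q, f):
    """D_f(P||Q) = E_Q[ f(dP/dQ) ] for pmfs p, q with q > 0 everywhere."""
    return float(np.sum(q * f(p / q)))

kl_gen        = lambda t: t * np.log(t)              # f(t) = t ln t     (KL, in nats)
chi2_gen      = lambda t: (t - 1.0) ** 2             # f(t) = (t - 1)^2  (chi^2)
hellinger_gen = lambda t: (np.sqrt(t) - 1.0) ** 2    # f(t) = (sqrt(t) - 1)^2

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))   # strictly positive, so t = p/q > 0
    q = rng.dirichlet(np.ones(5))
    d_kl, d_chi2, d_hel = (f_divergence(p, q, g) for g in (kl_gen, chi2_gen, hellinger_gen))
    v = tv(p, q)
    assert d_kl >= 2.0 * v**2 - 1e-12             # Pinsker: D >= 2 TV^2 (nats)
    assert d_kl <= np.log(1.0 + d_chi2) + 1e-12   # D <= ln(1 + chi^2), by Jensen
    assert d_hel <= 2.0 * v + 1e-12               # H^2 <= ||P - Q||_1
```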

2. Sharp Reverse-Pinsker Inequalities: General Theory and Main Results

Let $m=\operatorname{ess\,inf}_x\, \frac{dP}{dQ}(x)$ and $M=\operatorname{ess\,sup}_x\, \frac{dP}{dQ}(x)$. For a fixed TV distance $\delta$, the extremal set

$$\mathcal{A}(\delta, m, M) = \left\{ (P, Q):\ P\ll Q,\ \operatorname{ess\,inf} \tfrac{dP}{dQ} = m,\ \operatorname{ess\,sup} \tfrac{dP}{dQ} = M,\ \mathrm{TV}(P,Q)=\delta \right\}$$

comprises all pairs $(P,Q)$ with fixed mass-ratio range and prescribed TV distance.

Binette’s theorem (Binette, 2018): for convex $f$, if $0 \leq m < 1 < M < \infty$ and $\mathcal{A}(\delta,m,M) \neq \emptyset$, then

$$\sup_{(P,Q)\in\mathcal{A}(\delta,m,M)} D_f(P\|Q) = \delta\left( \frac{f(m)}{1-m} + \frac{f(M)}{M-1} \right).$$

The supremum is achieved by a pair supported on three points whose mass ratios are exactly $m$ and $M$. For $f(t)=t\ln t$, this yields explicit KL upper bounds in terms of TV, $m$, and $M$.
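
The extremal construction can be made concrete: placing $Q$-mass $a=\delta/(1-m)$ on an atom with likelihood ratio $m$, $Q$-mass $b=\delta/(M-1)$ on an atom with ratio $M$, and the remaining mass on an atom with ratio $1$ gives $\mathrm{TV}(P,Q)=\delta$ and attains the supremum. Below is a minimal Python sketch of this construction for the KL case (illustrative only, assuming $a+b\le 1$):

```python
import numpy as np

def three_point_pair(delta, m, M):
    """Build (P, Q) on three atoms whose likelihood ratio dP/dQ takes the
    values m, 1, M and whose total variation distance equals delta."""
    a = delta / (1.0 - m)              # Q-mass of the atom with ratio m
    b = delta / (M - 1.0)              # Q-mass of the atom with ratio M
    assert a + b <= 1.0, "delta too large for this (m, M) pair"
    q = np.array([a, b, 1.0 - a - b])
    p = np.array([m * a, M * b, 1.0 - a - b])
    return p, q

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

delta, m, M = 0.1, 0.2, 4.0
p, q = three_point_pair(delta, m, M)
f = lambda t: t * np.log(t) if t > 0 else 0.0            # KL generator f(t) = t ln t
bound = delta * (f(m) / (1.0 - m) + f(M) / (M - 1.0))    # Binette's supremum

print(0.5 * np.abs(p - q).sum())   # equals delta
print(kl(p, q), bound)             # the two values coincide up to rounding
```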

For a given $\delta$ with $m$ and $M$ unconstrained,

$$\sup_{(P,Q)\in\mathcal{C}(\delta)} D_f(P\|Q) = \delta \left( f(0) + \lim_{M\to\infty}\frac{f(M)}{M} \right),$$

where $\mathcal{C}(\delta)$ denotes the set of pairs $(P,Q)$ with $\mathrm{TV}(P,Q)=\delta$. This recovers classical results of Vajda on the limiting range of $f$-divergences at a given TV distance. For the KL divergence, $f(0)=0$ while $\lim_{M\to\infty} f(M)/M = \infty$, so the supremum is infinite and no finite reverse bound exists without further constraints, consistent with Section 1.

The method applies to all $f$-divergences and provides tight constants, with extremizers always supported on three points.

3. Reverse Bounds for KL and Rényi Divergences: Refinements and Extensions

For general measures with constrained likelihood ratios, Sason (Sason, 2015) proves the following refinement of Verdú’s earlier bound:

$$D(P\|Q) \leq \frac{1}{2}\left( \frac{\log\frac{1}{\beta_1}}{1-\beta_1} - \beta_2\log e \right)\|P - Q\|_1,$$

where $\beta_1 = \big(\operatorname{ess\,sup}\, dP/dQ\big)^{-1}$ and $\beta_2 = \operatorname{ess\,inf}\, dP/dQ$. This bound recovers and strictly improves Verdú’s result in the bounded likelihood-ratio regime.
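
A quick numerical sanity check of this bound on random pairs over a finite alphabet, in natural-log units so that $\log e = 1$ (the helper names below are ad hoc, and $\beta_1,\beta_2$ are read as defined above):

```python
import numpy as np

rng = np.random.default_rng(1)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def refined_verdu_bound(p, q):
    """0.5 * ( ln(1/beta1)/(1 - beta1) - beta2 ) * ||P - Q||_1  in nats,
    with beta1 = 1 / max(dP/dQ) and beta2 = min(dP/dQ)."""
    ratio = p / q
    beta1, beta2 = 1.0 / ratio.max(), ratio.min()
    if beta1 == 1.0:                 # only possible when P == Q
        return 0.0
    l1 = np.abs(p - q).sum()
    return 0.5 * (np.log(1.0 / beta1) / (1.0 - beta1) - beta2) * l1

for _ in range(1000):
    p = rng.dirichlet(np.ones(6))
    q = rng.dirichlet(np.ones(6))
    assert kl(p, q) <= refined_verdu_bound(p, q) + 1e-12
```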

On a finite alphabet, with $Q_{\min} = \min_a Q(a)$ and $\beta_2 = \min_a P(a)/Q(a)$,

$$D(P\Vert Q) \leq \log\Big(1 + \frac{\|P-Q\|_1^2}{2Q_{\min}}\Big) - \frac{\beta_2\log e}{2}\,\|P-Q\|_1^2,$$

an improvement over the Csiszár–Talata bound, with a halved quadratic prefactor and further corrections (Sason, 2015).
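
The same kind of check applies to the finite-alphabet refinement; the sketch below evaluates the right-hand side in nats on random Dirichlet pairs (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def finite_alphabet_bound(p, q):
    """ln(1 + ||P-Q||_1^2 / (2 Q_min)) - (beta2 / 2) * ||P-Q||_1^2  in nats,
    with Q_min = min_a Q(a) and beta2 = min_a P(a)/Q(a)."""
    l1 = np.abs(p - q).sum()
    return np.log(1.0 + l1**2 / (2.0 * q.min())) - 0.5 * (p / q).min() * l1**2

for _ in range(1000):
    p = rng.dirichlet(np.ones(4))
    q = rng.dirichlet(np.ones(4))
    assert kl(p, q) <= finite_alphabet_bound(p, q) + 1e-12
```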

Reverse-Pinsker bounds for Rényi divergences of arbitrary order $\alpha$ are also established:

$$D_\alpha(P\|Q)\leq \begin{cases} \log\Big(1+\dfrac{\varepsilon}{Q_{\min}}\Big), & \alpha>2,\\[1ex] \log\Big(1+\dfrac{\varepsilon^2}{2Q_{\min}}\Big), & 1\le\alpha\le2,\\[1ex] \dfrac{\alpha}{1-\alpha}\,\log\Big(1+\dfrac{\varepsilon^2}{2P_{\min}}\Big), & 0<\alpha<1,\\[1ex] -2\log\big(1-\tfrac{\varepsilon}{2}\big), & \alpha=\tfrac12, \end{cases}$$

with $\varepsilon=\|P-Q\|_1$ (Sason, 2015). Variants using the joint-range framework, together with a tightness analysis, appear in (Grosse et al., 20 Jan 2025).
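
Two of these cases are easy to test numerically: $\alpha=\tfrac12$, and $1<\alpha\le 2$ (where the bound follows from the monotonicity of $D_\alpha$ in $\alpha$ together with $D_2 = \log(1+\chi^2)$). A short sketch, with the Rényi divergence computed in nats:

```python
import numpy as np

rng = np.random.default_rng(3)

def renyi(p, q, alpha):
    """Renyi divergence D_alpha(P||Q) of order alpha != 1, in nats."""
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    eps = np.abs(p - q).sum()                 # epsilon = ||P - Q||_1
    # alpha = 1/2:  D_{1/2}(P||Q) <= -2 ln(1 - eps/2)
    assert renyi(p, q, 0.5) <= -2.0 * np.log(1.0 - eps / 2.0) + 1e-12
    # 1 < alpha <= 2:  D_alpha(P||Q) <= ln(1 + eps^2 / (2 Q_min))
    for alpha in (1.5, 2.0):
        assert renyi(p, q, alpha) <= np.log(1.0 + eps**2 / (2.0 * q.min())) + 1e-12
```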

4. Reverse Pinsker Inequalities via Variational and Bayes Risk Methods

Reid and Williamson (0906.1244) develop the most general form, establishing that for any $f$-divergence and any finite set of “primitive” $f$-divergences (generalized variational divergences) $V_{\pi_i}(P,Q)$ at points $\pi_i\in(0,1)$,

$$I_f(P\|Q)\ge \min_{a\in A_n} \sum_{i=0}^n \int_{\bar\pi_i}^{\bar\pi_{i+1}} \left( \alpha_{a,i}\pi + \beta_{a,i} \right)\gamma_f(\pi)\, d\pi,$$

where $\gamma_f(\pi)=\frac{1}{\pi^3}f''\big(\frac{1-\pi}{\pi}\big)$ and the coefficients $(\alpha_{a,i},\beta_{a,i})$ are determined by the slopes and the observed values of the $V_{\pi_i}(P,Q)$. For the KL divergence and TV ($n=1$), the bound specializes to a one-dimensional minimization that provides the tightest possible functional relationship between KL and TV.

This synthesizes and sharpens all prior (Vajda, Fedotov–Topsøe, Pinsker, Bretagnolle–Huber) inequalities, confirming optimality via Bayes risk duality and tightness on connected measurable spaces.
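
As an illustration of the joint-range viewpoint (not of the Reid–Williamson algorithm itself), the sketch below scans binary pairs $(P,Q)$, records the smallest KL divergence observed at each TV level, and compares this empirical lower envelope with Pinsker's curve $2\,\mathrm{TV}^2$; the widening gap at large TV is exactly why the tightest functional relationship matters:

```python
import numpy as np

def kl2(p, q):
    """Binary KL divergence KL((p,1-p) || (q,1-q)) in nats (grid avoids 0 and 1)."""
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))

grid = np.linspace(1e-3, 1.0 - 1e-3, 300)   # binary success probabilities
bins = np.linspace(0.0, 1.0, 101)           # bins along the TV axis
best = np.full(len(bins) - 1, np.inf)       # smallest KL seen in each TV bin

for p in grid:
    for q in grid:
        v = abs(p - q)                       # TV distance of two binary pmfs
        i = np.searchsorted(bins, v, side="right") - 1
        best[i] = min(best[i], kl2(p, q))

lefts = bins[:-1]
ok = np.isfinite(best)
# The empirical lower envelope always dominates Pinsker's bound 2*TV^2 ...
assert np.all(best[ok] >= 2.0 * lefts[ok] ** 2 - 1e-9)
# ... and far exceeds it at large TV, where Pinsker is loose.
print(best[ok][-1], 2.0 * lefts[ok][-1] ** 2)
```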

5. Reverse Pinsker Inequalities for Small and Large Deviations: Asymptotics and Applications

When $P$ and $Q$ have bounded likelihood ratios or are supported on finite alphabets, and $\varepsilon = \|P-Q\|_1 \ll 1$, all sharp reverse-Pinsker bounds agree in the small-deviation regime:

$$D(P\|Q) \asymp C(Q)\,\varepsilon^2, \qquad C(Q)\geq \tfrac12,$$

with $C(Q)$ attaining $1/2$ in symmetric (“balanced”) cases, as shown notably in (Berend et al., 2012). For balanced $Q$, i.e., $\max_A \min(Q(A),1-Q(A)) = 1/2$, the bound matches Pinsker’s constant.

In testing, large deviations, and concentration inequalities, reverse-Pinsker bounds provide the small-ball exponents under TV constraints, and thus precisely capture the rate exponents in Sanov’s theorem for empirical measures deviating by at least $v$ in TV (Berend et al., 2012).

6. Special Cases: Binary Support, $\chi^2$-Divergence, and Information Density Bounds

For binary observations, the tight bounds reduce to explicit expressions:

$$D^*(v,Q) = \mathrm{KL}_2(q_0 - v/2,\ q_0), \qquad Q = (q_0, 1-q_0),$$

where $D^*(v,Q)$ is the infimum of $D(P\|Q)$ over $P$ at TV distance $v$ from $Q$; expansions in $v$ yield the precise asymptotics.

Analogous results hold for the $\chi^2$-divergence and the symmetric $\chi^2$-divergence; e.g.,

$$\chi^2(P\|Q) \geq \begin{cases} V^2, & V<1,\\[0.5ex] \dfrac{V}{2-V}, & V\ge 1, \end{cases}$$

with $V=\|P-Q\|_1$ (0906.1244). For information density, Sason (Sason, 2015) exhibits optimal lower bounds on TV in terms of the distribution of the information density $\imath_{P\|Q}$, with binary extremizers.
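
A numerical check of the $\chi^2$ lower bound (with $V=\Vert P-Q\Vert_1$), on random pairs together with the family $P=(1,0)$, $Q=(1-s,s)$, which by direct computation attains equality on the $V\ge 1$ branch:

```python
import numpy as np

rng = np.random.default_rng(4)

def chi2(p, q):
    """chi^2(P||Q) = sum_a (P(a) - Q(a))^2 / Q(a), for Q > 0 everywhere."""
    return float(np.sum((p - q) ** 2 / q))

def piecewise_lower_bound(V):
    """Lower bound on chi^2 in terms of V = ||P - Q||_1."""
    return V**2 if V < 1.0 else V / (2.0 - V)

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    V = np.abs(p - q).sum()
    assert chi2(p, q) >= piecewise_lower_bound(V) - 1e-12

# Equality family for V >= 1: P = (1, 0), Q = (1-s, s) gives V = 2s and
# chi^2 = s/(1-s) = V/(2-V).
for s in (0.5, 0.7, 0.9):
    p, q = np.array([1.0, 0.0]), np.array([1.0 - s, s])
    print(chi2(p, q), piecewise_lower_bound(np.abs(p - q).sum()))
```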

7. Summary Table: Sharp Reverse Pinsker Bounds in Key Settings

| Setting | Reverse-Pinsker upper bound | Reference |
| --- | --- | --- |
| General, bounded likelihood ratio | $D(P\Vert Q) \leq c(\beta_1,\beta_2)\, \Vert P-Q\Vert_1$ | (Sason, 2015) |
| Finite alphabet, $Q_{\min}>0$ | $D(P\Vert Q)\le\log\big(1+\frac{\Vert P-Q\Vert_1^2}{2Q_{\min}}\big)-\frac{\beta_2\log e}{2}\Vert P-Q\Vert_1^2$ | (Sason, 2015) |
| General $f$-divergence, constrained $m,M$ | $D_f(P\Vert Q) \leq \delta \big( \frac{f(m)}{1-m} + \frac{f(M)}{M-1} \big)$ | (Binette, 2018) |
| KL vs. TV, any $V\in[0,2)$ | minimization form in $\beta$: $\min_{\beta\in[V-2,2-V]}\Phi(\beta;V)$ | (0906.1244) |
| Rényi, $\alpha>1$, finite sup | $D_\alpha(P\Vert Q)\le \frac{1}{\alpha-1} \ln\big(1+\Vert P-Q\Vert_{\mathrm{TV}}\cdot R_\alpha(\Gamma_{\max},\Gamma_{\min})\big)$ | (Grosse et al., 20 Jan 2025) |

8. Impact and Applications

These optimal bounds under structured constraints unify and resolve longstanding questions concerning the achievable relationships between $f$-divergences and variational divergences. In statistical estimation, information theory, large deviations, robust statistics, and privacy amplification (e.g., Rényi local differential privacy), reverse-Pinsker type inequalities enable precise quantification of the trade-offs between distinguishability and distance metrics (Grosse et al., 20 Jan 2025, Sason, 2015).

Recent extensions exploit convexity, duality, and variational characterizations, allowing the translation of these inequalities to settings involving conditional distributions, post-processing through channels, and the characterization of strong data-processing inequalities for Rényi divergences (Grosse et al., 20 Jan 2025). The reverse-Pinsker arsenal is thus central to both foundational theoretical developments and practical bounds in high-dimensional inference, privacy, and concentration of measure.
