Bernoulli f-Divergence Inequality

Updated 24 January 2026
  • Bernoulli f-Divergence Inequality is a framework defining sharp, explicit bounds linking f-divergences to total variation in Bernoulli distributions through convex generating functions.
  • Its methodology leverages reduction to two-point supports and precise extremal conditions, thereby generalizing classical results like Pinsker’s inequality to quantum contexts.
  • The inequality underpins applications in statistical decision making and information theory, offering actionable insights for hypothesis testing, risk minimization, and quantum divergence analysis.

The Bernoulli $f$-divergence inequality provides sharp, explicit relations between various $f$-divergences (of the Csiszár type) for Bernoulli distributions, frequently parameterized in terms of the total variation distance. These inequalities subsume and generalize classical results such as Pinsker’s inequality, and serve as a building block for both classical and quantum information-theoretic bounds. The foundational results revolve around convexity properties of the generating function $f$ and leverage reduction arguments to two-point supports.

1. Definition and Principal Formulation

Let $f:(0,\infty)\to\mathbb{R}$ be convex with $f(1)=0$. For probability measures $P\ll Q$, the $f$-divergence is defined by

$D_f(P\|Q) = \int_{q>0} f\left(\frac{p}{q}\right) dQ + f'(\infty) P\{ q=0 \}$

where $p=dP/d\lambda$, $q=dQ/d\lambda$ under any dominating measure $\lambda$. For Bernoulli distributions $P=\text{Bern}(p)$, $Q=\text{Bern}(q)$,

$D_f(\text{Bern}(p)\|\text{Bern}(q)) = q\,f\left(\frac{p}{q}\right) + (1-q)\,f\left(\frac{1-p}{1-q}\right)$

(Guntuboyina et al., 2013, 0903.1765, Bongole et al., 17 Jan 2026, Lanier et al., 24 Jan 2025).
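The two-point formula above is simple enough to check numerically. The following minimal sketch (function and variable names such as `bernoulli_f_div` are illustrative, not from the cited papers) evaluates the Bernoulli $f$-divergence for a few standard generators and verifies it against the direct closed forms:

```python
import math

def bernoulli_f_div(p, q, f, f_prime_inf=float("inf")):
    """D_f(Bern(p) || Bern(q)) for convex f with f(1) = 0.

    Boundary atoms with q-mass zero use the f'(inf) convention."""
    total = 0.0
    for pi, qi in ((p, q), (1 - p, 1 - q)):
        if qi > 0:
            total += qi * f(pi / qi)
        elif pi > 0:
            total += pi * f_prime_inf  # f'(inf) * P{q = 0}
    return total

kl   = lambda t: t * math.log(t)                 # KL generator
chi2 = lambda t: (t - 1) ** 2                    # chi-squared generator
hell = lambda t: 0.5 * (math.sqrt(t) - 1) ** 2   # squared-Hellinger generator

p, q = 0.6, 0.4
# KL agrees with the direct Bernoulli formula
direct_kl = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
assert abs(bernoulli_f_div(p, q, kl) - direct_kl) < 1e-12
# chi^2 agrees with (p - q)^2 / (q (1 - q))
assert abs(bernoulli_f_div(p, q, chi2) - (p - q) ** 2 / (q * (1 - q))) < 1e-12
# every f-divergence vanishes at p = q, since f(1) = 0
assert bernoulli_f_div(0.3, 0.3, hell) == 0.0
```

The `f_prime_inf` argument matters only for the singular part $f'(\infty)\,P\{q=0\}$ of the general definition; for interior $p,q\in(0,1)$ it is never used.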

2. Sharp Lower Bounds via Total Variation

The central inequalities relate $D_f(\text{Bern}(p)\|\text{Bern}(q))$ to the total variation distance $\delta=|p-q|$:

  • Bröcker’s monotonic lower bound (0903.1765):

$D_f(\text{Bern}(p)\|\text{Bern}(q)) \geq f(1+\delta/2) + f(1-\delta/2)$

This is tight for Bernoulli variables. The bounding function is strictly increasing in $\delta$ under mild regularity assumptions.

  • Sharp minimization via support reduction (Guntuboyina et al., 2013): For the minimum $D_f$ at fixed $|p-q|=V$,

$D_f(\text{Bern}(p)\|\text{Bern}(q)) \geq (1-V)\, f\left(\frac{1+V}{1-V}\right)$

attained when $p=(1+V)/2$, $q=(1-V)/2$, i.e., at symmetric pairs.
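Bröcker's bound lends itself to a grid check. The sketch below (helper names are illustrative) instantiates it for the KL generator $f(t)=t\log t$ and also verifies the claimed monotonicity of the bounding function in $\delta$:

```python
import math

def kl_bern(p, q):
    """KL divergence between Bern(p) and Bern(q), natural log."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def brocker_bound(delta, f):
    """Lower bound f(1 + delta/2) + f(1 - delta/2) from Broecker's lemma."""
    return f(1 + delta / 2) + f(1 - delta / 2)

f_kl = lambda t: t * math.log(t)

# D_f >= Broecker bound across a grid of Bernoulli pairs (KL case)
grid = [i / 20 for i in range(1, 20)]  # p, q in {0.05, ..., 0.95}
for p in grid:
    for q in grid:
        delta = abs(p - q)
        assert kl_bern(p, q) >= brocker_bound(delta, f_kl) - 1e-12

# the bounding function is increasing in delta (checked on a fine grid)
vals = [brocker_bound(d / 100, f_kl) for d in range(101)]
assert all(a <= b for a, b in zip(vals, vals[1:]))
```

For KL this check is also consistent with Pinsker's inequality: $(1+x)\ln(1+x)+(1-x)\ln(1-x)\approx x^2$ for $x=\delta/2$, which sits well below the Pinsker bound $2\delta^2$.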

3. Best-Possible Generalized Pinsker Inequalities

The framework in (0906.1244) gives integral representations and tight “Pinsker-type” lower bounds for arbitrary $f$ in terms of total variation:

$D_f(\text{Bern}(p)\|\text{Bern}(q)) \geq \Psi_f(\delta) := 2\left[\bar\Gamma_f\left(\frac{1}{2}-\frac{\delta}{2}\right) + \frac{\delta}{2}\, \Gamma_f\left(\frac{1}{2}\right) - \bar\Gamma_f\left(\frac{1}{2}\right) \right]$

where $\Gamma_f(\pi) = \int_0^\pi \gamma_f(t)\,dt$, $\bar\Gamma_f(\pi) = \int_0^\pi \Gamma_f(t)\,dt$, and $\gamma_f(\pi) = \frac{1}{\pi^3}\, f''\!\left(\frac{1-\pi}{\pi}\right)$ for twice-differentiable $f$.

The minimizing, or extremal, Bernoulli pairs for fixed $\delta$ are $p=(1+\delta)/2$, $q=(1-\delta)/2$.

4. Explicit Algebraic and Sandwich Inequalities

The “binary $f$-divergence inequality” (Lanier et al., 24 Jan 2025, Sason, 2015) provides sharp algebraic sandwich bounds between any two Bernoulli $f$-divergences, with formulas involving likelihood ratios and the $\chi^2$ divergence:

$m\, D_f(P\|Q) \leq p\,f\left(\frac{p}{q}\right) + (1-p)\,f\left(\frac{1-p}{1-q}\right) - f\left(1 + \frac{(p-q)^2}{q(1-q)}\right) \leq M\, D_f(P\|Q)$

where

$m = \min\left\{ \frac{p}{q}, \frac{1-p}{1-q} \right\},\quad M = \max\left\{ \frac{p}{q}, \frac{1-p}{1-q} \right\}$

and the total variation and $\chi^2$ divergence are

$\delta = |p-q|,\qquad \chi^2(P,Q) = \frac{\delta^2}{q(1-q)}$

This inequality gives explicit control of the $f$-divergence in terms of basic symmetric functions of $p$ and $q$ (Sason, 2015).
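A small grid check of the sandwich bound, again instantiated for the KL generator (a sketch under that assumption, not the authors' code; note that $1+\chi^2 = p^2/q + (1-p)^2/(1-q)$ for Bernoulli pairs):

```python
import math

f = lambda t: t * math.log(t)  # KL generator as the test case

def D_f(p, q):
    """Bernoulli f-divergence D_f(Bern(p) || Bern(q))."""
    return q * f(p / q) + (1 - q) * f((1 - p) / (1 - q))

def middle(p, q):
    """Middle expression of the sandwich bound."""
    chi2 = (p - q) ** 2 / (q * (1 - q))
    return p * f(p / q) + (1 - p) * f((1 - p) / (1 - q)) - f(1 + chi2)

# m * D_f <= middle <= M * D_f across a grid of Bernoulli pairs
for i in range(1, 20):
    for j in range(1, 20):
        if i == j:
            continue
        p, q = i / 20, j / 20
        m = min(p / q, (1 - p) / (1 - q))
        M = max(p / q, (1 - p) / (1 - q))
        d = D_f(p, q)
        assert m * d - 1e-9 <= middle(p, q) <= M * d + 1e-9
```

At $p=q$ both sides degenerate to $0$, so the grid skips the diagonal.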

5. Optimality, Tightness, and Equality Conditions

The reductions above are maximally tight for Bernoulli laws. Tightness follows from the fact that the relevant functions (Bayes-risk curve, data processing contractions, etc.) achieve their extrema for binary distributions. Equality is attained precisely when $dP/dQ$ takes only two values and $f$ is affine over the critical support points involved in the inequalities.

Cases of equality in the sandwich bound occur only in degenerate cases (i.e., $p=q$ or $f$ affine) or for the aforementioned symmetric extremal pairs.

6. Instantiations and Special Cases

The Bernoulli $f$-divergence inequalities specialize to classical divergences:

  • $f(t)=t\log t$ (KL): $D_f = p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}$; lower-bound example: $(1-V)\ln\left(\frac{1+V}{1-V}\right)$
  • $f(t)=(\sqrt{t}-1)^2/2$ (squared Hellinger): $D_f = \frac{q}{2}\left(\sqrt{p/q}-1\right)^2 + \frac{1-q}{2}\left(\sqrt{\frac{1-p}{1-q}}-1\right)^2$; lower-bound example: $1-\sqrt{1-V^2}$
  • $f(t)=(t-1)^2$ ($\chi^2$): $D_f = q\left(\left(\frac{p}{q}\right)^2-1\right) + (1-q)\left(\left(\frac{1-p}{1-q}\right)^2-1\right)$; lower-bound example: $\frac{V^2}{q(1-q)}$

All these bounds encode sharp relationships attained at the extremal Bernoulli pairs (Guntuboyina et al., 2013, Lanier et al., 24 Jan 2025, 0903.1765, 0906.1244).
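At the symmetric extremal pairs $p=(1+V)/2$, $q=(1-V)/2$, the squared-Hellinger and $\chi^2$ entries reduce to closed forms that can be verified directly (a minimal sketch; names are illustrative):

```python
import math

def bern_div(p, q, f):
    """Bernoulli f-divergence for interior p, q."""
    return q * f(p / q) + (1 - q) * f((1 - p) / (1 - q))

hell = lambda t: 0.5 * (math.sqrt(t) - 1) ** 2  # squared-Hellinger generator
chi2 = lambda t: (t - 1) ** 2                   # chi-squared generator

for V in (0.1, 0.3, 0.5, 0.8):
    p, q = (1 + V) / 2, (1 - V) / 2
    # squared Hellinger at the symmetric pair equals 1 - sqrt(1 - V^2),
    # since sqrt(pq) + sqrt((1-p)(1-q)) = sqrt(1 - V^2) there
    assert abs(bern_div(p, q, hell) - (1 - math.sqrt(1 - V * V))) < 1e-12
    # chi^2 for Bernoulli is exactly V^2 / (q (1 - q)) -- an identity
    assert abs(bern_div(p, q, chi2) - V * V / (q * (1 - q))) < 1e-12
```

The $\chi^2$ entry is in fact an identity for any Bernoulli pair with $|p-q|=V$, not only the symmetric one.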

7. Applications and Extensions

The Bernoulli $f$-divergence inequality underpins several advanced methods:

  • Interactive statistical decision making: Reduction and inversion arguments yield two-sided intervals for monotone transforms of risk (e.g., prior-predictive CVaR and quantile lower bounds) (Bongole et al., 17 Jan 2026).
  • Transfer to quantum divergences: The inequalities lift directly to quantum settings by reduction to classical analogues on two-point supports, sidestepping complex matrix analysis (Lanier et al., 24 Jan 2025).
  • Information-theoretic converse bounds: Generalization of Fano’s inequality and derivation of tight explicit bounds for loss probabilities, exponential moments, and tail risks.

The Bernoulli $f$-divergence inequality is thus a foundational tool for optimally relating statistical divergences under minimal informativeness constraints, with broad implications for hypothesis testing, risk minimization, and quantum information theory.
