
Non-Asymptotic Excess Risk Bounds

Updated 28 October 2025
  • The paper extends traditional mutual information risk bounds by using generalized divergences (Rényi, α-Jensen-Shannon, Sibson) to derive non-asymptotic excess risk bounds.
  • It employs variational characterizations and data-dependent sub-Gaussian parameters to rigorously bound the degradation in prediction performance under noisy or compressed observations.
  • These bounds enhance flexibility and tightness, offering practical performance guarantees that outperform standard MI-based approaches in complex, high-variance settings.

Bounds on the Excess Minimum Risk via Generalized Information Divergence Measures

The theory of non-asymptotic excess minimum risk bounds quantifies the degradation in best achievable prediction performance when an estimator relies on a stochastically degraded feature vector (such as a compressed or noise-corrupted observation) rather than the full-information variable. Recent research has advanced these bounds using generalized information-theoretic divergences, extending previous approaches that depended solely on mutual information and assumed constant sub-Gaussianity. This article provides a comprehensive account of such generalized excess risk bounds, centering on Rényi, α-Jensen-Shannon, and Sibson mutual information, and compares these results with existing approaches, highlighting their advantages in flexibility, tightness, and generality.

1. Problem Setting and Excess Risk Formulation

Let random vectors $Y$, $X$, and $Z$ form a Markov chain $Y \to X \to Z$, where $Y$ is the target variable to be estimated from $X$ or $Z$. For a loss function $l$, the excess minimum risk is

$L^*_l(Y|Z) - L^*_l(Y|X),$

where $L^*_l(Y|W)$ denotes the minimum possible expected loss for predicting $Y$ from $W$. The central goal is to upper bound this excess in terms of divergences between the conditional distributions $P_{X|Y,Z}$ and $P_{X|Z}$, while allowing a general (possibly non-constant) sub-Gaussian parameter for the loss.
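
Here the minimum risk is the usual Bayes quantity (restated for completeness, since it is not defined explicitly above):

$L^*_l(Y|W) = \inf_{f}\, \mathbb{E}\big[\,l\big(Y, f(W)\big)\big],$

with the infimum taken over all measurable estimators $f$ of $Y$ from $W$. Because $Z$ is a stochastically degraded version of $X$ in the chain $Y \to X \to Z$, the excess $L^*_l(Y|Z) - L^*_l(Y|X)$ is non-negative.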

2. Non-Asymptotic Excess Risk Bounds via Generalized Divergences

2.1 Conditional Rényi Divergence Bound

Under the condition that, for each $y$, the function $l(y, f(X))$ (for the optimal $f$) is conditionally $\sigma^2(y)$-sub-Gaussian given $Z$, with $\mathbb{E}[\sigma^2(Y)] < \infty$, the main theorem establishes that for any $\alpha \in (0,1)$,

$L^*_l(Y|Z) - L^*_l(Y|X) \leq \sqrt{\frac{2\,\mathbb{E}[\sigma^2(Y)]}{\alpha}\, D_\alpha(P_{X|Y,Z} \| P_{X|Z} \mid P_{Y,Z})},$

where $D_\alpha$ is the conditional Rényi divergence of order $\alpha$ (Equation (LT1)).
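
For reference, the Rényi divergence of order $\alpha \in (0,1)$ between distributions $P$ and $Q$ with densities $p$ and $q$ (with respect to a common dominating measure $\mu$) is the standard quantity

$D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \int p^{\alpha} q^{1-\alpha}\, d\mu,$

and the conditional version appearing in the bound additionally averages over the conditioning pair $(Y,Z)$ in the precise manner fixed by the paper's Equation (LT1).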

For a bounded loss with supremum norm $\|l\|_\infty$, the corollary gives

$L^*_l(Y|Z) - L^*_l(Y|X) \leq \frac{\|l\|_\infty \sqrt{2}}{\sqrt{\alpha}} \sqrt{D_\alpha(P_{X|Y,Z} \| P_{X|Z} \mid P_{Y,Z})}.$

As $\alpha \to 1$, these bounds recover mutual information–based inequalities such as those in Györfi et al. (2023):

$L^*_l(Y|Z) - L^*_l(Y|X) \leq \sqrt{2\,\mathbb{E}[\sigma^2(Y)]\, I(X;Y|Z)}.$
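
The recovery step uses two standard facts, recalled here since they are not spelled out above: the Rényi divergence converges to the Kullback–Leibler divergence as $\alpha \to 1$, and the conditional KL divergence in this setting is exactly the conditional mutual information,

$\lim_{\alpha \to 1} D_\alpha(P_{X|Y,Z} \| P_{X|Z} \mid P_{Y,Z}) = D_{\mathrm{KL}}(P_{X|Y,Z} \| P_{X|Z} \mid P_{Y,Z}) = I(X;Y \mid Z),$

while the prefactor $\sqrt{2\,\mathbb{E}[\sigma^2(Y)]/\alpha}$ tends to $\sqrt{2\,\mathbb{E}[\sigma^2(Y)]}$.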

2.2 Bounds via Conditional α–Jensen-Shannon and Sibson Information

For the α-Jensen-Shannon (JS) divergence, under sub-Gaussianity with respect to a convex mixture of $P_{X|Z,Y=y}$ and $P_{X|Z}$, the excess risk is bounded by (Equation (LT1JS))

$L^*_l(Y|Z) - L^*_l(Y|X) \leq \sqrt{\frac{2\,\mathbb{E}[\sigma^2(Y)]}{\alpha(1-\alpha)}\, JS_\alpha(P_{Y,X|Z} \,\|\, P_{Y|Z}P_{X|Z} \mid P_Z)}.$

Again, for bounded loss,

$L^*_l(Y|Z) - L^*_l(Y|X) \leq \frac{\|l\|_\infty \sqrt{2}}{\sqrt{\alpha(1-\alpha)}} \sqrt{JS_\alpha(P_{Y,X|Z} \,\|\, P_{Y|Z}P_{X|Z} \mid P_Z)}.$

In the limit $\alpha \to 0$, the JS bound recovers the mutual information result; as $\alpha \to 1$, it yields a bound involving the Lautum information (reverse KL).
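
These limits can be read off from the usual $\alpha$-skewed Jensen–Shannon divergence; the exact conditional form used in the bound is the paper's Equation (LT1JS), and the unconditional template below is stated only as an aid to the reader:

$JS_\alpha(P \| Q) = \alpha\, D_{\mathrm{KL}}\big(P \,\|\, \alpha P + (1-\alpha)Q\big) + (1-\alpha)\, D_{\mathrm{KL}}\big(Q \,\|\, \alpha P + (1-\alpha)Q\big).$

Dividing by $\alpha(1-\alpha)$ and letting $\alpha \to 0$ leaves the forward KL term (the mutual information regime), while $\alpha \to 1$ leaves the reverse KL term (the Lautum regime).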

Similarly, an excess risk bound is derived in terms of the conditional Sibson mutual information:

$L^*_l(Y|Z) - L^*_l(Y|X) \leq \sqrt{\frac{2\big[(1-\alpha)\,\sigma^2 + \alpha\,\mathbb{E}_{P_Z}[\Phi_{Y^*|Z}(\gamma^2(Y^*))]\big]}{\alpha}\, \mathbb{E}_{P_Z}\big[I_\alpha^S(P_{Y,X|Z}, P_{Y^*|Z})\big]}.$

All of these bounds recover the mutual information-based upper bound as $\alpha \to 1$.

2.3 Construction and Proof Techniques

Derivations combine:

  • Risk representation via the difference of conditional (or joint vs. product) distributions.
  • Variational characterizations of the divergences: a Donsker–Varadhan-type formula for the Rényi divergence, with analogous forms for the α-JS and Sibson quantities (the classical KL prototype is recalled after this list).
  • Conditional and possibly data-dependent sub-Gaussianity assumptions.
  • Auxiliary mixture distributions for JS and Sibson via techniques first systematized for generalization error in learning theory [Aminian et al., 2024].
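
As a reminder, and only as the prototype that the order-$\alpha$ arguments generalize, the classical Donsker–Varadhan variational formula for the KL divergence reads

$D_{\mathrm{KL}}(P \| Q) = \sup_{g} \left\{ \mathbb{E}_P[g] - \log \mathbb{E}_Q\big[e^{g}\big] \right\},$

with the supremum over measurable $g$ satisfying $\mathbb{E}_Q[e^{g}] < \infty$. Pairing such a variational formula with the sub-Gaussian assumption on $l(y, f(X))$ is what turns a divergence into a square-root excess risk bound.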

3. Relation to Prior Work and Advantages

The generalized divergence framework advances earlier results by:

  • Removing the restriction that the sub-Gaussian parameter be a global constant (allowing, e.g., $l(y, f(x))$ to have $\sigma^2(y)$ dependence).
  • Encompassing the entire canonical range $(0,1)$ of $\alpha$, permitting tighter numerical optimization of the bound in applications.
  • Reducing to the standard mutual information bound as $\alpha \to 1$, thus strictly generalizing the results of Györfi et al. (2023), Modak et al. (2021), and Aminian et al. (2024) (Omanwar et al., 30 May 2025).
  • Providing strictly tighter bounds in challenging regimes, such as high cardinality discrete models and certain heavy-tailed continuous settings.

4. Application Examples and Comparison

Example 1: $q$-ary Channel, Bounded Loss

For $Y \to X \to Z$ where the channel is symmetric and $l$ is bounded, the $\alpha$–JS bound is provably tighter than the mutual information bound for intermediate values of $\alpha$, with a more pronounced advantage as $q$ grows (see the paper's Figures 1–3).
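
The Python sketch below illustrates the kind of numerical comparison behind such figures. It is not the paper's experiment: for simplicity it evaluates the bounded-loss Rényi corollary of Section 2.1 rather than the α-JS bound this example concerns, the channel parameters and loss bound are arbitrary choices, and the conditional Rényi divergence is computed under the assumption that it is the order-α divergence between the joints $P_{X|Y,Z}P_{Y,Z}$ and $P_{X|Z}P_{Y,Z}$.

```python
import numpy as np

# A q-ary symmetric chain Y -> X -> Z with uniform Y and crossover
# probabilities eps1, eps2.  We evaluate the bounded-loss Renyi corollary
# over a grid of alpha and compare it with its alpha -> 1 (mutual
# information) limit.

q, eps1, eps2 = 4, 0.10, 0.20
loss_sup = 1.0  # ||l||_inf in the bounded-loss corollary


def symmetric_channel(q, eps):
    """q-ary symmetric channel matrix W[input, output]."""
    W = np.full((q, q), eps / (q - 1))
    np.fill_diagonal(W, 1.0 - eps)
    return W


W1, W2 = symmetric_channel(q, eps1), symmetric_channel(q, eps2)

# Joint p(y, x, z) = (1/q) * W1[y, x] * W2[x, z]
p_yxz = (1.0 / q) * W1[:, :, None] * W2[None, :, :]
p_yz = p_yxz.sum(axis=1)                         # p(y, z)
p_xz = p_yxz.sum(axis=0)                         # p(x, z)
p_z = p_xz.sum(axis=0)                           # p(z)

p_x_given_yz = p_yxz / p_yz[:, None, :]          # p(x | y, z), shape (y, x, z)
p_x_given_z = (p_xz / p_z[None, :])[None, :, :]  # p(x | z), broadcast to (1, x, z)

# Conditional mutual information I(X; Y | Z)
cond_mi = np.sum(p_yxz * np.log(p_x_given_yz / p_x_given_z))


def cond_renyi(alpha):
    """Order-alpha divergence between P_{X|Y,Z} P_{Y,Z} and P_{X|Z} P_{Y,Z}."""
    inner = np.sum(p_yz[:, None, :] * p_x_given_yz ** alpha * p_x_given_z ** (1 - alpha))
    return np.log(inner) / (alpha - 1.0)


def renyi_bound(alpha):
    """Bounded-loss corollary: ||l||_inf * sqrt(2 D_alpha / alpha)."""
    return loss_sup * np.sqrt(2.0 * cond_renyi(alpha) / alpha)


mi_bound = loss_sup * np.sqrt(2.0 * cond_mi)     # alpha -> 1 limit
alphas = np.linspace(0.05, 0.95, 19)
best_alpha = min(alphas, key=renyi_bound)

print(f"MI-limit bound   : {mi_bound:.4f}")
print(f"Best Renyi bound : {renyi_bound(best_alpha):.4f} at alpha = {best_alpha:.2f}")
```

Sweeping q, eps1, and eps2 in this sketch gives a quick sense of how the gap between the optimized bound and its $\alpha \to 1$ limit behaves as the alphabet grows.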

Example 2: Additive Gaussian Model, Heterogeneous Sub-Gaussianity

With $Y \sim \mathcal{N}(0,1)$, $X = Y + W_1$, $Z = X + W_2$, and a heavy-tailed or capped loss, the conditions for the generalized bounds are satisfied, whereas constant-parameter MI-based bounds may be vacuous due to unbounded tails or lack of uniform control.
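
For intuition only, and under assumptions not made in the paper's example (squared loss and independent Gaussian noises $W_1 \sim \mathcal{N}(0,\sigma_1^2)$, $W_2 \sim \mathcal{N}(0,\sigma_2^2)$), the quantity being bounded has a closed form in this chain:

$L^*(Y|X) = \mathrm{Var}(Y \mid X) = \frac{\sigma_1^2}{1+\sigma_1^2}, \qquad L^*(Y|Z) = \mathrm{Var}(Y \mid Z) = \frac{\sigma_1^2+\sigma_2^2}{1+\sigma_1^2+\sigma_2^2},$

so the excess minimum risk is the difference of these conditional variances. The generalized divergence bounds are valuable precisely in the heavy-tailed or capped-loss variants where no such closed form is available.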

Example 3: Reverse Markov Chain

Swapping the roles of the degraded variable and the label, the method still provides sharper bounds than the mutual information bound for suitable $\alpha$.

Summary Table

| Bound Type | Excess Risk Bound Formulation ($\alpha \in (0,1)$) | Bounded Loss? | Non-constant Sub-Gaussianity? | Recovers MI as $\alpha \to 1$? | Often Tighter Than MI? |
|---|---|---|---|---|---|
| Rényi divergence | $\sqrt{\frac{2\mathbb{E}[\sigma^2(Y)]}{\alpha} D_\alpha(\cdot)}$ | Yes | Yes | Yes | Yes, for some $\alpha$ |
| α–Jensen-Shannon | $\sqrt{\frac{2\mathbb{E}[\sigma^2(Y)]}{\alpha(1-\alpha)} JS_\alpha(\cdot)}$ | Yes | Yes | Yes | Yes, for some $\alpha$ |
| Sibson information | $\sqrt{\frac{2\big((1-\alpha)\sigma^2 + \alpha\,\mathbb{E}[\Phi_{Y^*\mid Z}(\gamma^2(Y^*))]\big)}{\alpha}\, \mathbb{E}[I_\alpha^S(\cdot)]}$ | Yes | Yes | Yes | Sometimes |

5. Flexibility, Practicality, and Impact

  • Sub-Gaussian parameters may be data-dependent random variables, accommodating much broader classes of models (including unbounded $Y$ or $l$).
  • The parameter $\alpha$ serves as a trade-off dial, allowing practitioners to numerically optimize the upper bound for their scenario (a schematic sketch of this optimization follows the list).
  • In practice, for large-alphabet channels or continuous models with non-constant noise, these generalized divergence bounds can offer substantially tighter excess risk guarantees, as shown via simulation in the paper.
  • Practitioners are no longer restricted to crude worst-case sub-Gaussian parameters, which can render classical mutual-information bounds vacuous in realistic high-variance settings.
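
As a concrete but entirely schematic illustration of treating $\alpha$ as a tuning knob, the snippet below minimizes the Rényi-type bound over $\alpha$ given any callable that evaluates the relevant divergence; `divergence`, `sigma2_mean`, and the synthetic divergence curve are placeholders, not quantities from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def renyi_excess_risk_bound(alpha, divergence, sigma2_mean):
    """Right-hand side of the Renyi-based bound: sqrt(2 E[sigma^2] D_alpha / alpha)."""
    return np.sqrt(2.0 * sigma2_mean * divergence(alpha) / alpha)


def tightest_bound(divergence, sigma2_mean):
    """Pick the alpha in (0, 1) that minimizes the upper bound."""
    res = minimize_scalar(
        lambda a: renyi_excess_risk_bound(a, divergence, sigma2_mean),
        bounds=(1e-3, 1.0 - 1e-3),
        method="bounded",
    )
    return res.x, res.fun


def toy_divergence(a):
    """Purely synthetic, nondecreasing curve standing in for D_alpha (illustration only)."""
    return 0.02 + 0.3 * a ** 2


alpha_star, bound_star = tightest_bound(toy_divergence, sigma2_mean=1.0)
print(f"optimal alpha = {alpha_star:.3f}, optimized bound = {bound_star:.3f}")
```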

6. Concluding Remarks

The generalized divergence approach to excess minimum risk delivers a unifying, parameterized family of non-asymptotic risk bounds. These bounds encompass and frequently improve upon the standard mutual information bounds by modulating both the divergence and the moment assumptions. The framework covers the Rényi, α-Jensen-Shannon, and Sibson mutual informations, and is more flexible and often sharper, as supported by both the theory and the numerical examples in the paper. It marks a significant advance beyond lossless or MI-only generalization error analysis, with broad relevance to information theory, generalization in learning, and downstream inference under measurement or communication constraints.

References

  • Györfi, L., et al., "Excess Risk Bounds in Statistical Inference via Mutual Information," Entropy, 2023.
  • Modak, S., et al., IEEE Information Theory Workshop (ITW), 2021.
  • Aminian, M., et al., IEEE Journal on Selected Areas in Information Theory (JSAIT), 2024.
  • Omanwar et al., "Bounds on the Excess Minimum Risk via Generalized Information Divergence Measures," 30 May 2025.
  • Donsker, M. D. and Varadhan, S. R. S., "Asymptotic evaluation of certain Markov process expectations for large time," 1975.