
Tsallis α-negentropy: Theory & Applications

Updated 6 May 2026
  • Tsallis α-negentropy is defined as the negative of Tsallis α-entropy and generalizes classical entropic forms, forming the basis for key Bregman divergences.
  • Its strict convexity and explicit differential structure enable precise error bounds via generalized Pinsker inequalities with phase transitions based on α and dimensional effects.
  • Applications span robust statistics, online learning, and information geometry, where it links surrogate risk measures to total variation distance in probabilistic prediction.

Tsallis α-negentropy refers to the negative of the Tsallis α-entropy, a one-parameter family of information measures that generalize the Shannon–Gibbs entropy by interpolating between several classical entropic forms as α varies. On the open probability simplex, negative Tsallis entropies are canonically used to generate a family of strictly convex functions whose Bregman divergences, denoted $D_\alpha$, serve as foundational objects in modern information-theoretic learning, online algorithms, and information geometry. These divergences include, as key special instances, the Kullback–Leibler divergence, reverse KL, Itakura–Saito divergence, and the so-called β-divergences prevalent in robust statistics and signal processing. A central research thread is the precise relationship between the α-negentropy-induced divergence and total variation distance, as formalized in sharp, dimension-aware generalized Pinsker inequalities (Beretta et al., 5 Feb 2026).

1. Definition of Tsallis α-entropies and α-negentropy

Let $\Delta^K = \{ p \in [0, 1]^K : \sum_{i=1}^K p_i = 1 \}$ denote the probability simplex, with relative interior $\mathrm{relint}\, \Delta^K = \{ p \in \Delta^K : p_i > 0\ \forall i \}$.

The Tsallis α-entropy is given by:

$$S_\alpha(p) = \begin{cases} \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^K p_i^\alpha, & \alpha \notin \{0,1\} \\ \sum_{i=1}^K \ln p_i, & \alpha = 0 \\ -\sum_{i=1}^K p_i \ln p_i, & \alpha = 1 \end{cases}$$

Negative Tsallis α-entropy (α-negentropy), defined as $f_\alpha(p) = -S_\alpha(p)$, is adopted as the convex generator in Bregman divergence constructions. Alternative normalizations often appear, such as $H_\alpha(p) = \sum_{i=1}^K \frac{p_i^\alpha - 1}{1-\alpha}$, which satisfies $f_\alpha(p) = -H_\alpha(p)$ up to scale and additive constants. Additive constants do not affect the induced Bregman divergence, and a scale factor only rescales it.

Key classical limits:

  • $\alpha \to 1$: recovers Shannon entropy.
  • $\alpha = 0$: coincides with Burg (log) entropy.
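As a minimal sketch of the definition above, the following Python functions evaluate $S_\alpha$ and $f_\alpha = -S_\alpha$ in this section's normalization. The function names are illustrative and are not taken from the cited paper. The helper `shifted_entropy_near_one` is an assumption of this sketch: in this normalization $S_\alpha$ itself grows without bound as $\alpha \to 1$, and the Shannon value appears only after subtracting the $p$-independent constant $1/(\alpha(1-\alpha))$, which leaves the induced Bregman divergence unchanged.

```python
import math

def tsallis_entropy(p, alpha):
    """S_alpha(p) for p in the relative interior of the simplex."""
    if any(x <= 0.0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be strictly positive and sum to 1")
    if alpha == 0:
        return sum(math.log(x) for x in p)        # Burg (log) entropy
    if alpha == 1:
        return -sum(x * math.log(x) for x in p)   # Shannon entropy
    return sum(x ** alpha for x in p) / (alpha * (1.0 - alpha))

def tsallis_negentropy(p, alpha):
    """f_alpha(p) = -S_alpha(p), the convex generator."""
    return -tsallis_entropy(p, alpha)

def shifted_entropy_near_one(p, alpha):
    """S_alpha(p) minus the constant 1/(alpha(1-alpha)); alpha must not be 0 or 1."""
    return tsallis_entropy(p, alpha) - 1.0 / (alpha * (1.0 - alpha))

p = [0.2, 0.3, 0.5]
shannon = tsallis_entropy(p, 1)
near_one = shifted_entropy_near_one(p, 1.0 + 1e-6)
print(shannon, near_one)  # the two values should agree to several decimal places
```

The exact comparisons `alpha == 0` and `alpha == 1` are a simplification; values of α extremely close to these points suffer from cancellation in the generic branch.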

2. Convexity and Differential Structure

The α-negentropy $f_\alpha$ is infinitely differentiable and strictly convex on $\mathrm{relint}\, \Delta^K$ for all real α. The Hessian structure is explicit: $\nabla^2 f_\alpha(p) = \mathrm{diag}(p_1^{\alpha-2}, \dots, p_K^{\alpha-2})$. This positive definite form guarantees strict convexity and underlies the strong-convexity properties used in subsequent Pinsker-type lower bounds.
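In the normalization of Section 1, $f_\alpha$ is a sum of one-dimensional terms $\varphi_\alpha(p_i)$, so its Hessian is diagonal with entries $\varphi_\alpha''(p_i) = p_i^{\alpha-2}$. The sketch below compares that closed form with a central finite difference. The names `phi`, `second_difference`, and `max_relative_error` are hypothetical helpers for this illustration, and the step size `h` is an arbitrary choice.

```python
import math

def phi(x, alpha):
    """Per-coordinate term of f_alpha = -S_alpha."""
    if alpha == 0:
        return -math.log(x)
    if alpha == 1:
        return x * math.log(x)
    return x ** alpha / (alpha * (alpha - 1.0))

def second_difference(g, x, h=1e-4):
    """Central finite-difference estimate of g''(x)."""
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)

def max_relative_error(alphas, points):
    """Largest relative gap between the estimate and x**(alpha - 2)."""
    worst = 0.0
    for alpha in alphas:
        for x in points:
            exact = x ** (alpha - 2.0)
            approx = second_difference(lambda t: phi(t, alpha), x)
            worst = max(worst, abs(approx - exact) / exact)
    return worst

print(max_relative_error([-1.0, 0, 0.5, 1, 2, 3.0], [0.1, 0.4, 0.9]))
```

Since every diagonal entry is positive on the relative interior, the check is consistent with strict convexity for each tested α, including negative values.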

3. Bregman Divergences from α-negentropy

For $p, q \in \mathrm{relint}\, \Delta^K$,

$$D_\alpha(p \,\|\, q) = f_\alpha(p) - f_\alpha(q) - \langle \nabla f_\alpha(q),\, p - q \rangle$$

is the associated Bregman divergence. The properties of $f_\alpha$ imply that $D_\alpha(p \,\|\, q) \ge 0$, with equality if and only if $p = q$.

Special cases include:

  • $\alpha = 1$: Kullback–Leibler divergence $D_1(p \,\|\, q) = \sum_{i=1}^K p_i \ln \frac{p_i}{q_i}$.
  • $\alpha = 0$: Reverse KL / Itakura–Saito divergence.
  • $\alpha = 2$: Euclidean quadratic divergence.

$D_\alpha$ also coincides with the β-divergence $D_\beta$ (with $\beta = \alpha$) utilized in robust statistics and nonnegative matrix factorization.
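The special cases can be checked numerically. The following is a sketch, assuming the normalization $f_\alpha = -S_\alpha$ from Section 1, that builds the Bregman divergence directly from the generator and its gradient and compares it with the closed forms for KL, Itakura–Saito, and half the squared Euclidean distance. All function names and the two test distributions are illustrative choices.

```python
import math

def f(p, alpha):
    """alpha-negentropy f_alpha = -S_alpha."""
    if alpha == 0:
        return -sum(math.log(x) for x in p)
    if alpha == 1:
        return sum(x * math.log(x) for x in p)
    return sum(x ** alpha for x in p) / (alpha * (alpha - 1.0))

def grad_f(q, alpha):
    """Gradient of f_alpha, coordinate by coordinate."""
    if alpha == 1:
        return [math.log(x) + 1.0 for x in q]
    # This formula also covers alpha = 0, where it gives -1/x.
    return [x ** (alpha - 1.0) / (alpha - 1.0) for x in q]

def bregman(p, q, alpha):
    """D_alpha(p || q) = f(p) - f(q) - <grad f(q), p - q>."""
    g = grad_f(q, alpha)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return f(p, alpha) - f(q, alpha) - inner

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def itakura_saito(p, q):
    return sum(pi / qi - math.log(pi / qi) - 1.0 for pi, qi in zip(p, q))

def half_sq_euclid(p, q):
    return 0.5 * sum((pi - qi) ** 2 for pi, qi in zip(p, q))

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
for alpha, closed_form in ((1, kl), (0, itakura_saito), (2, half_sq_euclid)):
    print(alpha, bregman(p, q, alpha), closed_form(p, q))
```

The KL identity relies on both arguments summing to one, which cancels the linear term $\sum_i (p_i - q_i)$.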

4. Generalized Pinsker Inequality and Sharp Constants

A central result is the extension of Pinsker's inequality to the full Tsallis/Bregman family. For all $p, q \in \mathrm{relint}\, \Delta^K$,

$$D_\alpha(p \,\|\, q) \ge \frac{C_{\alpha,K}}{2}\, \| p - q \|_1^2,$$

where the sharp optimal constant $C_{\alpha,K}$ is given explicitly in a piecewise manner, exhibiting several phase transitions as α and K vary (Beretta et al., 5 Feb 2026).

Summary table of $C_{\alpha,K}$:

| Case | $C_{\alpha,K}$ | Regime/Remarks |
|---|---|---|
| $\alpha \le 1$ | […] | Dimension-free |
| $1 < \alpha \le 2$ | […] (even K), […] (odd K) | Power-law penalty, parity effect |
| $\alpha > 2$, $K = 2$ | […] | Decay with α |
| $\alpha > 2$, $K \ge 3$ | […] | No uniform lower bound |

Here the odd-K entry includes a parity correction relative to the even-K entry; the closed-form expressions are given in (Beretta et al., 5 Feb 2026).

Key phase transitions:

  • For $\alpha \le 1$, the lower bound is dimension-independent.
  • For $1 < \alpha \le 2$, dimension and parity effects emerge.
  • For $\alpha > 2$, only the binary (K=2) case admits a nontrivial bound; in multiclass cases (K≥3), no uniform Pinsker-type lower bound holds.

Classical recoveries:

  • $\alpha = 1$: Recovers the Kullback–Leibler Pinsker bound ($D_1(p \,\|\, q) \ge \tfrac{1}{2} \| p - q \|_1^2$).
  • $\alpha = 0$: Burg entropy and its divergence, with a dimension-free constant.
  • $\alpha = 2$: Euclidean case, $\tfrac{1}{2} \| p - q \|_2^2 \ge \tfrac{1}{2K} \| p - q \|_1^2$ (even K), $\tfrac{1}{2} \| p - q \|_2^2 \ge \tfrac{K}{2(K^2-1)} \| p - q \|_1^2$ (odd K).
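The two best-understood cases can be probed by random sampling. The sketch below checks the classical Pinsker bound $\mathrm{KL}(p \,\|\, q) \ge \tfrac{1}{2}\|p-q\|_1^2$ and the Euclidean case. The closed forms $1/K$ (even K) and $K/(K^2-1)$ (odd K) in `sharp_l2_l1_ratio` are the elementary minimum of $\|v\|_2^2 / \|v\|_1^2$ over zero-sum vectors, worked out for this illustration rather than copied from the cited paper. Sampling can only fail to find a violation; it proves nothing. All names, the seed, and the trial count are arbitrary choices.

```python
import math
import random

def sharp_l2_l1_ratio(K):
    """Smallest value of ||v||_2^2 / ||v||_1^2 over nonzero zero-sum v in R^K."""
    return 1.0 / K if K % 2 == 0 else K / (K * K - 1.0)

def extremal_direction(K):
    """Zero-sum vector with ||v||_1 = 1 that attains the ratio above."""
    m = K // 2      # number of positive entries
    n = K - m       # number of negative entries
    return [1.0 / (2 * m)] * m + [-1.0 / (2 * n)] * n

def random_interior_point(K, rng):
    """Normalized exponential draws: a point in the relative interior."""
    w = [rng.expovariate(1.0) for _ in range(K)]
    total = sum(w)
    return [x / total for x in w]

def worst_ratios(K, trials=2000, seed=0):
    """Smallest observed D / ||p-q||_1^2 for KL and for half squared Euclidean."""
    rng = random.Random(seed)
    worst_kl = worst_euclid = math.inf
    for _ in range(trials):
        p = random_interior_point(K, rng)
        q = random_interior_point(K, rng)
        l1_sq = sum(abs(a - b) for a, b in zip(p, q)) ** 2
        kl = sum(a * math.log(a / b) for a, b in zip(p, q))
        half_l2_sq = 0.5 * sum((a - b) ** 2 for a, b in zip(p, q))
        worst_kl = min(worst_kl, kl / l1_sq)
        worst_euclid = min(worst_euclid, half_l2_sq / l1_sq)
    return worst_kl, worst_euclid

for K in (2, 3, 4, 5):
    worst_kl, worst_euclid = worst_ratios(K)
    print(K, worst_kl, worst_euclid, 0.5 * sharp_l2_l1_ratio(K))
```

For K = 3 the extremal direction is (1/2, -1/4, -1/4), whose squared Euclidean norm 3/8 matches $K/(K^2-1)$; splitting mass unevenly between the positive and negative groups is what produces the parity effect.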

5. Applications in Learning, Optimization, and Geometry

The excess risk of Tsallis (power) scoring rules coincides with $D_\alpha$: writing $\ell_\alpha(q, y)$ for the score of forecast $q$ on outcome $y$, $\mathbb{E}_{Y \sim p}[\ell_\alpha(q, Y)] - \mathbb{E}_{Y \sim p}[\ell_\alpha(p, Y)] = D_\alpha(p \,\|\, q)$, so that

$$\mathbb{E}_{Y \sim p}[\ell_\alpha(q, Y)] - \mathbb{E}_{Y \sim p}[\ell_\alpha(p, Y)] \ge \frac{C_{\alpha,K}}{2}\, \| p - q \|_1^2.$$

The generalized Pinsker inequality thus tightly relates surrogate excess risk to total variation distance, providing explicit control in settings of probabilistic prediction and classification.
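The link between excess risk and Bregman divergence can be illustrated on two familiar members of the family. The sketch below assumes that the logarithmic score plays the role of the α = 1 rule and that half the Brier score plays the role of the α = 2 rule; it then checks that the excess risk equals the KL divergence and half the squared Euclidean distance, respectively. Function names and the example distributions are hypothetical.

```python
import math

def expected_score(score, p, q):
    """E_{Y~p}[score(q, Y)] for a forecast q and true distribution p."""
    return sum(p[y] * score(q, y) for y in range(len(p)))

def log_score(q, y):
    """Logarithmic score of forecast q on outcome y."""
    return -math.log(q[y])

def half_brier_score(q, y):
    """Half the squared distance between q and the one-hot vector of y."""
    return 0.5 * sum((q[i] - (1.0 if i == y else 0.0)) ** 2 for i in range(len(q)))

def excess_risk(score, p, q):
    """Expected score of q minus expected score of the true p."""
    return expected_score(score, p, q) - expected_score(score, p, p)

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
kl = sum(a * math.log(a / b) for a, b in zip(p, q))
half_l2_sq = 0.5 * sum((a - b) ** 2 for a, b in zip(p, q))
print(excess_risk(log_score, p, q), kl)
print(excess_risk(half_brier_score, p, q), half_l2_sq)
```

Combined with the classical Pinsker bound, the first identity gives the familiar statement that the excess log-loss of a forecast is at least half the squared $\ell_1$ distance to the true distribution.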

In online learning and mirror descent, $f_\alpha$ functions as a Tsallis regularizer. The sharp constant $C_{\alpha,K}$ quantifies the $\ell_1$-strong convexity essential for regret bounds and step-size tuning. Uniform curvature is observed for $\alpha \le 1$; dimension-dependent geometry appears when $\alpha > 1$.
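For concreteness, here is a minimal sketch of mirror descent on the simplex for the α = 1 member only, where the regularizer is the negative Shannon entropy and the update has the well-known multiplicative (exponentiated-gradient) closed form. For general α the normalization has to be solved numerically, which is not shown. The loss sequence, the step size `eta`, and the function names are hypothetical.

```python
import math

def entropic_mirror_step(p, grad, eta):
    """One mirror descent step with regularizer f_1: scale by exp(-eta * g), renormalize."""
    w = [pi * math.exp(-eta * gi) for pi, gi in zip(p, grad)]
    total = sum(w)
    return [x / total for x in w]

def run(losses, eta):
    """Start from the uniform distribution and apply one step per loss vector."""
    K = len(losses[0])
    p = [1.0 / K] * K
    for loss in losses:
        p = entropic_mirror_step(p, loss, eta)
    return p

# Hypothetical linear losses: the same loss vector repeated for 50 rounds.
losses = [[1.0, 0.2, 0.6]] * 50
p_final = run(losses, eta=0.1)
print(p_final)
```

The iterates stay in the relative interior by construction, and mass accumulates on the coordinate with the smallest cumulative loss.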

From the standpoint of information geometry, the Bregman divergences induced by Tsallis α-negentropy interpolate between families of statistical manifolds, specifically traversing Burg, Shannon, and quadratic regimes as α passes specified thresholds. The constants $C_{\alpha,K}$ capture the detailed $\ell_1$-curvature properties of these geometries, including distinct phase transitions at $\alpha = 1$ and $\alpha = 2$.

6. Classical Limits and Special Cases

As α approaches canonical values, Tsallis α-negentropy and its induced Bregman divergences recover several classical divergences and inequalities:

  • $\alpha = 1$: Standard KL divergence, Pinsker's classical inequality ($D_1(p \,\|\, q) \ge \tfrac{1}{2} \| p - q \|_1^2$).
  • $\alpha = 0$: Burg entropy and Itakura–Saito divergence ($D_0(p \,\|\, q) = \sum_{i=1}^K \left( \frac{p_i}{q_i} - \ln \frac{p_i}{q_i} - 1 \right)$), with a dimension-free constant.
  • $\alpha = 2$: Quadratic case ($D_2(p \,\|\, q) = \tfrac{1}{2} \| p - q \|_2^2$), with sharp $\ell_2$-to-$\ell_1$ inequalities reflecting parity corrections in odd K.

7. Summary and Significance

The systematic extension of Pinsker's inequality to the Tsallis/Bregman class, with exact, closed-form phase transitions in the sharp constant, clarifies the precise geometric and statistical roles of Tsallis α-negentropy in probabilistic prediction, online learning, and information geometry. Notably, the breakdown of uniform Pinsker-type lower bounds for $\alpha > 2$ and multiclass settings delineates intrinsic limitations of the family. The results collectively provide a unified quantitative connection between Bregman divergence control and total variation error, applicable across inference, optimization, and statistical manifold analysis (Beretta et al., 5 Feb 2026).
