Tsallis α-negentropy is defined as the negative of Tsallis α-entropy and generalizes classical entropic forms, forming the basis for key Bregman divergences.
Its strict convexity and explicit differential structure enable precise error bounds via generalized Pinsker inequalities with phase transitions based on α and dimensional effects.
Applications span robust statistics, online learning, and information geometry, where it links surrogate risk measures to total variation distance in probabilistic prediction.
Tsallis α-negentropy refers to the negative of the Tsallis α-entropy, a one-parameter family of information measures that generalize the Shannon–Gibbs entropy by interpolating between several classical entropic forms as α varies. On the open probability simplex, negative Tsallis entropies generate a family of strictly convex functions whose Bregman divergences, denoted $D_\alpha$, serve as foundational objects in information-theoretic learning, online algorithms, and information geometry. These divergences include, as key special instances, the Kullback–Leibler divergence, reverse KL, Itakura–Saito divergence, and the so-called β-divergences prevalent in robust statistics and signal processing. A central research thread is the precise relationship between the α-negentropy-induced divergence and total variation distance, as formalized in sharp, dimension-aware generalized Pinsker inequalities (Beretta et al., 5 Feb 2026).
1. Definition of Tsallis α-entropies and α-negentropy
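Let $\Delta_K = \{p \in [0,1]^K : \sum_{i=1}^K p_i = 1\}$ denote the probability simplex, with relative interior $\operatorname{relint}\Delta_K = \{p \in \Delta_K : p_i > 0 \ \forall i\}$.
The Tsallis α-entropy is given by:
$$S_\alpha(p) = \begin{cases} \dfrac{1}{\alpha(1-\alpha)} \displaystyle\sum_{i=1}^K p_i^\alpha, & \alpha \notin \{0,1\}, \\[4pt] \displaystyle\sum_{i=1}^K \ln p_i, & \alpha = 0, \\[4pt] -\displaystyle\sum_{i=1}^K p_i \ln p_i, & \alpha = 1. \end{cases}$$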
Negative Tsallis α-entropy (α-negentropy), defined as $f_\alpha(p) = -S_\alpha(p)$, is adopted as the convex generator in Bregman divergence constructions. Alternative normalizations often appear, such as
$$H_\alpha(p) = \sum_{i=1}^K \frac{p_i^\alpha - 1}{1-\alpha},$$
satisfying $f_\alpha(p) = -H_\alpha(p)$ up to a scale factor and additive constants. The additive constants do not affect the induced Bregman divergence, and the scale factor merely rescales it.
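As a minimal sketch of the definition above, the following Python snippet evaluates $S_\alpha$ and $f_\alpha = -S_\alpha$ in the normalization stated in this section. The function names `tsallis_entropy` and `negentropy` are illustrative choices, not taken from the source, and the input is assumed to lie in the relative interior of the simplex.

```python
import numpy as np

def tsallis_entropy(p, alpha):
    """S_alpha(p) in the normalization of the definition above.

    Assumes p has strictly positive entries summing to one.
    """
    p = np.asarray(p, dtype=float)
    if alpha == 1:
        return float(-np.sum(p * np.log(p)))   # Shannon case
    if alpha == 0:
        return float(np.sum(np.log(p)))        # Burg case
    return float(np.sum(p ** alpha) / (alpha * (1.0 - alpha)))

def negentropy(p, alpha):
    """f_alpha(p) = -S_alpha(p), the convex generator."""
    return -tsallis_entropy(p, alpha)

p = np.array([0.5, 0.3, 0.2])
for alpha in (0, 0.5, 1, 2):
    print(alpha, negentropy(p, alpha))
```

For $\alpha = 2$ the generator reduces to $\tfrac{1}{2}\sum_i p_i^2$, which is a quick sanity check on the sign and scaling conventions.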
The α-negentropy $f_\alpha$ is infinitely differentiable and strictly convex on $\operatorname{relint}\Delta_K$ for all real α. The Hessian structure is explicit:
$$\nabla^2 f_\alpha(p) = \operatorname{diag}\big(p_1^{\alpha-2}, \dots, p_K^{\alpha-2}\big).$$
This positive definite form guarantees strict convexity and underlies the strong-convexity properties used in subsequent Pinsker-type lower bounds.
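The explicit Hessian can be checked numerically. The sketch below assumes the diagonal form $\operatorname{diag}(p_i^{\alpha-2})$ obtained by differentiating $f_\alpha = -S_\alpha$ twice, and compares it against central finite differences of the gradient. All function names are illustrative.

```python
import numpy as np

def grad_negentropy(p, alpha):
    """Gradient of f_alpha = -S_alpha (coordinate-wise)."""
    p = np.asarray(p, dtype=float)
    if alpha == 1:
        return np.log(p) + 1.0
    # Covers alpha = 0 as well: d/dp (-ln p) = -1/p.
    return p ** (alpha - 1.0) / (alpha - 1.0)

def hessian_closed_form(p, alpha):
    """Assumed closed form: diag(p_i^(alpha - 2))."""
    return np.diag(np.asarray(p, dtype=float) ** (alpha - 2.0))

def hessian_finite_diff(p, alpha, eps=1e-6):
    """Central finite differences of the gradient, column by column."""
    p = np.asarray(p, dtype=float)
    K = p.size
    H = np.zeros((K, K))
    for j in range(K):
        step = np.zeros(K)
        step[j] = eps
        H[:, j] = (grad_negentropy(p + step, alpha)
                   - grad_negentropy(p - step, alpha)) / (2.0 * eps)
    return H

p = np.array([0.5, 0.3, 0.2])
for alpha in (0, 0.5, 1, 2, 3):
    H = hessian_closed_form(p, alpha)
    agrees = np.allclose(H, hessian_finite_diff(p, alpha), rtol=1e-5, atol=1e-6)
    print(alpha, agrees, np.linalg.eigvalsh(H).min() > 0)
```

Because every diagonal entry $p_i^{\alpha-2}$ is positive on the relative interior, the smallest eigenvalue is positive for each α tested.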
3. Bregman Divergences from α-negentropy
For $p, q \in \operatorname{relint}\Delta_K$,
$$D_\alpha(p\|q) = f_\alpha(p) - f_\alpha(q) - \langle \nabla f_\alpha(q),\, p - q \rangle$$
is the associated Bregman divergence. The properties of $f_\alpha$ imply that $D_\alpha(p\|q) \ge 0$, with equality if and only if $p = q$.
$D_\alpha$ also coincides with the β-divergence (with $\beta = \alpha$) utilized in robust statistics and nonnegative matrix factorization.
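The Bregman construction can be sketched directly from the generator. The snippet below uses the standard Bregman formula $f(p) - f(q) - \langle \nabla f(q), p - q \rangle$ with $f = f_\alpha$ in this section's normalization, and compares the result with the Kullback–Leibler, Itakura–Saito, and half squared Euclidean expressions at $\alpha = 1, 0, 2$. Function names are illustrative.

```python
import numpy as np

def f(p, alpha):
    """Generator f_alpha = -S_alpha."""
    if alpha == 1:
        return float(np.sum(p * np.log(p)))
    if alpha == 0:
        return float(-np.sum(np.log(p)))
    return float(np.sum(p ** alpha) / (alpha * (alpha - 1.0)))

def grad_f(p, alpha):
    """Gradient of f_alpha."""
    if alpha == 1:
        return np.log(p) + 1.0
    return p ** (alpha - 1.0) / (alpha - 1.0)

def bregman(p, q, alpha):
    """D_alpha(p || q) for p, q in the relative interior of the simplex."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return f(p, alpha) - f(q, alpha) - float(grad_f(q, alpha) @ (p - q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

kl = float(np.sum(p * np.log(p / q)))                      # Kullback-Leibler
itakura_saito = float(np.sum(p / q - np.log(p / q) - 1))   # Itakura-Saito
half_sq_l2 = 0.5 * float(np.sum((p - q) ** 2))             # quadratic case

print(bregman(p, q, 1), kl)
print(bregman(p, q, 0), itakura_saito)
print(bregman(p, q, 2), half_sq_l2)
```

The agreement at $\alpha = 1$ relies on both arguments summing to one, which cancels the linear term in the Bregman formula.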
4. Generalized Pinsker Inequality and Sharp Constants
A central result is the extension of Pinsker's inequality to the full Tsallis/Bregman family. For all $p, q \in \operatorname{relint}\Delta_K$, the divergence $D_\alpha(p\|q)$ is bounded below by a constant multiple of the squared total variation distance between $p$ and $q$, where the sharp optimal constant, which depends on α and K, is given explicitly in a piecewise manner, exhibiting several phase transitions as α and K vary (Beretta et al., 5 Feb 2026).
The piecewise expression includes a parity correction for odd K.
Key phase transitions:
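In the lowest range of α, which includes the Burg case α = 0 and the Shannon case α = 1, the lower bound is dimension-independent.
In an intermediate range of α, which includes the quadratic case α = 2, dimension and parity effects emerge.
For the largest values of α, only the binary (K=2) case admits a nontrivial bound; in multiclass cases (K≥3), no uniform Pinsker-type lower bound holds.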
Classical recoveries:
$\alpha = 1$: Recovers the Kullback–Leibler Pinsker bound ($\mathrm{KL}(p\|q) \ge \tfrac{1}{2}\|p - q\|_1^2$).
$\alpha = 0$: Burg entropy and its divergence (Itakura–Saito), with a dimension-free constant.
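The classical $\alpha = 1$ case can be spot-checked numerically. The sketch below samples random pairs from the simplex (Dirichlet sampling with concentration 2 is an arbitrary choice made here, as are K and the sample size) and records the smallest observed ratio $\mathrm{KL}(p\|q) / (\tfrac{1}{2}\|p-q\|_1^2)$. By Pinsker's classical inequality this ratio is at least 1. A sampled minimum only bounds the sharp constant from above and does not prove sharpness.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence in nats."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
K = 5  # illustrative dimension
ratios = []
for _ in range(2000):
    p = rng.dirichlet(2.0 * np.ones(K))
    q = rng.dirichlet(2.0 * np.ones(K))
    l1 = float(np.sum(np.abs(p - q)))
    ratios.append(kl(p, q) / (0.5 * l1 ** 2))

min_ratio = min(ratios)
print("smallest observed KL / (0.5 * L1^2):", min_ratio)
```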
5. Applications in Learning, Optimization, and Geometry
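The excess risk of Tsallis (power) scoring rules coincides with $D_\alpha$: for a true distribution $p$ and a forecast $q$, the expected loss of $q$ minus the expected loss of $p$ equals $D_\alpha(p\|q)$, so that any lower bound on $D_\alpha$ transfers directly to the excess risk.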
The generalized Pinsker inequality thus tightly relates surrogate excess risk to total variation distance, providing explicit control in settings of probabilistic prediction and classification.
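The identity between excess risk and Bregman divergence can be illustrated with a small computation. The loss below is built from the generator by the standard Savage-type construction $\ell(q, y) = -f_\alpha(q) - \langle \nabla f_\alpha(q), e_y - q \rangle$. This is an assumption made for the sketch, and the source's exact normalization of the Tsallis scoring rule may differ by constants. Function names are illustrative.

```python
import numpy as np

def f(p, alpha):
    """Generator f_alpha = -S_alpha."""
    if alpha == 1:
        return float(np.sum(p * np.log(p)))
    if alpha == 0:
        return float(-np.sum(np.log(p)))
    return float(np.sum(p ** alpha) / (alpha * (alpha - 1.0)))

def grad_f(p, alpha):
    if alpha == 1:
        return np.log(p) + 1.0
    return p ** (alpha - 1.0) / (alpha - 1.0)

def score_loss(q, y, alpha):
    """Loss of forecast q when outcome y occurs (assumed Savage-type form)."""
    e_y = np.zeros_like(q)
    e_y[y] = 1.0
    return -f(q, alpha) - float(grad_f(q, alpha) @ (e_y - q))

def expected_loss(p, q, alpha):
    """Expected loss of forecast q when outcomes are drawn from p."""
    return sum(p[y] * score_loss(q, y, alpha) for y in range(len(p)))

def bregman(p, q, alpha):
    return f(p, alpha) - f(q, alpha) - float(grad_f(q, alpha) @ (p - q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
for alpha in (0, 0.5, 1, 2, 3):
    excess = expected_loss(p, q, alpha) - expected_loss(p, p, alpha)
    print(alpha, excess, bregman(p, q, alpha))
```

The two printed columns agree because the expected loss is linear in $p$ through the term $\langle \nabla f_\alpha(q), p - q \rangle$, which is exactly the linear part of the Bregman formula.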
In online learning and mirror descent, $f_\alpha$ functions as a Tsallis regularizer. The sharp constant quantifies the $\ell_1$-strong convexity essential for regret bounds and step-size tuning. Uniform curvature is observed in the dimension-independent regime of α; dimension-dependent geometry appears outside it.
From the standpoint of information geometry, the Bregman divergences induced by Tsallis α-negentropy interpolate between families of statistical manifolds, specifically traversing Burg, Shannon, and quadratic regimes as α passes specified thresholds. The sharp constants capture the detailed $\ell_1$-curvature properties of these geometries, including the distinct phase transitions in α.
6. Classical Limits and Special Cases
As α approaches canonical values, Tsallis α-negentropy and its induced Bregman divergences recover several classical divergences and inequalities:
$\alpha = 1$: Standard KL divergence, Pinsker's classical inequality ($\mathrm{KL}(p\|q) \ge \tfrac{1}{2}\|p - q\|_1^2$).
$\alpha = 0$: Burg entropy and Itakura–Saito divergence ($D_0(p\|q) = \sum_{i=1}^K \big[ p_i/q_i - \ln(p_i/q_i) - 1 \big]$), with a dimension-free constant.
$\alpha = 2$: Quadratic case ($D_2(p\|q) = \tfrac{1}{2}\|p - q\|_2^2$), with sharp $\ell_1$-to-$\ell_2$ inequalities reflecting parity corrections in odd K.
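The parity effect in the quadratic case can be seen in an elementary computation that is independent of the source's stated constants. A difference $v = p - q$ of two simplex points sums to zero, and the ratio $\|v\|_2^2 / \|v\|_1^2$ is smallest when the positive and negative mass are spread over as balanced a split of the K coordinates as possible. For even K the split is exact; for odd K it cannot be. The helper names below are illustrative.

```python
import numpy as np

def ratio(v):
    """||v||_2^2 / ||v||_1^2 for a nonzero vector v."""
    return float(np.sum(v ** 2) / np.sum(np.abs(v)) ** 2)

def extremal_ratio(K):
    """Ratio for the most balanced zero-sum sign pattern in dimension K."""
    a = K // 2   # number of positive coordinates
    b = K - a    # number of negative coordinates
    v = np.concatenate([np.full(a, 1.0 / a), np.full(b, -1.0 / b)])
    return ratio(v)

rng = np.random.default_rng(0)
for K in (2, 3, 4, 5):
    # Smallest ratio seen over random differences of simplex points.
    sampled = min(
        ratio(rng.dirichlet(np.ones(K)) - rng.dirichlet(np.ones(K)))
        for _ in range(1000)
    )
    print(K, extremal_ratio(K), sampled)
```

For even K the extremal ratio equals $1/K$, while for odd K it equals $K/(K^2 - 1)$, which is slightly larger. Combined with $D_2(p\|q) = \tfrac{1}{2}\|p-q\|_2^2$, this is one concrete way a dimension-dependent, parity-sensitive constant can arise.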
7. Summary and Significance
The systematic extension of Pinsker's inequality to the Tsallis/Bregman class, with exact, closed-form phase transitions in the sharp constant, clarifies the precise geometric and statistical roles of Tsallis α-negentropy in probabilistic prediction, online learning, and information geometry. Notably, the breakdown of uniform Pinsker-type lower bounds for large α in multiclass settings (K≥3) delineates intrinsic limitations of the family. The results collectively provide a unified quantitative connection between Bregman divergence control and total variation error, applicable across inference, optimization, and statistical manifold analysis (Beretta et al., 5 Feb 2026).