Tsallis α-negentropy: Theory & Applications

Updated 6 May 2026

Tsallis α-negentropy is defined as the negative of Tsallis α-entropy and generalizes classical entropic forms, forming the basis for key Bregman divergences.
Its strict convexity and explicit differential structure enable precise error bounds via generalized Pinsker inequalities with phase transitions based on α and dimensional effects.
Applications span robust statistics, online learning, and information geometry, where it links surrogate risk measures to total variation distance in probabilistic prediction.

Tsallis α-negentropy refers to the negative of the Tsallis α-entropy, a one-parameter family of information measures that generalize the Shannon–Gibbs entropy by interpolating between several classical entropic forms as α varies. On the open probability simplex, negative Tsallis entropies are canonically used to generate a family of strictly convex functions, whose Bregman divergences—denoted $D_\alpha$ —serve as foundational objects in modern information-theoretic learning, online algorithms, and information geometry. These divergences include, as key special instances, the Kullback–Leibler divergence, reverse KL, Itakura–Saito divergence, and the so-called β-divergences prevalent in robust statistics and signal processing. A central research thread is the precise relationship between the α-negentropy-induced divergence and total variation distance, as formalized in sharp, dimension-aware generalized Pinsker inequalities (Beretta et al., 5 Feb 2026).

1. Definition of Tsallis α-entropies and α-negentropy

Let $\Delta^K = \{ p \in [0, 1]^K: \sum_{i=1}^K p_i = 1 \}$ denote the probability simplex with relative interior $\mathrm{relint}\, \Delta^K = \{ p \in \Delta^K : p_i > 0\, \forall i \}$ .

The Tsallis α-entropy is given by: $S_\alpha(p) = \begin{cases} \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^K p_i^\alpha, & \alpha \notin \{0,1\} \\ \sum_{i=1}^K \ln p_i, & \alpha = 0 \\ -\sum_{i=1}^K p_i \ln p_i, & \alpha = 1 \end{cases}$

Negative Tsallis α-entropy (α-negentropy), defined as $f_\alpha(p) = -S_\alpha(p)$, is adopted as the convex generator in Bregman divergence constructions. Alternative normalizations often appear, such as $H_\alpha(p) = \sum_{i=1}^K \frac{p_i^\alpha - 1}{1-\alpha}$, which satisfies $f_\alpha(p) = -H_\alpha(p)/\alpha$ up to an additive constant for $\alpha \notin \{0,1\}$. Additive constants do not affect the induced Bregman divergence, and the factor $1/\alpha$ only rescales it.

Key classical limits:

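$\alpha \to 1$: recovers Shannon entropy, after removing the additive constant $\frac{1}{\alpha(1-\alpha)}$, which diverges in the limit but cancels in the Bregman divergence.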
$\alpha = 0$ : coincides with Burg (log) entropy.
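The limits listed above can be checked numerically. The following is a minimal sketch in plain Python, assuming the normalization of $S_\alpha$ given in this section; the helper names `tsallis_negentropy` and `centered_negentropy` are illustrative and not taken from the source. It shows that $f_\alpha$ approaches the negative Shannon and negative Burg entropies once the additive constants $1/(\alpha(\alpha-1))$ and $K/(\alpha(\alpha-1))$ are removed.

```python
import math

def tsallis_negentropy(p, alpha):
    """f_alpha(p) = -S_alpha(p) for a strictly positive probability vector p."""
    if alpha == 1:
        return sum(x * math.log(x) for x in p)   # negative Shannon entropy
    if alpha == 0:
        return -sum(math.log(x) for x in p)      # negative Burg entropy
    return sum(x ** alpha for x in p) / (alpha * (alpha - 1))

def centered_negentropy(p, alpha, offset):
    """f_alpha with the additive constant offset / (alpha * (alpha - 1)) removed.

    Computed as (sum_i p_i^alpha - offset) / (alpha * (alpha - 1)) so that two
    large numbers are never subtracted when alpha is close to 0 or 1.
    """
    return (sum(x ** alpha for x in p) - offset) / (alpha * (alpha - 1))

p = [0.5, 0.3, 0.2]
eps = 1e-5

# alpha -> 1: remove 1 / (alpha * (alpha - 1)), i.e. offset = 1.
gap_shannon = abs(centered_negentropy(p, 1 + eps, 1.0) - tsallis_negentropy(p, 1))
# alpha -> 0: remove K / (alpha * (alpha - 1)), i.e. offset = K.
gap_burg = abs(centered_negentropy(p, eps, float(len(p))) - tsallis_negentropy(p, 0))

print(gap_shannon, gap_burg)  # both gaps are of order eps
```

Because the removed terms are constants, they have no effect on any Bregman divergence built from $f_\alpha$.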

2. Convexity and Differential Structure

The α-negentropy $f_\alpha$ is infinitely differentiable and strictly convex on $\mathrm{relint}\, \Delta^K$ for all real α. The Hessian structure is explicit: $\nabla^2 f_\alpha(p) = \mathrm{diag}(p_1^{\alpha-2}, \dots, p_K^{\alpha-2})$. This positive definite form guarantees strict convexity and underlies the strong-convexity properties used in subsequent Pinsker-type lower bounds.
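The diagonal Hessian can be obtained by differentiating the definition from Section 1 coordinatewise. A short derivation for $\alpha \notin \{0,1\}$, treating $p$ as a positive vector (the simplex constraint only restricts the admissible directions $v$ to those with $\sum_i v_i = 0$):

```latex
\begin{aligned}
f_\alpha(p) &= \frac{1}{\alpha(\alpha-1)} \sum_{i=1}^K p_i^\alpha \\
\frac{\partial f_\alpha}{\partial p_i}(p) &= \frac{p_i^{\alpha-1}}{\alpha-1} \\
\frac{\partial^2 f_\alpha}{\partial p_i \, \partial p_j}(p) &= \delta_{ij} \, p_i^{\alpha-2} \\
v^\top \nabla^2 f_\alpha(p) \, v &= \sum_{i=1}^K v_i^2 \, p_i^{\alpha-2} > 0 \quad \text{for } v \neq 0
\end{aligned}
```

The same second derivatives arise in the two special cases: $f_1(p) = \sum_i p_i \ln p_i$ gives $1/p_i = p_i^{-1}$, and $f_0(p) = -\sum_i \ln p_i$ gives $1/p_i^2 = p_i^{-2}$.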

3. Bregman Divergences from α-negentropy

For $p, q \in \mathrm{relint}\, \Delta^K$,

$D_\alpha(p \,\|\, q) = f_\alpha(p) - f_\alpha(q) - \langle \nabla f_\alpha(q),\, p - q \rangle$

is the associated Bregman divergence. The properties of $f_\alpha$ imply that $D_\alpha(p \,\|\, q) \ge 0$, with equality if and only if $p = q$.
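The Bregman construction translates directly into code. Below is a minimal sketch, assuming the normalization of $f_\alpha$ from Section 1 and strictly positive inputs; the function names are illustrative only.

```python
import math

def negentropy(p, alpha):
    """f_alpha(p) for a strictly positive probability vector p."""
    if alpha == 1:
        return sum(x * math.log(x) for x in p)
    if alpha == 0:
        return -sum(math.log(x) for x in p)
    return sum(x ** alpha for x in p) / (alpha * (alpha - 1))

def grad_negentropy(p, alpha):
    """Gradient of f_alpha, taken coordinatewise."""
    if alpha == 1:
        return [math.log(x) + 1.0 for x in p]
    if alpha == 0:
        return [-1.0 / x for x in p]
    return [x ** (alpha - 1) / (alpha - 1) for x in p]

def bregman(p, q, alpha):
    """f_alpha(p) - f_alpha(q) - <grad f_alpha(q), p - q>."""
    g = grad_negentropy(q, alpha)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return negentropy(p, alpha) - negentropy(q, alpha) - inner

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
for alpha in (-1.0, 0, 0.5, 1, 2, 3.5):
    print(alpha, bregman(p, q, alpha))  # positive for p != q
```

Negative values of α are included in the loop because strict convexity holds for all real α, so the divergence stays positive there as well.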

Special cases include:

$\alpha = 1$: Kullback–Leibler divergence $\mathrm{KL}(p \,\|\, q) = \sum_{i=1}^K p_i \ln \frac{p_i}{q_i}$.
$\alpha = 0$: Reverse KL / Itakura–Saito divergence.
$\alpha = 2$: Euclidean quadratic divergence.

$D_\alpha$ also coincides with the β-divergence $d_\beta$ (with $\beta = \alpha$) utilized in robust statistics and nonnegative matrix factorization.
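Expanding the Bregman definition coordinatewise for $f_\alpha$ gives the closed form $\frac{1}{\alpha(\alpha-1)} \sum_i \left( p_i^\alpha + (\alpha-1) q_i^\alpha - \alpha\, p_i q_i^{\alpha-1} \right)$, which is the usual β-divergence parametrization with $\beta = \alpha$. The sketch below, with the illustrative name `beta_divergence`, checks the three special cases numerically on one pair of distributions.

```python
import math

def beta_divergence(p, q, beta):
    """Sum of coordinatewise beta-divergences between strictly positive vectors."""
    if beta == 1:   # generalized KL; the -x + y terms cancel on the simplex
        return sum(x * math.log(x / y) - x + y for x, y in zip(p, q))
    if beta == 0:   # Itakura-Saito
        return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(p, q))
    return sum(
        (x ** beta + (beta - 1) * y ** beta - beta * x * y ** (beta - 1))
        / (beta * (beta - 1))
        for x, y in zip(p, q)
    )

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

kl = sum(x * math.log(x / y) for x, y in zip(p, q))
half_sq_l2 = 0.5 * sum((x - y) ** 2 for x, y in zip(p, q))

err_kl = abs(beta_divergence(p, q, 1) - kl)               # rounding error only
err_l2 = abs(beta_divergence(p, q, 2) - half_sq_l2)       # rounding error only
err_limit_1 = abs(beta_divergence(p, q, 1 + 1e-6) - kl)   # continuity at beta = 1
err_limit_0 = abs(beta_divergence(p, q, 1e-6) - beta_divergence(p, q, 0))
print(err_kl, err_l2, err_limit_1, err_limit_0)
```

The last two quantities illustrate that the general formula passes continuously through the KL and Itakura–Saito cases, even though the generator itself carries a divergent additive constant at those values.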

4. Generalized Pinsker Inequality and Sharp Constants

A central result is the extension of Pinsker's inequality to the full Tsallis/Bregman family. For all $p, q \in \mathrm{relint}\, \Delta^K$,

$D_\alpha(p \,\|\, q) \ge \frac{C_{\alpha,K}}{2}\, \|p - q\|_1^2$

where the sharp optimal constant, written here as $C_{\alpha,K}$, is given explicitly in a piecewise manner, exhibiting several phase transitions as α and K vary (Beretta et al., 5 Feb 2026).
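A Pinsker-type bound of this kind can be probed by sampling. The sketch below is an illustration rather than the source's method: it draws random pairs on the simplex and records the smallest observed ratio $D_\alpha(p \,\|\, q) / \|p - q\|_1^2$. Any sampled ratio is an upper bound on the best constant $c$ in $D_\alpha \ge c\, \|p - q\|_1^2$, so the output indicates how the admissible constant behaves without establishing its sharp value. The sampling scheme, the seed, and all function names are assumptions made for the example.

```python
import math
import random

def bregman(p, q, alpha):
    """D_alpha(p || q) via the coordinatewise closed form (p, q strictly positive)."""
    if alpha == 1:
        return sum(x * math.log(x / y) - x + y for x, y in zip(p, q))
    if alpha == 0:
        return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(p, q))
    return sum(
        (x ** alpha + (alpha - 1) * y ** alpha - alpha * x * y ** (alpha - 1))
        / (alpha * (alpha - 1))
        for x, y in zip(p, q)
    )

def random_simplex_point(rng, K):
    """Normalized exponential draws; the small floor keeps every entry positive."""
    w = [rng.expovariate(1.0) + 1e-12 for _ in range(K)]
    total = sum(w)
    return [x / total for x in w]

def smallest_sampled_ratio(alpha, K, trials=2000, seed=0):
    """Smallest observed D_alpha(p || q) / ||p - q||_1^2 over random pairs."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(trials):
        p = random_simplex_point(rng, K)
        q = random_simplex_point(rng, K)
        l1 = sum(abs(x - y) for x, y in zip(p, q))
        if l1 > 1e-9:
            best = min(best, bregman(p, q, alpha) / l1 ** 2)
    return best

for alpha in (0, 1, 1.5, 2, 3):
    print(alpha, smallest_sampled_ratio(alpha, K=4))
```

For $\alpha = 1$ the sampled ratio cannot fall below $1/2$, by the classical Pinsker inequality. For $\alpha = 2$ it cannot fall below $1/(2K)$, by the Cauchy–Schwarz bound $\|v\|_1^2 \le K \|v\|_2^2$.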

Summary of the regimes of the sharp constant:

Case	Regime/Remarks
$\alpha \le 1$	Dimension-free
$1 < \alpha \le 2$	Power-law penalty in K, parity effect (separate expressions for even and odd K)
$\alpha > 2$, $K = 2$	Decay with α
$\alpha > 2$, $K \ge 3$	No uniform lower bound

For odd K, the constant carries a parity correction relative to the even-K expression.

Key phase transitions:

For $\alpha \le 1$, the lower bound is dimension-independent.
For $1 < \alpha \le 2$, dimension and parity effects emerge.
For $\alpha > 2$, only the binary (K=2) case admits a nontrivial bound; in multiclass cases (K≥3), no uniform Pinsker-type lower bound holds.

Classical recoveries:

$\alpha = 1$: Recovers the Kullback–Leibler Pinsker bound ($\mathrm{KL}(p \,\|\, q) \ge \frac{1}{2} \|p - q\|_1^2$).
$\alpha = 0$: Burg entropy and its divergence, with a dimension-free constant.
$\alpha = 2$: Euclidean case, $\|p - q\|_2^2 \ge \frac{1}{K} \|p - q\|_1^2$ (even K), $\|p - q\|_2^2 \ge \frac{K}{K^2-1} \|p - q\|_1^2$ (odd K).
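In the Euclidean case the divergence is $\frac{1}{2} \|p - q\|_2^2$, so the Pinsker-type bound reduces to comparing $\|v\|_2$ with $\|v\|_1$ over vectors $v = p - q$ whose entries sum to zero. For a fixed split into positive and negative entries, Cauchy–Schwarz shows the ratio $\|v\|_2^2 / \|v\|_1^2$ is smallest when the entries within each group are equal, so it suffices to search over such two-level vectors. The sketch below does this with exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def two_level_ratio(K, m):
    """||v||_2^2 / ||v||_1^2 for a sum-zero v with m equal positive entries
    and K - m equal negative entries, scaled so that ||v||_1 = 1."""
    pos = Fraction(1, 2 * m)          # each positive entry (positive part sums to 1/2)
    neg = Fraction(1, 2 * (K - m))    # magnitude of each negative entry
    return m * pos ** 2 + (K - m) * neg ** 2

def best_ratio(K):
    """Minimum of the ratio over all splits into positive and negative entries."""
    return min(two_level_ratio(K, m) for m in range(1, K))

for K in range(2, 8):
    print(K, best_ratio(K))
```

The minimum is attained at the most balanced split. That split is exactly even when K is even, giving $1/K$, and off by one entry when K is odd, giving $K/(K^2-1)$, which is where the parity effect comes from.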

5. Applications in Learning, Optimization, and Geometry

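The excess risk of Tsallis (power) scoring rules coincides with $D_\alpha$. Writing $\ell_\alpha(q, y)$ for the loss of forecast $q$ on outcome $y$, $\mathbb{E}_{Y \sim p}[\ell_\alpha(q, Y)] - \mathbb{E}_{Y \sim p}[\ell_\alpha(p, Y)] = D_\alpha(p \,\|\, q)$ so that

$\mathbb{E}_{Y \sim p}[\ell_\alpha(q, Y)] - \mathbb{E}_{Y \sim p}[\ell_\alpha(p, Y)] \ge c\, \|p - q\|_1^2 = 4c\, \mathrm{TV}(p, q)^2$

whenever $D_\alpha(p \,\|\, q) \ge c\, \|p - q\|_1^2$ holds with a constant $c > 0$.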

The generalized Pinsker inequality thus tightly relates surrogate excess risk to total variation distance, providing explicit control in settings of probabilistic prediction and classification.
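The link between excess risk and Bregman divergence can be verified with a scoring rule built from $f_\alpha$. The sketch below assumes the standard construction $\ell(q, y) = -f_\alpha(q) - \langle \nabla f_\alpha(q),\, e_y - q \rangle$, where $e_y$ is the indicator vector of outcome $y$. This construction and the function names are assumptions made for illustration, and the source's scoring rule may be normalized differently.

```python
import math

def f(p, alpha):
    """Generator f_alpha for a strictly positive probability vector p."""
    if alpha == 1:
        return sum(x * math.log(x) for x in p)
    if alpha == 0:
        return -sum(math.log(x) for x in p)
    return sum(x ** alpha for x in p) / (alpha * (alpha - 1))

def grad_f(p, alpha):
    if alpha == 1:
        return [math.log(x) + 1.0 for x in p]
    if alpha == 0:
        return [-1.0 / x for x in p]
    return [x ** (alpha - 1) / (alpha - 1) for x in p]

def loss(q, y, alpha):
    """Loss of forecast q on outcome y: -f(q) - <grad f(q), e_y - q>."""
    g = grad_f(q, alpha)
    return -f(q, alpha) - (g[y] - sum(gi * qi for gi, qi in zip(g, q)))

def expected_loss(p, q, alpha):
    """Expected loss of forecast q when outcomes are drawn from p."""
    return sum(p[y] * loss(q, y, alpha) for y in range(len(p)))

def excess_risk(p, q, alpha):
    return expected_loss(p, q, alpha) - expected_loss(p, p, alpha)

def bregman(p, q, alpha):
    g = grad_f(q, alpha)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return f(p, alpha) - f(q, alpha) - inner

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
for alpha in (-1.0, 0, 0.5, 1, 2, 3):
    print(alpha, excess_risk(p, q, alpha), bregman(p, q, alpha))  # the two columns agree
```

With $\alpha = 1$ this construction reduces to the log loss $-\ln q_y$, so the identity recovers the familiar statement that the excess log-loss risk is the KL divergence.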

In online learning and mirror descent, $f_\alpha$ functions as a Tsallis regularizer. The sharp constant quantifies the $\ell_1$-strong convexity essential for regret bounds and step-size tuning. Uniform curvature is observed for $\alpha \le 1$; dimension-dependent geometry appears when $\alpha > 1$.
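To make the regularizer role concrete, here is a hedged sketch of a single mirror-descent step on the simplex with $f_\alpha$ as the mirror map, restricted to $0 < \alpha < 1$ for simplicity. The step minimizes $\eta \langle g, p \rangle + D_\alpha(p \,\|\, p_t)$, whose stationarity condition is $-\nabla f_\alpha(p) = -\nabla f_\alpha(p_t) + \eta g + \lambda \mathbf{1}$. The multiplier $\lambda$ is found by bisection. The function name, the bisection bracket, and the default $\alpha = 1/2$ are choices made for this example, not details from the source.

```python
def tsallis_mirror_step(p, g, eta, alpha=0.5, iters=100):
    """One mirror-descent step with regularizer f_alpha, assuming 0 < alpha < 1."""
    c = 1.0 - alpha
    K = len(p)
    # w_i = -grad f_alpha(p)_i + eta * g_i, where grad f_alpha(p)_i = -p_i^(alpha-1) / (1-alpha).
    w = [x ** (alpha - 1.0) / c + eta * gi for x, gi in zip(p, g)]
    w_min = min(w)

    def candidate(lam):
        # Inverts -grad f_alpha(x)_i = w_i + lam coordinatewise.
        return [(c * (wi + lam)) ** (-1.0 / c) for wi in w]

    # The candidate's total mass is decreasing in lam, so bracket the root of mass = 1.
    lo = 1.0 / c - w_min        # largest entry equals 1 here, so the mass is >= 1
    hi = K ** c / c - w_min     # every entry is <= 1/K here, so the mass is <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(candidate(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    new_p = candidate(0.5 * (lo + hi))
    total = sum(new_p)
    return [x / total for x in new_p]   # absorbs the residual bisection error

p = [0.5, 0.3, 0.2]
stay = tsallis_mirror_step(p, [0.0, 0.0, 0.0], eta=0.1)   # zero gradient leaves p unchanged
moved = tsallis_mirror_step(p, [1.0, 0.0, 0.0], eta=0.1)  # mass moves away from coordinate 0
print(stay, moved)
```

For $\alpha < 1$ the gradient of $f_\alpha$ diverges at the boundary of the simplex, so the iterate stays strictly positive and no explicit nonnegativity constraint is needed.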

From the standpoint of information geometry, the Bregman divergences induced by Tsallis α-negentropy interpolate between families of statistical manifolds, specifically traversing Burg, Shannon, and quadratic regimes as α passes specified thresholds. The sharp constants capture the detailed $\ell_1$-curvature properties of these geometries, including distinct phase transitions at $\alpha = 1$ and $\alpha = 2$.

6. Classical Limits and Special Cases

As α approaches canonical values, Tsallis α-negentropy and its induced Bregman divergences recover several classical divergences and inequalities:

$\alpha \to 1$: Standard KL divergence, Pinsker's classical inequality ($\mathrm{KL}(p \,\|\, q) \ge \frac{1}{2} \|p - q\|_1^2$).
$\alpha = 0$: Burg entropy and Itakura–Saito divergence ($D_0(p \,\|\, q) = \sum_{i=1}^K \left( \frac{p_i}{q_i} - \ln \frac{p_i}{q_i} - 1 \right)$), with a dimension-free constant.
$\alpha = 2$: Quadratic case ($D_2(p \,\|\, q) = \frac{1}{2} \|p - q\|_2^2$), with sharp $\ell_2$-to-$\ell_1$ inequalities reflecting parity corrections in odd K.

7. Summary and Significance

The systematic extension of Pinsker’s inequality to the Tsallis/Bregman class, with exact, closed-form phase transitions in the sharp constant, clarifies the precise geometric and statistical roles of Tsallis α-negentropy in probabilistic prediction, online learning, and information geometry. Notably, the breakdown of uniform Pinsker-type lower bounds for $\alpha > 2$ and multiclass settings delineates intrinsic limitations of the family. The results collectively provide a unified quantitative connection between Bregman divergence control and total variation error, applicable across inference, optimization, and statistical manifold analysis (Beretta et al., 5 Feb 2026).

References

Beretta et al. (2026). Generalized Pinsker Inequality for Bregman Divergences of Negative Tsallis Entropies.
