Tsallis Regularized Optimal Transport
- Tsallis regularized optimal transport is a variant that integrates a power-law Tsallis divergence into classical optimal transport to produce sparse and interpretable solutions.
- The methodology employs a dual formulation and generalized Sinkhorn algorithms, enabling robust iterative updates and effective handling of high-dimensional data.
- Theoretical results demonstrate parametric rate convergence independent of dimensionality, making it valuable for practical applications in statistical inference and machine learning.
Tsallis-regularized optimal transport is a class of optimal transport (OT) problems in which the standard cost-minimization formulation is supplemented by a regularization term based on the Tsallis divergence. This approach generalizes entropic (Kullback–Leibler, KL) regularization by introducing a family of power-law Tsallis divergences indexed by an order $q$, interpolating between classical (unregularized) optimal transport, entropic regularization, and quadratic or higher-degree penalizations. For $q > 1$, Tsallis regularization yields sparse optimal couplings. It also has distinct statistical and computational properties: its sample complexity avoids the curse of dimensionality, and it admits new algorithmic implementations. Theoretical investigations have addressed convergence rates, duality theory, central limit theorems, and practical iterative schemes.
1. Mathematical Formulation and Duality
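Let $P$ and $Q$ be probability measures on a compact set $\Omega \subset \mathbb{R}^d$, and $\Pi(P, Q)$ the set of couplings with marginals $P$ and $Q$. The Tsallis divergence of order $q$ (often denoted $\alpha$ in the literature) between a coupling $\pi \ll P \otimes Q$ and $P \otimes Q$ is
$$D_q(\pi \,\|\, P \otimes Q) = \frac{1}{q-1}\left(\int \left(\frac{d\pi}{d(P \otimes Q)}\right)^{q} d(P \otimes Q) - 1\right).$$
Given a continuous cost $c : \Omega \times \Omega \to \mathbb{R}$, the Tsallis-regularized OT problem is
$$\min_{\pi \in \Pi(P, Q)} \int c \, d\pi + \varepsilon \, D_q(\pi \,\|\, P \otimes Q).$$
For discrete histograms $a \in \Delta_n$, $b \in \Delta_m$ and cost matrix $C \in \mathbb{R}^{n \times m}$ with regularization parameter $\varepsilon > 0$, the primal is
$$\min_{P \in U(a, b)} \langle P, C \rangle - \varepsilon H_q(P), \qquad U(a, b) = \{P \ge 0 : P\mathbf{1} = a,\ P^{\top}\mathbf{1} = b\},$$
where $H_q$ denotes Tsallis entropy: $H_q(P) = \frac{1}{1-q} \sum_{i,j} \left(P_{ij}^{q} - P_{ij}\right)$. The dual problem (continuous) is
$$\sup_{f, g} \int f \, dP + \int g \, dQ - \varepsilon \int \psi\!\left(\frac{f(x) + g(y) - c(x, y)}{\varepsilon}\right) d(P \otimes Q)(x, y),$$
where $\psi$ is the convex conjugate of $\varphi(t) = (t^{q} - 1)/(q - 1)$, namely, for $q > 1$, $\psi(s) = (s_+/q')^{q'} + 1/(q-1)$ with $q' = q/(q-1)$, $s_+ = \max\{s, 0\}$ (González-Sanz et al., 7 May 2025). The discrete dual and its Fenchel conjugate reduce to
$$\max_{u, v} \ \langle u, a \rangle + \langle v, b \rangle - \varepsilon \sum_{i,j} \psi_q\!\left(\frac{u_i + v_j - C_{ij}}{\varepsilon}\right),$$
where $\psi_q(s) = \left(\frac{((q-1)s + 1)_+}{q}\right)^{q'}$ has explicit piecewise form for $q > 1$ (Marino et al., 2020, Terjék et al., 2021).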
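As a numerical sanity check, the following minimal sketch evaluates the discrete objective and compares a closed-form conjugate with a brute-force supremum. It assumes the normalizations $H_q(P) = \frac{1}{1-q}\sum_{i,j}(P_{ij}^q - P_{ij})$ and $\varphi(t) = (t^q - 1)/(q - 1)$; conventions differ across the cited papers, and all function names are illustrative only.

```python
import numpy as np

def tsallis_entropy(P, q):
    """Tsallis entropy H_q(P) = sum(P^q - P) / (1 - q); Shannon entropy at q = 1."""
    P = np.asarray(P, dtype=float)
    if abs(q - 1.0) < 1e-12:
        nz = P[P > 0]
        return float(-(nz * np.log(nz)).sum())
    return float((P**q - P).sum() / (1.0 - q))

def primal_objective(P, C, eps, q):
    """Discrete regularized cost <P, C> - eps * H_q(P)."""
    P = np.asarray(P, dtype=float)
    C = np.asarray(C, dtype=float)
    return float((P * C).sum() - eps * tsallis_entropy(P, q))

def phi(t, q):
    """Assumed generator of the Tsallis divergence, for t >= 0 and q > 1."""
    return (t**q - 1.0) / (q - 1.0)

def psi(s, q):
    """Closed-form convex conjugate of phi under the assumed normalization."""
    q_conj = q / (q - 1.0)
    return (np.maximum(s, 0.0) / q_conj) ** q_conj + 1.0 / (q - 1.0)

def psi_brute_force(s, q, t_max=50.0, n=500_001):
    """Approximate sup over t >= 0 of (s * t - phi(t)) on a uniform grid."""
    t = np.linspace(0.0, t_max, n)
    return float((s * t - phi(t, q)).max())

# Usage: uniform 2x2 coupling, and one conjugate value checked two ways.
P = np.full((2, 2), 0.25)
C = np.array([[0.0, 1.0], [1.0, 0.0]])
print(tsallis_entropy(P, 2.0))
print(primal_objective(P, C, eps=0.1, q=2.0))
print(psi(1.5, 2.0), psi_brute_force(1.5, 2.0))
```

The grid-based supremum is only a check on the algebra; it is not how the dual would be evaluated in practice.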
2. Theoretical Properties: Sparsity, Convergence, and Dimensionality
Tsallis regularization with $q > 1$ yields couplings whose densities vanish below a threshold, achieving genuine sparsity in contrast to the full support of the entropic (KL, $q \to 1$) regularization (González-Sanz et al., 7 May 2025, Muzellec et al., 2016, Marino et al., 2020). The primal is strictly convex for $q > 1$.
A central result is that, under mild smoothness and compactness assumptions, the empirical Tsallis-regularized OT between empirical measures $P_n, Q_n$ converges at the parametric rate $n^{-1/2}$, independently of the dimension $d$ (González-Sanz et al., 7 May 2025). Specifically, central limit theorems hold for the OT cost, coupling, and dual potentials. This counters the prior belief that the lack of strong dual concavity or smoothness in Tsallis regularization induces the curse of dimensionality.
In the framework of $\Gamma$-convergence, minimizers of the Tsallis-regularized problem converge narrowly to those of the unregularized OT as $\varepsilon \to 0$ (Suguro et al., 2023).
3. Algorithmic Schemes and Generalized Sinkhorn Algorithms
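Practical computation proceeds via iterative updates of dual potentials, generalizing the Sinkhorn–Knopp matrix scaling to the Tsallis setting. For $q > 1$, the dual variable update involves Newton's method or bisection to solve one-dimensional nonlinear equations: for each $x$, $f(x)$ is chosen so that $\int \psi'\big((f(x) + g(y) - c(x, y))/\varepsilon\big)\, dQ(y) = 1$, where $\psi'$ is the derivative of the convex conjugate appearing in the dual, and analogously for $g$ (González-Sanz et al., 7 May 2025, Marino et al., 2020). In the discrete setting, with histograms $a$, $b$, cost matrix $C$, and dual vectors $u$, $v$, the q-Sinkhorn algorithm alternates
$$u^{(k+1)} \text{ solves } \sum_{j} P_{ij}\big(u^{(k+1)}, v^{(k)}\big) = a_i \ \ \forall i, \qquad v^{(k+1)} \text{ solves } \sum_{i} P_{ij}\big(u^{(k+1)}, v^{(k+1)}\big) = b_j \ \ \forall j, \qquad P_{ij}(u, v) = q^{-1/(q-1)} \exp_{2-q}\!\left(\frac{u_i + v_j - C_{ij}}{\varepsilon}\right),$$
where $\exp_q(x) = [1 + (1-q)x]_+^{1/(1-q)}$ and $\log_q(x) = (x^{1-q} - 1)/(1-q)$ are deformed q-exponential and q-logarithm functions (Marino et al., 2020, Muzellec et al., 2016).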
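The deformed functions can be sketched directly from their standard definitions, $\log_q(x) = (x^{1-q} - 1)/(1-q)$ and $\exp_q(x) = [1 + (1-q)x]_+^{1/(1-q)}$, both of which reduce to the ordinary logarithm and exponential as $q \to 1$. The helper names below are illustrative and are not taken from any library.

```python
import numpy as np

def log_q(x, q):
    """Deformed logarithm (x^(1-q) - 1) / (1 - q) for x > 0; natural log at q = 1."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    """Deformed exponential [1 + (1-q) x]_+^(1/(1-q)); ordinary exp at q = 1."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)
    # For q > 1 the exponent is negative, so a clipped base of 0 maps to +inf.
    with np.errstate(divide="ignore"):
        return base ** (1.0 / (1.0 - q))

# Usage: for an index below 1 the deformed exponential is exactly 0 on a half-line,
# which is the mechanism behind sparse couplings.
s = np.array([-3.0, -0.5, 0.0, 1.0])
print(exp_q(s, 0.0))             # index 2 - q with q = 2
print(exp_q(log_q(2.0, 0.5), 0.5))
```

For an index below 1 the truncation $[\cdot]_+$ makes `exp_q` vanish identically below a finite threshold, whereas the ordinary exponential is strictly positive everywhere.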
For $q > 1$, sparsity arises because the derivative of the dual conjugate vanishes on a half-line $(-\infty, s_0]$, so coupling entries are zero wherever $(f(x) + g(y) - c(x, y))/\varepsilon \le s_0$; the threshold is $s_0 = 0$ when the divergence is generated by $\varphi(t) = (t^q - 1)/(q - 1)$ (Terjék et al., 2021, González-Sanz et al., 7 May 2025). Convergence of the iterations is monotone and robust under mild conditions, with empirical performance comparable to classical Sinkhorn for a wide range of $q$ (Muzellec et al., 2016, Marino et al., 2020).
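The alternating scheme and the resulting sparsity can be illustrated with a minimal sketch. It assumes the discrete primal $\min_P \langle P, C \rangle + \frac{\varepsilon}{q-1} \sum_{i,j} (P_{ij}^q - P_{ij})$ with $q > 1$, for which stationarity gives $P_{ij} = \big(((q-1)s_{ij} + 1)_+/q\big)^{1/(q-1)}$ with $s_{ij} = (u_i + v_j - C_{ij})/\varepsilon$. Each potential is updated by bisection on its marginal equation. The function names, bracketing rule, and iteration counts are illustrative choices, not the cited authors' implementations.

```python
import numpy as np

def coupling(u, v, C, eps, q):
    """Coupling induced by dual vectors (u, v); inactive entries are exactly 0."""
    s = (u[:, None] + v[None, :] - C) / eps
    return (np.maximum((q - 1.0) * s + 1.0, 0.0) / q) ** (1.0 / (q - 1.0))

def update_rows(v, C, a, eps, q, n_bisect=60):
    """For every row i, solve sum_j P_ij(u_i, v) = a_i for u_i by bisection."""
    m = (C - v[None, :]).min(axis=1)
    lo = m - eps / (q - 1.0)                                # whole row is 0 here
    hi = m + eps * (q * a ** (q - 1.0) - 1.0) / (q - 1.0)   # cheapest entry alone equals a_i
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        too_small = coupling(mid, v, C, eps, q).sum(axis=1) < a
        lo = np.where(too_small, mid, lo)
        hi = np.where(too_small, hi, mid)
    return 0.5 * (lo + hi)

def tsallis_sinkhorn(C, a, b, eps=0.01, q=2.0, n_iter=100):
    """Alternate exact row and column marginal solves on the dual vectors."""
    u, v = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        u = update_rows(v, C, a, eps, q)
        v = update_rows(u, C.T, b, eps, q)   # columns of C are rows of C.T
    return coupling(u, v, C, eps, q), u, v

# Usage: identical uniform marginals on a 6-point grid with squared-distance cost.
x = np.linspace(0.0, 1.0, 6)
C = (x[:, None] - x[None, :]) ** 2
a = b = np.full(6, 1.0 / 6.0)
P, u, v = tsallis_sinkhorn(C, a, b, eps=0.01, q=2.0)
print(np.count_nonzero(P), "nonzero entries out of", P.size)
```

In this small example the marginals coincide and $\varepsilon$ is small, so the computed coupling is supported on the diagonal only. An entropic coupling for the same data would have all 36 entries strictly positive.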
4. Statistical and Asymptotic Results
Parametric rate convergence with dimension-independence is achieved through an analysis exploiting the geometry of Hölder spaces and a tailored Z-estimation framework for non-Donsker classes (González-Sanz et al., 7 May 2025). Central limit theorems are established for the regularized cost, optimal coupling, and dual potentials, characterizing their joint fluctuations around the population values in terms of explicit Gaussian processes and invertible linearizations.
Convergence rates as $\varepsilon \to 0$ have sharp upper bounds: for measures admitting suitable quantization rates, the regularization bias decays at an explicit rate in $\varepsilon$ governed by $q$ and the quantization exponent of the measures, showing that the KL case ($q \to 1$) is extremal and fastest (Suguro et al., 2023). The strong convexity required for strict statistical error bounds holds for a suitable range of $q$ (González-Sanz et al., 7 May 2025).
5. Applications and Interpretations
Tsallis regularization underpins practical schemes in high-dimensional statistics, machine learning, and distributional inference (González-Sanz et al., 7 May 2025, Muzellec et al., 2016). In ecological inference, it enables accurate reconstruction of joint tables (e.g., ethnicity-vote distributions from marginals and side information), outperforming entropic OT both in accuracy and in the alignment of solutions with side constraints when $q$ and the regularization strength are optimally tuned (Muzellec et al., 2016).
For Gaussian and $q$-normal distributions, Tsallis-regularized OT yields explicit closed-form solutions, with the regularization tuning the coupling between marginals from tightly coupled (Gaussian map, small $\varepsilon$) to nearly independent (product measures, strong regularization) (Tong et al., 2020).
From an optimization perspective, Tsallis regularization interpolates between the sparse but less statistically efficient unregularized OT ($q \to 0$) and the smooth, fully supported, but overspread entropic OT ($q \to 1$), with each value of $q$ balancing bias, sparsity, and statistical efficiency (González-Sanz et al., 7 May 2025, Muzellec et al., 2016).
6. Comparison with Entropic and Other $f$-divergence Regularizations
The Tsallis-regularized formulation is part of a larger class of $f$-divergence regularized OT problems, which includes the Kullback–Leibler and $\chi^2$ (quadratic) divergences as special cases (Terjék et al., 2021, Marino et al., 2020). Unlike KL, Tsallis regularization with $q > 1$ produces sparse couplings, and its convergence guarantees do not rely on strong concavity of the dual. The family of divergences parametrized by $q$ (or $\alpha$) governs both the degree of sparsity and the algorithmic complexity of the resulting problems. KL regularization is optimal in terms of convergence as $\varepsilon \to 0$, but may be suboptimal where sparsity or alignment with interpretable solutions is paramount (Suguro et al., 2023, González-Sanz et al., 7 May 2025).
7. Implementation Notes and Empirical Observations
Tsallis-regularized OT problems can be solved using deformed exponential/generalized Sinkhorn routines, mirror descent-based projection methods, or Newton/bisection schemes for dual updates (Muzellec et al., 2016, González-Sanz et al., 7 May 2025, Marino et al., 2020). For $q = 2$ (Pearson’s $\chi^2$ divergence), row-/column-wise Newton methods are empirically stable and yield very sparse couplings. The choice of $q$ significantly affects reconstruction accuracy in inference tasks, and in tested real-world settings, optimizing $q$ away from 1 produces substantial improvements over both unregularized and entropic OT (Muzellec et al., 2016).
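For $q = 2$ the row equation is piecewise linear and convex in the potential, so a scalar Newton iteration started to the right of the root decreases monotonically and stops once the active set no longer changes. The sketch below assumes the quadratic normalization $P_{ij} = \big((u_i + v_j - C_{ij})/\varepsilon + 1\big)_+/2$, i.e. the penalty $\varepsilon \sum_{i,j} (P_{ij}^2 - P_{ij})$. It is an illustrative single-row solve with hypothetical names, not the second-order scheme of the cited papers.

```python
import numpy as np

def row_mass_q2(u, shifted_cost, eps):
    """Row sum of the q = 2 coupling for scalar potential u."""
    slack = (u - np.asarray(shifted_cost, dtype=float)) / eps + 1.0
    return 0.5 * np.maximum(slack, 0.0).sum()

def newton_row_q2(shifted_cost, a_i, eps, max_steps=100, tol=1e-12):
    """Solve row_mass_q2(u) = a_i for u, where shifted_cost[j] stands for C_ij - v_j."""
    shifted_cost = np.asarray(shifted_cost, dtype=float)
    # Start where the cheapest entry alone already carries mass a_i, so residual >= 0.
    u = shifted_cost.min() + eps * (2.0 * a_i - 1.0)
    for _ in range(max_steps):
        slack = (u - shifted_cost) / eps + 1.0
        active = slack > 0.0
        residual = 0.5 * slack[active].sum() - a_i
        if residual <= tol:
            break
        # Newton step: the slope of the row mass is (#active entries) / (2 * eps).
        u -= residual / (0.5 * active.sum() / eps)
    return u

# Usage: three candidate targets, target row mass 1/3.
costs = [0.0, 0.04, 0.16]
u = newton_row_q2(costs, a_i=1.0 / 3.0, eps=0.1)
print(u, row_mass_q2(u, costs, eps=0.1))
```

Only the two cheapest targets remain active at the solution in this example; the third entry is clipped to zero, which is how the row solve produces sparsity.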
Numerical experiments consistently show that as $q$ increases above 1, the induced optimal coupling becomes sparser, while the number of iterations required for algorithmic convergence remains in line with (or only mildly exceeds) that of the classical Sinkhorn algorithm (Marino et al., 2020, Muzellec et al., 2016). This indicates that Tsallis regularization is competitive for high-dimensional, large-scale, or sparsity-sensitive transport tasks.