
Entropy Regularized Optimal Transport

Updated 8 October 2025
  • Entropy-regularized optimal transport is a method that augments classical OT with an entropy penalty, resulting in a strictly convex and smooth optimization problem solved via Sinkhorn iterations.
  • It provides strong statistical foundations through central limit theorems and bootstrap methods, enabling accurate hypothesis testing and inference in high-dimensional data settings.
  • The approach is practically applied in areas like image analysis for comparing color histograms, where careful tuning of the regularization parameter balances computational efficiency and statistical power.

Entropy-regularized optimal transport (also known as the Sinkhorn divergence or smoothed Wasserstein distance) is a framework that augments the classical optimal transport problem with an entropy penalty, conferring strong convexity, smoothing, and computational efficiency—particularly in high-dimensional and data-analytical settings. The entropy regularization replaces the intractable linear program underlying optimal transport with a strictly convex and smooth optimization problem, solvable by the Sinkhorn–Knopp algorithm. Asymptotic, statistical, and algorithmic properties of entropy-regularized optimal transport are central to both theoretical investigations and practical applications in machine learning and statistics.

1. Entropy-Regularized Formulation and Dual Structure

In the finite setting, for two discrete probability vectors $a, b \in \Delta_N$ (the $N$-simplex) and a cost matrix $C \in \mathbb{R}^{N \times N}$ with entries $c_{ij}$, the entropy-regularized optimal transport cost is

$$W_{p,\varepsilon}^p(a, b) = \min_{T \in U(a, b)} \left\{ \langle T, C \rangle + \varepsilon H(T \,\Vert\, a \otimes b) \right\}$$

where $U(a, b)$ is the set of coupling matrices with marginals $a$ and $b$, and $H(T \Vert a \otimes b) = \sum_{i,j} T_{i,j} \log \frac{T_{i,j}}{a_i b_j}$ is the Kullback–Leibler divergence of $T$ relative to the product measure $a \otimes b$.

The dual takes the form
$$W_{p, \varepsilon}^p(a, b) = \max_{u, v \in \mathbb{R}^N} \left\{ u^T a + v^T b - \varepsilon \sum_{i, j} \exp\!\left( -\frac{c_{ij} - u_i - v_j}{\varepsilon} \right) a_i b_j \right\}$$

This dual problem is efficiently computable via Sinkhorn iterations: alternating row and column scalings of an exponentiated cost kernel matrix, yielding a unique, strictly positive transport plan $T^*$.
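The snippet below is a minimal NumPy sketch of these iterations (an illustrative implementation, not the source paper's code; the function name `sinkhorn` and all variable names are placeholders, and no log-domain stabilization is included):

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=1000, tol=1e-10):
    """Sinkhorn-Knopp iterations for entropy-regularized OT on finite spaces.

    a, b : probability vectors of length N; C : (N, N) cost matrix;
    eps  : regularization strength epsilon.
    Returns the optimal plan T*, the regularized cost, and dual potentials (u, v).
    """
    K = np.exp(-C / eps)                      # Gibbs kernel
    phi = np.ones(len(a))                     # row scaling
    psi = np.ones(len(b))                     # column scaling
    for _ in range(n_iter):
        psi = b / (K.T @ phi)                 # match column marginals
        phi_new = a / (K @ psi)               # match row marginals
        if np.max(np.abs(phi_new - phi)) < tol:
            phi = phi_new
            break
        phi = phi_new
    T = phi[:, None] * K * psi[None, :]       # optimal transport plan
    # regularized objective <T, C> + eps * KL(T || a x b), summing only over T > 0
    mask = T > 0
    kl = np.sum(T[mask] * np.log(T[mask] / np.outer(a, b)[mask]))
    cost = np.sum(T * C) + eps * kl
    # dual potentials for the KL-relative-to-(a x b) formulation
    u = eps * np.log(np.maximum(phi, 1e-300) / np.maximum(a, 1e-300))
    v = eps * np.log(np.maximum(psi, 1e-300) / np.maximum(b, 1e-300))
    return T, cost, u, v
```

For very small $\varepsilon$ the kernel $K$ underflows and log-domain ("stabilized") updates are generally preferred; the plain scalings above are adequate for the moderate regularization used in the examples that follow.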

A centered version, the Sinkhorn loss, is also considered: it satisfies $W_{p, \varepsilon}^p(a, a) = 0$ by construction, making it well suited for hypothesis testing and for use as a proper metric.

2. Central Limit Theorems: Empirical and Centered Versions

The statistical properties of empirical Sinkhorn divergences were analyzed for both one- and two-sample settings. For the empirical measure $\hat{a}_n$ of $n$ i.i.d. samples from $a$, the limiting distribution is:

  • One-sample:

$$\sqrt{n}\left[W_{p, \varepsilon}^p(\hat{a}_n, b) - W_{p, \varepsilon}^p(a, b)\right] \to \langle G, u_{(\varepsilon)} \rangle,$$

where $G$ is Gaussian with covariance determined by the multinomial law of $a$, and $u_{(\varepsilon)}$ is the dual solution.

  • Two-sample ($\hat{a}_n$, $\hat{b}_m$):

$$\rho_{n, m}\left[ W_{p, \varepsilon}^p(\hat{a}_n, \hat{b}_m) - W_{p, \varepsilon}^p(a, b) \right] \to \sqrt{\gamma}\, \langle G, u_{(\varepsilon)} \rangle + \sqrt{1-\gamma}\, \langle H, v_{(\varepsilon)} \rangle,$$

with $\rho_{n, m} = \sqrt{nm/(n+m)}$, $\gamma$ the asymptotic sampling proportion, and $H$ an independent Gaussian.

For the centered Sinkhorn loss under the null ($a = b$), the first-order delta method fails. Instead, a second-order expansion shows that
$$n\, W_{p, \varepsilon}^p(\hat{a}_n, a) \to \frac{1}{2} \sum_{i=1}^N \lambda_i\, \chi^2_i(1),$$
with $\lambda_i$ the eigenvalues of the Hessian (effective curvature) of the Sinkhorn loss and $\chi^2_i(1)$ independent chi-square variables.

This dichotomy between first-order (alternative) and second-order (null) limits is intrinsic to the statistical inference for optimal transport distances.
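As a purely illustrative Monte Carlo check of the one-sample Gaussian limit (hypothetical setup, reusing the `sinkhorn` sketch above), one can compare the empirical spread of the scaled fluctuation with the plug-in limit variance $u_{(\varepsilon)}^T (\operatorname{diag}(a) - a a^T)\, u_{(\varepsilon)}$, since $\operatorname{diag}(a) - a a^T$ is the multinomial covariance of $\sqrt{n}(\hat{a}_n - a)$:

```python
# Hypothetical Monte Carlo check of the one-sample CLT; assumes the sinkhorn
# function from the earlier sketch is in scope.
rng = np.random.default_rng(0)
N, eps, n, n_rep = 25, 1.0, 2000, 300

x = rng.random((N, 2))                                   # support points in [0, 1]^2
C = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)       # squared-distance cost
a = rng.dirichlet(np.ones(N))
b = rng.dirichlet(np.ones(N))

_, W_ab, u_eps, _ = sinkhorn(a, b, C, eps)
stats = []
for _ in range(n_rep):
    a_hat = rng.multinomial(n, a) / n                    # empirical measure of n samples
    _, W_hat, _, _ = sinkhorn(a_hat, b, C, eps)
    stats.append(np.sqrt(n) * (W_hat - W_ab))            # scaled fluctuation

Sigma = np.diag(a) - np.outer(a, a)                      # multinomial covariance
print("empirical sd :", np.std(stats))
print("limiting sd  :", np.sqrt(u_eps @ Sigma @ u_eps))
```

Because the dual potential is unique only up to an additive constant, and $\Sigma \mathbf{1} = 0$, the limiting variance above is well defined.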

3. Bootstrap Inference and Test Statistics

Empirical central limit theorems (CLTs) provide asymptotic distributions for test statistics, but finite samples require resampling-based inference. The paper develops bootstrap methods:

  • For the one-sample case: resample $a^*_n$ from $\hat{a}_n$ and compute $\sqrt{n}\,\bigl(W_{p, \varepsilon}^p(a^*_n, b) - W_{p, \varepsilon}^p(\hat{a}_n, b)\bigr)$.
  • For the two-sample case: resample both $a^*_n$ and $b^*_m$.

These procedures approximate the finite-sample law of the respective statistics and are proven consistent with respect to the bounded Lipschitz metric. Under the null, where the first-order term vanishes, a Babu-type bootstrap (which accounts for the higher-order behavior) is needed to capture the chi-square mixture limit.
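A minimal sketch of the one-sample procedure (hypothetical helper name, reusing the `sinkhorn` function from above) resamples multinomially from $\hat{a}_n$ and inverts the bootstrap law into a basic confidence interval:

```python
# Hypothetical one-sample bootstrap sketch; assumes sinkhorn from the earlier snippet.
def one_sample_bootstrap_ci(counts, b, C, eps, n_boot=200, alpha=0.05, rng=None):
    """Basic bootstrap CI for W_eps(a, b) from multinomial counts defining a_hat_n."""
    rng = np.random.default_rng() if rng is None else rng
    n = counts.sum()
    a_hat = counts / n
    _, W_hat, _, _ = sinkhorn(a_hat, b, C, eps)
    boot = np.empty(n_boot)
    for k in range(n_boot):
        a_star = rng.multinomial(n, a_hat) / n            # resample a*_n from a_hat_n
        _, W_star, _, _ = sinkhorn(a_star, b, C, eps)
        boot[k] = np.sqrt(n) * (W_star - W_hat)           # bootstrap statistic
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    # Bootstrap law approximates that of sqrt(n) * (W(a_hat, b) - W(a, b)),
    # so inverting it yields a basic bootstrap interval for W_eps(a, b).
    return W_hat - hi / np.sqrt(n), W_hat - lo / np.sqrt(n)
```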

4. Asymptotic Regime as $\varepsilon \to 0$

As the regularization vanishes, entropy-regularized distances approach the classical Wasserstein distance. Explicitly, the approximation error satisfies
$$W_{p,\varepsilon}^p(\hat{a}_n,b) - W_p^p(\hat{a}_n,b) \leq 2 \varepsilon q \log\!\left( \frac{e^2 L \operatorname{diam}(X)}{\varepsilon \sqrt{q}} \right),$$
so the bias term, scaled by $\sqrt{n}$, is negligible whenever $\sqrt{n}\, \varepsilon \log(1/\varepsilon) \to 0$. The CLTs for the regularized distances thus recover the known unregularized OT limiting behavior as the smoothing effect of $\varepsilon$ disappears.
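A small numerical illustration of this regime is sketched below (hypothetical setup, reusing `a`, `b`, `C`, and `sinkhorn` from the earlier snippets, with SciPy's generic linear-programming solver supplying the unregularized reference value; very small $\varepsilon$ is avoided since it would require log-domain stabilization):

```python
from scipy.optimize import linprog

def exact_ot(a, b, C):
    """Unregularized OT cost via the linear program over the transport polytope."""
    N = len(a)
    A_eq = np.vstack([np.kron(np.eye(N), np.ones((1, N))),   # row sums equal a
                      np.kron(np.ones((1, N)), np.eye(N))])  # column sums equal b
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

W_exact = exact_ot(a, b, C)
for eps in [1.0, 0.5, 0.1, 0.05]:
    _, W_eps, _, _ = sinkhorn(a, b, C, eps)
    print(f"eps={eps:4.2f}  regularized={W_eps:.4f}  gap={W_eps - W_exact:.4f}")
```

The printed gap is nonnegative (the entropy term only adds cost) and shrinks as $\varepsilon$ decreases, matching the bound above.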

5. Applications: Hypothesis Testing and Empirical Illustration

Extensive experiments demonstrate the utility of entropy-regularized OT for high-dimensional, discrete distributions:

  • Simulated data on $5 \times 5$ and $20 \times 20$ grids confirm convergence in distribution (Gaussian or chi-square mixture) as predicted.
  • Bootstrap approximations for finite-sample testing display strong agreement with theoretical asymptotics.
  • The regularization parameter $\varepsilon$ is shown to control the tradeoff between computational tractability and statistical power; its optimal selection is crucial in practice.

A concrete application considered is the comparison of color histograms (in $3$-dimensional RGB space) across image datasets. The centered Sinkhorn loss, used with bootstrapped quantiles, enables rejection of the null hypothesis of identical color distributions between different seasons (autumn vs. winter), illustrating the methodology's applicability to real-world, high-dimensional empirical data.
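A sketch of how such a comparison could be assembled is shown below (hypothetical helper names, placeholder image arrays `autumn_image` and `winter_image`, and one common centering convention that subtracts the self-transport terms; it reuses the `sinkhorn` function from above):

```python
# Hypothetical sketch of the color-histogram comparison; assumes the sinkhorn
# function from the first snippet is in scope.
def rgb_histogram(image, bins=6):
    """Normalized 3D color histogram of an (H, W, 3) uint8 image."""
    pixels = image.reshape(-1, 3) / 255.0
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 1)] * 3)
    return hist.ravel() / hist.sum()

def rgb_cost(bins=6):
    """Squared Euclidean cost between the RGB bin centers."""
    centers = (np.arange(bins) + 0.5) / bins
    grid = np.stack(np.meshgrid(centers, centers, centers, indexing="ij"), axis=-1)
    x = grid.reshape(-1, 3)
    return ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)

def sinkhorn_loss(a, b, C, eps):
    """Centered Sinkhorn loss (one common convention): zero when a == b."""
    W = lambda p, q: sinkhorn(p, q, C, eps)[1]
    return W(a, b) - 0.5 * (W(a, a) + W(b, b))

# a = rgb_histogram(autumn_image)   # placeholder image arrays
# b = rgb_histogram(winter_image)
# loss = sinkhorn_loss(a, b, rgb_cost(), eps=0.05)
```

A bootstrapped quantile of the loss under resampling, as in the earlier sketch, would then calibrate the test of identical color distributions.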

6. Mathematical Formalism and Key Formulas

Key mathematical components are:

  • Primal entropy-regularized OT:

$$W_{p, \varepsilon}^p(a, b) = \min_{T \in U(a, b)} \left\{ \langle T, C \rangle + \varepsilon H(T \,\Vert\, a \otimes b) \right\}$$

  • Dual:

$$W_{p,\varepsilon}^p(a,b) = \max_{u, v \in \mathbb{R}^N} \left\{ u^T a + v^T b - \varepsilon \sum_{i, j} \exp\!\left( -\frac{c_{ij} - u_i - v_j}{\varepsilon} \right) a_i b_j \right\}$$

  • One-sample CLT:

$$\sqrt{n} \left[ W_{p, \varepsilon}^p(\hat{a}_n, b) - W_{p, \varepsilon}^p(a, b) \right] \to \langle G, u_{(\varepsilon)} \rangle$$

  • Second-order (null case):

$$n\, W_{p,\varepsilon}^p(\hat{a}_n, a) \to \frac{1}{2} \sum_i \lambda_i \chi_i^2(1)$$

The Hessian matrix governing the second-order limit is constructed from the second derivative of the Sinkhorn loss and the square root of the multinomial covariance, encapsulating both curvature and statistical variability.
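Schematically (the notation here is an assumption for illustration, not the source's), writing $A$ for that Hessian and $\Sigma = \operatorname{diag}(a) - a a^T$ for the multinomial covariance of $\sqrt{n}(\hat{a}_n - a)$, the second-order delta method gives
$$n\, W_{p,\varepsilon}^p(\hat{a}_n, a) \approx \tfrac{1}{2}\, n\, (\hat{a}_n - a)^T A\, (\hat{a}_n - a) \;\xrightarrow{d}\; \tfrac{1}{2}\, Z^T \Sigma^{1/2} A\, \Sigma^{1/2} Z = \tfrac{1}{2} \sum_{i=1}^N \lambda_i\, \chi_i^2(1), \qquad Z \sim \mathcal{N}(0, I_N),$$
so the $\lambda_i$ are the eigenvalues of $\Sigma^{1/2} A\, \Sigma^{1/2}$, combining curvature and sampling variability exactly as described above.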

7. Relevance, Limitations, and Statistical Implications

The paper's results provide a rigorous statistical foundation for the use of entropy-regularized optimal transport and the Sinkhorn divergence in data analysis. The asymptotic theory enables valid hypothesis testing and confidence assessment, previously unattainable for classical OT owing to its computational intractability and lack of differentiability.

The practical impact is that entropy-regularized OT can be used for large-scale, high-dimensional problems (such as color histograms for image comparison) while retaining strong inferential guarantees.

A notable limitation is the necessity of tuning the regularization parameter $\varepsilon$: if chosen too large, the bias may dominate; if chosen too small, numerical and inferential instability may arise. The transition to the unregularized OT is well characterized, providing practical guidance for parameter selection.

The adoption of entropy-regularized OT extends hypothesis testing to non-Euclidean, nonparametric settings, and the established methodology (CLTs, robust bootstrap calibration, and a direct connection to transport-based statistics) permits its systematic use in contemporary machine learning and statistics.
