On the Efficiency of Sinkhorn-Knopp for Entropically Regularized Optimal Transport

Published 4 Apr 2026 in cs.DS and cs.LG | (2604.03787v1)

Abstract: The Sinkhorn-Knopp (SK) algorithm is a cornerstone method for matrix scaling and entropically regularized optimal transport (EOT). Despite its empirical efficiency, existing theoretical guarantees to achieve a target marginal accuracy $\varepsilon$ deteriorate severely in the presence of outliers, bottlenecked either by the global maximum regularized cost $\eta\|C\|_\infty$ (where $\eta$ is the regularization parameter and $C$ the cost matrix) or the matrix's minimum-to-maximum entry ratio $\nu$. This creates a fundamental disconnect between theory and practice. In this paper, we resolve this discrepancy. For EOT, we introduce the novel concept of well-boundedness, a local bulk mass property that rigorously isolates the well-behaved portion of the data from extreme outliers. We prove that governed by this fundamental notion, SK recovers the target transport plan for a problem of dimension $n$ in $O(\log n - \log \varepsilon)$ iterations, completely independent of the regularized cost $\eta\|C\|_\infty$. Furthermore, we show that a virtually cost-free pre-scaling step eliminates the dimensional dependence entirely, accelerating convergence to a strictly dimension-free $O(\log(1/\varepsilon))$ iterations. Beyond EOT, we establish a sharp phase transition for general $(\boldsymbol{u},\boldsymbol{v})$-scaling governed by a critical matrix density threshold. We prove that when a matrix's density exceeds this threshold, the iteration complexity is strictly independent of $\nu$. Conversely, when the density falls below this threshold, the dependence on $\nu$ becomes unavoidable; in this sub-critical regime, we construct instances where SK requires $\Omega(n/\varepsilon)$ iterations.

Authors (1)

Kun He

Summary

The paper introduces well-boundedness, showing that SK's iteration count depends on bulk mass properties rather than extreme cost values.
It demonstrates that a simple pre-scaling step yields dimension-free complexity, providing optimal logarithmic convergence bounds.
A sharp phase transition governed by a critical matrix density threshold is identified, separating the regime where the iteration complexity is independent of the entry ratio $\nu$ from the regime where that dependence is unavoidable.

Analysis of Sinkhorn-Knopp Efficiency for Entropic Optimal Transport

Introduction and Problem Statement

The Sinkhorn-Knopp (SK) algorithm is central to matrix scaling and entropically regularized optimal transport (EOT). In EOT, one seeks a transportation plan that minimizes a transport cost regularized by entropy, a problem that is efficiently solvable via iterative matrix scaling on Gibbs kernels. However, established finite-accuracy ($\varepsilon$-approximate) complexity bounds for SK deteriorate severely as the cost matrix develops outlier entries or as the regularization parameter $\eta$ is increased. This behavior is not reflected in practice, where SK typically converges rapidly even on highly unbalanced or outlier-prone cost structures. Existing theory relies either on contraction in projective metrics, which decays exponentially with $\eta\|C\|_\infty$, or on KL-type potential arguments, whose bounds grow linearly in $\eta\|C\|_\infty$. This disconnect is a central theoretical deficiency.
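The iterative scaling described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the kernel sign convention $K_{ij} = \exp(-\eta C_{ij})$, the L1 column-marginal error as the stopping criterion, and the names `gibbs_kernel` and `sinkhorn_knopp` are all assumptions made here for the example.

```python
import math

def gibbs_kernel(C, eta):
    """Entrywise kernel K_ij = exp(-eta * C_ij) (assumed sign convention)."""
    return [[math.exp(-eta * cij) for cij in row] for row in C]

def sinkhorn_knopp(K, r, c, eps=1e-9, max_iter=10_000):
    """Alternately rescale rows and columns of K toward marginals r and c.

    Returns (plan, iterations, error), where error is the L1 distance
    between the plan's column sums and c, measured right after a row step
    (so the row sums match r up to floating-point rounding).
    """
    n, m = len(K), len(K[0])
    y = [1.0] * m
    for it in range(1, max_iter + 1):
        # Row step: choose x so that every row of diag(x) K diag(y) sums to r_i.
        x = [r[i] / sum(K[i][j] * y[j] for j in range(m)) for i in range(n)]
        # Column sums of the current plan and their L1 deviation from c.
        col = [y[j] * sum(x[i] * K[i][j] for i in range(n)) for j in range(m)]
        err = sum(abs(col[j] - c[j]) for j in range(m))
        if err <= eps:
            plan = [[x[i] * K[i][j] * y[j] for j in range(m)] for i in range(n)]
            return plan, it, err
        # Column step: rescale y so that every column sums to c_j.
        y = [y[j] * c[j] / col[j] for j in range(m)]
    raise RuntimeError("SK did not reach the requested accuracy")

# Hypothetical 3x3 instance, chosen only for illustration.
C = [[0.0, 2.0, 5.0], [1.0, 0.0, 3.0], [4.0, 1.0, 0.0]]
r = [0.2, 0.3, 0.5]
c = [0.4, 0.4, 0.2]
plan, iters, err = sinkhorn_knopp(gibbs_kernel(C, eta=1.0), r, c)
print(iters, err)
```

The error is checked between the row step and the column step, so the returned plan satisfies the row marginals essentially exactly and the column marginals up to `eps`.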

Main Contributions

The paper resolves this discrepancy through the introduction of the $(\rho,\kappa)$-well-boundedness condition, a local “bulk” mass property on the cost matrix $C$ (specifically, on the regularized cost $\eta C$). It is shown that SK iteration complexity depends only on this robust property, rather than on fragile global quantities like $\|C\|_\infty$ or the minimal-to-maximal entry ratio $\nu$, and that with a simple pre-scaling step it can even become strictly dimension-independent.

The key results are:

EOT iteration complexity is $O(\log n - \log \varepsilon)$ under well-boundedness, independent of $\eta\|C\|_\infty$.
A cost-free pre-scaling yields dimension-free $O(\log(1/\varepsilon))$ bounds, optimal up to constant factors.
A sharp phase transition in the dependence on structural parameters ($\nu$ and density), precisely characterizing regimes of dimension and norm independence for general $(\boldsymbol{u},\boldsymbol{v})$-scaling, and complementary lower bounds.

Technical Foundations

Well-Boundedness and Bulk Mass

The paper defines $(\rho,\kappa)$-well-boundedness: for positive (normalized) marginals […], the cost matrix is well-bounded if the minimal weighted sum, over any row, of entries […], plus the analogous sum over any column, strictly exceeds […] by […]. This decouples SK’s convergence from maximal outlier costs, as only the bulk of the kernel’s mass must be carried by subexponential entries in […].

Dimension-Dependent and Pre-Scaled Complexity

The principal theorem establishes that, under well-boundedness, to achieve marginal projection accuracy $\varepsilon$, SK requires only

$O(\log n - \log \varepsilon)$

iterations, with […] dimension- and cost-independent constants.

A simple initial diagonal rescaling (inputting […]) removes dimensional dependence, yielding iteration complexity

$O(\log(1/\varepsilon))$

for fixed […].

Phase Transition: Density and Entry Ratio Regimes

The analysis identifies a critical threshold for matrix “density”: if there exists a constant lower bound on the fraction […] ([…], resp.) of nontrivial entries per row (column, resp.), and the resulting density exceeds the critical threshold, then the iteration complexity is independent of $\nu$. If the density falls below the threshold, the dependence on $\nu$ is unavoidable; explicit counterexamples exhibit SK requiring $\Omega(n/\varepsilon)$ iterations, with $\nu$ exponentially small.

The study reveals further that, exactly at the critical threshold, for certain marginal choices, SK can still avoid dependence on $\nu$, unlike the uniform case.
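The two quantities that drive this phase transition can be computed directly for a given matrix. The sketch below is only illustrative: the paper's exact notion of a “nontrivial” entry is not reproduced here, so the cutoff parameter `threshold` and the function names `entry_ratio` and `min_row_col_density` are assumptions, and $\nu$ is taken over the positive entries.

```python
def entry_ratio(A):
    """nu: smallest positive entry divided by the largest entry (assumed definition)."""
    positive = [a for row in A for a in row if a > 0]
    return min(positive) / max(positive)

def min_row_col_density(A, threshold):
    """Smallest per-row and per-column fraction of entries >= threshold.

    Treating 'entry >= threshold' as 'nontrivial' is an assumption of this sketch.
    """
    n, m = len(A), len(A[0])
    row_frac = min(sum(1 for a in row if a >= threshold) / m for row in A)
    col_frac = min(
        sum(1 for i in range(n) if A[i][j] >= threshold) / n for j in range(m)
    )
    return row_frac, col_frac

# Hypothetical matrix: two large entries and one tiny entry in every row and column.
A = [
    [1.0, 1e-9, 1.0],
    [1.0, 1.0, 1e-9],
    [1e-9, 1.0, 1.0],
]
nu = entry_ratio(A)
row_frac, col_frac = min_row_col_density(A, threshold=0.5)
print(nu, row_frac, col_frac)
```

In this example $\nu$ is tiny while two thirds of each row and column are nontrivial, which is the kind of instance where a density-based bound and a $\nu$-based bound give very different pictures.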

Combinatorial Analysis and Reduction Techniques

Two central technical components enable these results:

Reduction Framework: Arbitrary $(\boldsymbol{u},\boldsymbol{v})$-scaling problems are discretized to uniform scaling on square matrices, “revealing” combinatorial invariants (e.g., permanent lower bounds) inaccessible on rectangular or weighted-marginal instances.
Structural Stability: The paper proves strengthened stability for matrix entries and permanents along the scaling orbit, robust as the plan approaches double stochasticity and essential for sharp iteration bounds.
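To make the notion of a permanent lower bound concrete, the following sketch evaluates the permanent of small doubly stochastic matrices and compares it with the classical van der Waerden bound $\mathrm{per}(A) \ge n!/n^n$. This is a well-known bound used here purely as an illustration; the summary does not say which permanent bounds the paper relies on, and the brute-force `permanent` helper is only practical for very small $n$.

```python
import math
from itertools import permutations

def permanent(A):
    """Permanent by brute-force expansion over all permutations (small n only)."""
    n = len(A)
    return sum(
        math.prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n))
    )

def van_der_waerden_bound(n):
    """Classical lower bound n!/n^n on the permanent of an n x n doubly stochastic matrix."""
    return math.factorial(n) / n**n

# Uniform doubly stochastic matrix: attains the bound.
J = [[1 / 3] * 3 for _ in range(3)]
# A sparser doubly stochastic matrix with some zero entries.
B = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
print(permanent(J), permanent(B), van_der_waerden_bound(3))
```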

Comparison to Prior Work

The provided bounds completely decouple SK’s iteration count from $\eta\|C\|_\infty$, subsuming prior results which either require a global cap on the regularized cost (yielding logarithmic rates) or become vacuous as this quantity diverges. Polynomial or exponential iteration dependence on either the cost spread or the accuracy parameter $\varepsilon$ is shown to be non-intrinsic under bulk mass regularity, clarifying why SK can efficiently scale the nonnegative kernels arising in “practical” OT problems.
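One way to probe this decoupling empirically is to count SK iterations on a cost matrix before and after injecting a single extreme outlier, which inflates $\eta\|C\|_\infty$ without touching the bulk of the entries. The sketch below is a hypothetical experiment, not one from the paper: the instance, the outlier value, the L1 stopping rule, and the helper `sk_iterations` are all assumptions, and no particular outcome is claimed.

```python
import math

def sk_iterations(C, eta, r, c, eps=1e-8, max_iter=10_000):
    """Count SK iterations until the L1 column-marginal error is <= eps.

    Returns max_iter if the accuracy is not reached within the budget.
    """
    n, m = len(C), len(C[0])
    K = [[math.exp(-eta * C[i][j]) for j in range(m)] for i in range(n)]
    y = [1.0] * m
    for it in range(1, max_iter + 1):
        # Row step, then measure how far the column sums are from c.
        x = [r[i] / sum(K[i][j] * y[j] for j in range(m)) for i in range(n)]
        col = [y[j] * sum(x[i] * K[i][j] for i in range(n)) for j in range(m)]
        if sum(abs(col[j] - c[j]) for j in range(m)) <= eps:
            return it
        # Column step.
        y = [y[j] * c[j] / col[j] for j in range(m)]
    return max_iter

n = 5
eta = 1.0
uniform = [1.0 / n] * n
# Well-behaved base costs in [0, 1).
base = [[abs(i - j) / n for j in range(n)] for i in range(n)]
# Same costs with one hypothetical outlier entry.
spiked = [row[:] for row in base]
spiked[0][0] = 50.0

norm_base = eta * max(max(row) for row in base)
norm_spiked = eta * max(max(row) for row in spiked)
iters_base = sk_iterations(base, eta, uniform, uniform)
iters_spiked = sk_iterations(spiked, eta, uniform, uniform)
print(norm_base, iters_base)
print(norm_spiked, iters_spiked)
```

Comparing the two printed iteration counts against the two values of $\eta\|C\|_\infty$ shows how strongly, or how weakly, the outlier affects convergence on this particular instance.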

In matrix scaling theory, while strongly polynomial-time algorithms, interior-point, and fast first-order methods exist, SK’s parallel, structure-exploiting updates and favorable dependence on “real” instance statistics explain its empirical preference.

Implications and Future Directions

Theoretical Impact

This work conclusively explains the practical efficiency of SK for EOT and general scaling, reconciling theory with observed phenomena. It provides tight, non-asymptotic rates grounded in structural rather than worst-case parameters. The phase transition analysis and the identification of tractable marginals even at the “critical” threshold enrich the understanding of both algorithmic limits and the geometry of entropically regularized transport.

Practical and Algorithmic Ramifications

Pre-scaling should be universally adopted in EOT applications for high-dimensional problems. Practitioners can disregard global outliers, focusing on bulk-normalization to certify near-optimal SK performance. The convergence theory gives firm guidance for parameter selection and expected iteration counts even for cost matrices riddled with prohibitive or corrupted entries.

Open Directions

Extending bulk-mass convergence results and discretization tools to non-EOT scaling objective classes, such as regularized flow, min-cost matching, and spectral normalization.
Exploiting structural stability theory and discretization techniques to enhance randomized or deterministic matrix algorithms in data science and combinatorial optimization.
Investigating the effect of mini-batch, stochastic, or asynchronous variants of SK in the entropic regime, especially in distributed computing or GPU architectures.

Conclusion

This work establishes that the empirical rapid convergence of Sinkhorn-Knopp for entropic OT arises not from benign data, but from deep structural resilience to outliers and bottlenecks—robustness now quantified and certified. By isolating minimal, natural bulk conditions, and demonstrating optimal pre-scaling, the study closes the gap between worst-case theory and practice, while drawing new boundaries for efficient algorithm design in high-dimensional discrete transport.