- The paper introduces well-boundedness, showing that SK's iteration count depends on bulk mass properties rather than extreme cost values.
- It demonstrates that a simple pre-scaling step yields dimension-free complexity, providing optimal logarithmic convergence bounds.
- A sharp phase transition is identified, clarifying the regimes, determined by the density of nontrivial entries, in which convergence becomes independent of the cost spread.
Analysis of Sinkhorn-Knopp Efficiency for Entropic Optimal Transport
Introduction and Problem Statement
The Sinkhorn-Knopp (SK) algorithm is central to matrix scaling and entropically regularized optimal transport (EOT). In EOT, one seeks a transportation plan that minimizes a transport cost regularized by entropy, a problem solved efficiently by iterative matrix scaling of the Gibbs kernel. However, established finite-accuracy (ε-approximate) complexity bounds for SK deteriorate severely as the cost matrix develops outlier entries or as the entropic regularization parameter η is increased. This behavior is not reflected in practice, where SK typically converges rapidly even on highly unbalanced or outlier-prone cost structures. Existing theory relies either on contraction in projective metrics, whose rate degrades exponentially in η∥C∥∞, or on KL-type potential arguments, whose bounds grow linearly in η∥C∥∞. This disconnect is a central theoretical deficiency.
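For orientation, the iteration discussed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `sinkhorn_knopp`, the l1 stopping rule, and the random test instance are assumptions made for the example.

```python
import numpy as np

def sinkhorn_knopp(C, r, c, eta, eps=1e-6, max_iter=10_000):
    """Scale the Gibbs kernel K = exp(-eta * C) to marginals (r, c).

    Returns the plan P and the number of iterations used. Stops once the
    l1 marginal error of P drops below eps (an assumed stopping rule).
    """
    K = np.exp(-eta * C)              # Gibbs kernel of the regularized cost
    u = np.ones_like(r)
    v = np.ones_like(c)
    for it in range(1, max_iter + 1):
        u = r / (K @ v)               # rescale rows to match r
        v = c / (K.T @ u)             # rescale columns to match c
        P = u[:, None] * K * v[None, :]
        err = np.abs(P.sum(axis=1) - r).sum() + np.abs(P.sum(axis=0) - c).sum()
        if err < eps:
            return P, it
    return P, max_iter

# Illustrative instance: uniform marginals, random costs in [0, 1).
rng = np.random.default_rng(0)
n = 50
C = rng.random((n, n))
r = np.full(n, 1.0 / n)
c = np.full(n, 1.0 / n)
P, iters = sinkhorn_knopp(C, r, c, eta=5.0)
```

Here `iters` is the quantity that the complexity bounds in this summary are about: the number of alternating row and column rescalings needed to reach marginal accuracy ε.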
Main Contributions
The paper resolves this discrepancy by introducing the (ρ,κ)-well-boundedness condition, a local “bulk” mass property of the cost matrix C (specifically, of the regularized cost ηC). It is shown that SK iteration complexity depends only on this robust property, rather than on fragile global quantities like ∥C∥∞ or the minimal-to-maximal entry ratio ν, and that a simple pre-scaling step makes the bound strictly dimension-independent.
The key results are:
- EOT iteration complexity is O(log n − log ε) under well-boundedness, independent of η∥C∥∞.
- A cost-free pre-scaling yields dimension-free O(log(1/ε)) bounds, optimal up to constant factors.
- A sharp phase transition in the dependence on structural parameters (the entry ratio ν and the density of nontrivial entries) precisely characterizes the regimes of dimension and norm independence for general matrix scaling with arbitrary target marginals, with complementary lower bounds.
Technical Foundations
Well-Boundedness and Bulk Mass
The paper defines (ρ,κ)-well-boundedness: for positive, normalized marginals, the cost matrix is well-bounded if the minimal weighted sum, over any row, of its moderate-cost entries, plus the analogous sum over any column, strictly exceeds a fixed threshold by a positive margin, with the cutoff and the margin set by the parameters (ρ,κ). This decouples SK’s convergence from maximal outlier costs, as only the bulk of the kernel’s mass must be carried by subexponential entries.
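The following sketch illustrates why a bulk-mass quantity of this kind is insensitive to outliers. It is one plausible reading, not the paper's definition: the function `bulk_margin`, the cutoff rule `eta * C <= kappa`, the weighting of a row by the column marginal, and the threshold 1 are all assumptions of the example.

```python
import numpy as np

def bulk_margin(C, r, c, eta, kappa):
    """Hypothetical bulk-mass diagnostic (an assumed reading, see lead-in).

    An entry is treated as "bulk" when eta * C[i, j] <= kappa. Each row
    collects the column-marginal mass on its bulk entries, and each column
    collects the row-marginal mass on its bulk entries. The return value
    is the worst row mass plus the worst column mass, minus the assumed
    threshold 1.
    """
    bulk = ((eta * C) <= kappa).astype(float)
    row_mass = bulk @ c              # one value per row
    col_mass = bulk.T @ r            # one value per column
    return row_mass.min() + col_mass.min() - 1.0

n = 4
r = np.full(n, 1.0 / n)
c = np.full(n, 1.0 / n)

C_clean = np.full((n, n), 0.5)
C_outlier = C_clean.copy()
C_outlier[0, 0] = 1e6                # a single huge cost entry

margin_clean = bulk_margin(C_clean, r, c, eta=1.0, kappa=1.0)
margin_outlier = bulk_margin(C_outlier, r, c, eta=1.0, kappa=1.0)
```

In this toy instance the outlier raises ∥C∥∞ from 0.5 to 10^6, yet the margin only drops from 1.0 to 0.5, because the outlier removes a single entry from one row and one column of the bulk.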
Dimension-Dependent and Pre-Scaled Complexity
The principal theorem establishes that, under well-boundedness, to achieve marginal projection accuracy ε, SK requires only
O(log n − log ε)
iterations, with constants independent of the dimension and of the cost magnitudes.
A simple initial diagonal rescaling of the input removes the dimensional dependence, yielding iteration complexity
O(log(1/ε))
for fixed well-boundedness parameters (ρ,κ).
Phase Transition: Density and Entry Ratio Regimes
The analysis identifies a critical threshold for matrix “density”. If the fractions of nontrivial entries per row and per column admit constant lower bounds that place the matrix above this threshold, then the iteration complexity is independent of ν. Below the threshold, the dependence on ν is unavoidable: explicit counterexamples with exponentially small ν exhibit the resulting slowdown of SK.
The study further shows that, exactly at the threshold, SK can still avoid dependence on ν for certain marginal choices, unlike in the uniform case.
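The two structural parameters in this discussion can be computed directly for a given nonnegative matrix. The sketch below is illustrative only: the function name, the names `alpha` and `beta`, and the rule that an entry is “nontrivial” when it is at least `tau` times the largest entry are assumptions, not the paper's definitions.

```python
import numpy as np

def entry_ratio_and_density(A, tau=0.5):
    """Return (nu, row_density, col_density) for a nonnegative matrix A.

    nu is the minimal-to-maximal entry ratio. An entry counts as
    "nontrivial" when it is at least tau * A.max(); this cutoff is an
    assumption of the sketch.
    """
    A = np.asarray(A, dtype=float)
    nu = A.min() / A.max()
    nontrivial = A >= tau * A.max()
    row_density = nontrivial.mean(axis=1).min()   # sparsest row
    col_density = nontrivial.mean(axis=0).min()   # sparsest column
    return nu, row_density, col_density

A = np.array([[1.0,  1.0, 1e-9],
              [1.0,  1.0, 1.0],
              [1e-9, 1.0, 1.0]])
nu, alpha, beta = entry_ratio_and_density(A)
```

For this matrix ν is tiny (10^-9) while two thirds of the entries in the sparsest row and column are nontrivial, which is the kind of instance where the density, rather than ν, is the informative statistic.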
Combinatorial Analysis and Reduction Techniques
Two central technical components enable these results:
- Reduction Framework: Scaling problems with arbitrary target marginals are discretized to uniform scaling on square matrices, “revealing” combinatorial invariants (e.g., permanent lower bounds) inaccessible on rectangular or weighted-marginal instances.
- Structural Stability: The paper proves strengthened stability for matrix entries and permanents along the scaling orbit, robust as the plan approaches double stochasticity and essential for sharp iteration bounds.
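The idea behind reducing weighted, rectangular scaling to uniform, square scaling can be illustrated with a standard replication construction. This is a sketch under stated assumptions and not necessarily the paper's reduction: it assumes rational marginals r = a/N and c = b/N with integer weights a and b, and the helper names `sinkhorn` and `expand_to_uniform` are hypothetical.

```python
import numpy as np

def sinkhorn(A, r, c, iters=2000):
    """Plain SK: scale positive A to row sums r and column sums c."""
    u, v = np.ones(len(r)), np.ones(len(c))
    for _ in range(iters):
        u = r / (A @ v)
        v = c / (A.T @ u)
    return u[:, None] * A * v[None, :]

def expand_to_uniform(A, a, b):
    """Replicate row i a[i] times and column j b[j] times.

    With integer weights a and b that both sum to N, the result is an
    N x N matrix whose uniform scaling mirrors the scaling of A to the
    marginals a / N and b / N.
    """
    return np.repeat(np.repeat(A, a, axis=0), b, axis=1)

rng = np.random.default_rng(1)
A = rng.random((2, 3)) + 0.1
a = np.array([1, 5])              # row weights, N = 6
b = np.array([2, 3, 1])           # column weights, N = 6
N = int(a.sum())

P = sinkhorn(A, a / N, b / N)                             # weighted, rectangular
B = expand_to_uniform(A, a, b)                            # 6 x 6
Q = sinkhorn(B, np.full(N, 1.0 / N), np.full(N, 1.0 / N))  # uniform, square

# Summing Q over each replicated block recovers the weighted plan P.
row_id = np.repeat(np.arange(len(a)), a)
col_id = np.repeat(np.arange(len(b)), b)
P_from_Q = np.zeros_like(P)
np.add.at(P_from_Q, (row_id[:, None], col_id[None, :]), Q)
```

The block sums of the square plan `Q` agree with the rectangular plan `P`, so statements proved for uniform scaling of the expanded matrix transfer back to the original weighted instance.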
Comparison to Prior Work
The provided bounds completely decouple SK’s iteration count from η∥C∥∞, subsuming prior results that either require a global cap on the cost to obtain logarithmic rates or become vacuous as this quantity diverges. Polynomial or exponential dependence of the iteration count on either the cost spread or the accuracy parameter ε is shown to be non-intrinsic under bulk mass regularity, clarifying why SK efficiently scales the nonnegative kernels arising in “practical” OT problems.
In matrix scaling theory, while strongly polynomial-time algorithms, interior-point, and fast first-order methods exist, SK’s parallel, structure-exploiting updates and favorable dependence on “real” instance statistics explain its empirical preference.
Implications and Future Directions
Theoretical Impact
This work conclusively explains the practical efficiency of SK for EOT and general scaling, reconciling theory with observed phenomena. It provides tight, non-asymptotic rates grounded in structural rather than worst-case parameters. The phase transition analysis and the identification of tractable marginals even at the “critical” threshold enrich the understanding of both algorithmic limits and the geometry of entropically regularized transport.
Practical and Algorithmic Ramifications
Because the pre-scaling step is essentially cost-free, it is a sensible default for high-dimensional EOT applications. Under well-boundedness, practitioners can disregard global outliers and focus on bulk normalization to certify near-optimal SK performance. The convergence theory gives firm guidance for parameter selection and expected iteration counts even for cost matrices riddled with prohibitive or corrupted entries.
Open Directions
- Extending bulk-mass convergence results and discretization tools to non-EOT scaling objective classes, such as regularized flow, min-cost matching, and spectral normalization.
- Exploiting structural stability theory and discretization techniques to enhance randomized or deterministic matrix algorithms in data science and combinatorial optimization.
- Investigating the effect of mini-batch, stochastic, or asynchronous variants of SK in the entropic regime, especially in distributed computing or GPU architectures.
Conclusion
This work establishes that the empirically rapid convergence of Sinkhorn-Knopp for entropic OT arises not from benign data but from a deep structural resilience to outliers and bottlenecks, a robustness that is now quantified and certified. By isolating minimal, natural bulk conditions and demonstrating optimal pre-scaling, the study closes the gap between worst-case theory and practice, while drawing new boundaries for efficient algorithm design in high-dimensional discrete transport.