Sinkhorn Algorithm and Matrix Scaling

Updated 30 July 2025
  • The Sinkhorn algorithm is an iterative procedure for scaling nonnegative matrices to near-doubly stochastic form by alternately normalizing rows and columns.
  • It underpins applications in optimal transport, image processing, and graph matching, where it benefits from rapid empirical convergence on dense matrices.
  • The analysis reveals a sharp phase transition driven by the density parameter, with exponential convergence in high-density regimes and slower rates for sparse inputs.

The Sinkhorn algorithm is an iterative scaling procedure for transforming a nonnegative square matrix into a (nearly) doubly stochastic matrix (one where all row and column sums are 1) by alternately normalizing rows and columns. Originally introduced for the matrix scaling problem, it underpins modern computational optimal transport because it solves entropically regularized mass transportation problems efficiently and robustly. The recent theoretical analysis in "Phase transition of the Sinkhorn-Knopp algorithm" (He, 13 Jul 2025) provides tight iteration complexity bounds and uncovers a sharp phase transition in iteration behavior governed by a structural density parameter, which explains the robust practical performance observed in applied contexts.

1. Matrix Scaling and the Sinkhorn–Knopp Iterates

The Sinkhorn–Knopp algorithm operates on a nonnegative matrix $A \in \mathbb{R}^{n \times n}$ by alternately rescaling rows and columns to achieve prescribed sums (typically unity, for double stochasticity). At each iteration, row scaling sets all row sums to 1, followed by column scaling to set all column sums to 1. Formally, one seeks diagonal matrices $X$, $Y$ such that $B = XAY$ is nearly doubly stochastic. The discrete process continues until the maximum residual deviation from the doubly stochastic property falls below a preset threshold $\varepsilon$.
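
A minimal NumPy sketch of this iteration follows; the function name, interface, and stopping rule are illustrative, not taken from the paper, and the input is assumed to have no zero rows or columns.

```python
import numpy as np

def sinkhorn_knopp(A, eps=1e-8, max_iter=10_000):
    """Alternately normalize rows and columns of a nonnegative square
    matrix A until B = diag(x) @ A @ diag(y) is eps-close to doubly
    stochastic. Illustrative sketch, not the paper's reference code."""
    n = A.shape[0]
    y = np.ones(n)
    for it in range(max_iter):
        x = 1.0 / (A @ y)        # row step: rows of diag(x) A diag(y) sum to 1
        y = 1.0 / (A.T @ x)      # column step: columns sum to 1
        B = x[:, None] * A * y[None, :]
        # Maximum residual deviation from double stochasticity.
        res = max(np.abs(B.sum(axis=1) - 1).max(),
                  np.abs(B.sum(axis=0) - 1).max())
        if res < eps:
            return B, x, y, it + 1
    return B, x, y, max_iter
```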

This algorithm is foundational not only in classical matrix scaling (He, 13 Jul 2025), but also as the core iteration for solving the entropically regularized optimal transport problems prevalent in computational imaging, deep learning, and combinatorial optimization.
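
To make the optimal transport connection concrete, here is a hedged sketch of the standard Gibbs-kernel formulation: the entropically regularized plan has the form $P = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K = \exp(-C/\eta)$, so Sinkhorn scaling of $K$ to the prescribed marginals solves the regularized problem. Names and parameter values below are illustrative.

```python
import numpy as np

def entropic_ot(C, r, c, eta=0.05, n_iter=500):
    """Sketch of Sinkhorn for entropic optimal transport: scale the
    Gibbs kernel K = exp(-C/eta) so its row/column sums match the
    marginals r and c. Returns the plan P = diag(u) K diag(v)."""
    K = np.exp(-C / eta)
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)      # match row marginals
        v = c / (K.T @ u)    # match column marginals
    return u[:, None] * K * v[None, :]

# Example: transport between two uniform distributions on 50 points.
pts = np.linspace(0, 1, 50)
C = (pts[:, None] - pts[None, :]) ** 2           # squared-distance cost
P = entropic_ot(C, np.full(50, 1 / 50), np.full(50, 1 / 50))
print(P.sum(axis=0)[:3], P.sum(axis=1)[:3])      # both close to 1/50
```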

2. Practical Performance and Density-driven Regimes

Despite the pseudopolynomial theoretical iteration bound, the Sinkhorn–Knopp algorithm exhibits extremely fast empirical convergence in applications such as dense graph matching, statistical data normalization, image transport, and preconditioning. The key insight, rigorously established, is that these problem instances typically yield dense matrices, meaning that each row and column of the normalized input has a significant fraction $\gamma$ of its entries exceeding a uniform positive threshold $\rho$. This density structure ensures favorable convergence properties absent in worst-case sparse inputs.

Such empirical success arises because, under high density, scaling vectors remain well-conditioned and stabilization of row and column sums proceeds geometrically.
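
A quick way to observe this geometric stabilization is the illustrative experiment below, where every entry is bounded away from zero so the density condition holds trivially.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.uniform(0.5, 1.0, size=(n, n))      # dense: every entry >= 0.5
y = np.ones(n)
for it in range(8):
    x = 1.0 / (A @ y)
    y = 1.0 / (A.T @ x)
    B = x[:, None] * A * y[None, :]
    res = np.abs(B.sum(axis=1) - 1).max()   # columns are exact after the y-step
    print(f"round {it + 1}: max row-sum residual = {res:.2e}")
# The residual typically drops by orders of magnitude per round.
```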

3. Sharp Iteration Complexity and the Density Phase Transition

The iteration bound for Sinkhorn–Knopp is precisely characterized in terms of the matrix's density parameter $\gamma$:

  • High-density regime ($\gamma > 1/2$): For nonnegative $n \times n$ matrices whose normalized version has density $\gamma > 1/2$ (i.e., each row/column has at least $\lceil \gamma n \rceil$ entries $\geq \rho$), the number of iterations required to attain $\varepsilon$-approximate double stochasticity is $O(\log n - \log \varepsilon)$. Each iteration incurs $O(n^2)$ arithmetic operations, yielding overall complexity $\widetilde{O}(n^2)$, which is optimal in this regime.
  • Low-density regime ($\gamma < 1/2$): There exist matrices, constructed explicitly in the analysis, for which convergence is slow: the number of iterations is $\Omega(n/\varepsilon)$ in the $\ell_1$ norm or $\widetilde{\Omega}(n^{1/2}/\varepsilon)$ in the $\ell_2$ norm.

This establishes a phase transition at $\gamma = 1/2$; above this density, the algorithm admits exponentially fast convergence in iteration count (logarithmic in both $n$ and $1/\varepsilon$), while below it the convergence rate deteriorates dramatically.
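
The contrast can be reproduced with a small experiment. The sparse instance below (an upper-triangular all-ones matrix, a classical slowly converging case) is not the paper's extremal construction, but it sits in the low-density regime and shows the degradation; the helper name is illustrative.

```python
import numpy as np

def rounds_until(A, eps=1e-4, cap=50_000):
    # Count alternate-normalization rounds until the max-norm residual
    # over row/column sums drops below eps (or the cap is hit).
    y = np.ones(A.shape[0])
    for it in range(cap):
        x = 1.0 / (A @ y)
        y = 1.0 / (A.T @ x)
        B = x[:, None] * A * y[None, :]
        if max(np.abs(B.sum(1) - 1).max(), np.abs(B.sum(0) - 1).max()) < eps:
            return it + 1
    return cap  # did not converge within the cap

rng = np.random.default_rng(0)
n = 100
dense = rng.uniform(0.5, 1.0, size=(n, n))   # gamma = 1 for rho = 0.5
tri = np.triu(np.ones((n, n)))               # last row/column have a single entry
print("dense :", rounds_until(dense))        # a handful of rounds
print("sparse:", rounds_until(tri))          # thousands of rounds, or hits the cap
```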

| Density $\gamma$ | Iterations (Upper Bound) | Iterations (Lower Bound) |
|---|---|---|
| $\gamma > 1/2$ | $O(\log n - \log \varepsilon)$ |  |
| $\gamma < 1/2$ |  | $\Omega(n/\varepsilon)$ ($\ell_1$), $\widetilde{\Omega}(n^{1/2}/\varepsilon)$ ($\ell_2$) |

The derivation employs combinatorial analysis tracking the propagation of non-negligible entries throughout the scaling process and leverages inequalities for the evolution of scaling factors.
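
As a generic illustration of how the inequality $1 + x \leq \exp(x)$ powers such progress estimates, the following is the classical permanent-based step that is standard in matrix scaling analyses; it is not necessarily the paper's exact route.

```latex
% After a row-normalization step every row of B sums to 1, so the column
% sums c_1, \dots, c_n satisfy \sum_j c_j = n. The next column step divides
% column j by c_j, multiplying \mathrm{per}(B) by \prod_j c_j^{-1}. Since
% 1 + x \le e^x,
\[
  \prod_{j=1}^{n} c_j \;\le\; \prod_{j=1}^{n} e^{\,c_j - 1}
  \;=\; \exp\!\Big(\sum_{j=1}^{n} c_j - n\Big) \;=\; 1,
\]
% with equality iff every c_j = 1. Hence \mathrm{per}(B) never decreases and
% strictly increases while B is not doubly stochastic, while the bound
% \mathrm{per}(B) \le \prod_i (\text{row sum}_i) = 1 caps the total progress.
```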

4. Theoretical Tightness and Extremal Constructions

The provided bounds are shown to be tight up to logarithmic factors. For $\gamma > 1/2$, convergence in $O(\log n - \log \varepsilon)$ iterations is optimal: no algorithm can produce a nearly doubly stochastic scaling for all such matrices faster, since every entry must be read at least once, so the $\widetilde{O}(n^2)$ total operation count cannot be improved beyond logarithmic factors. For $\gamma < 1/2$, explicit matrix constructions demonstrate that the lower bound is attained, solidifying the sharp phase transition at $\gamma = 1/2$ as a theoretical barrier to geometric convergence.

5. Significance for Applications and Algorithm Engineering

The density-based classification directly explains the frequently observed rapid convergence, logarithmic in both the input size and the inverse accuracy $1/\varepsilon$, in applications involving the Sinkhorn–Knopp algorithm. Dense cost structures are ubiquitous in entropic optimal transport, image and graph problems, and permanent approximation algorithms. For these, the scaling method yields robust, scalable results matching the theoretical optimum.

Conversely, in settings where the matrix density falls below the threshold (e.g., highly sparse matrices), the performance is provably degraded, guiding practitioners to consider alternative preprocessing or scaling methods.

6. Mathematical Formulation of the Density Parameter and Scaling Evolution

The density parameter $\gamma$ is formally defined as follows: a normalized $n \times n$ matrix has density $\gamma$ if there exists $\rho > 0$ such that at least $\lceil \gamma n \rceil$ entries in each row and column are at least $\rho$ (He, 13 Jul 2025). The analysis traces the evolution of the scaling through inequalities such as $1 + x \leq \exp(x)$, and via tracking the minimal and maximal values of the scaling vectors, ultimately leading to the precise iteration complexity estimates.
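
A direct transcription of this definition as a check (illustrative helper; the input is assumed to have been normalized already, as in the definition above):

```python
import numpy as np

def density(A, rho):
    """Largest gamma such that every row and every column of A has at
    least ceil(gamma * n) entries >= rho, per the definition above."""
    n = A.shape[0]
    big = A >= rho
    return min(big.sum(axis=1).min(), big.sum(axis=0).min()) / n

A = np.random.default_rng(1).uniform(size=(100, 100))
# A bit below 0.75 (the per-entry probability that an entry is >= 0.25),
# because of the min over all rows and columns.
print(density(A, rho=0.25))
```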

7. Implications and Extensions

The identification of the phase transition at $\gamma = 1/2$ not only affirms the algorithm's suitability for dense problems but also provides a foundation for designing new algorithms tailored to either the dense or the sparse regime. Furthermore, these findings have direct consequences for approximation algorithms for the permanent of dense $0$–$1$ matrices and broadly inform matrix scaling subroutines in linear algebra and combinatorial optimization workflows.
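
As a hedged sketch of the permanent connection (the classical scaling-based estimate in the spirit of Linial–Samorodnitsky–Wigderson, not an algorithm from the paper): once $B = XAY$ is nearly doubly stochastic, the van der Waerden bound gives $e^{-n} \leq \mathrm{per}(B) \leq 1$, pinning down $\mathrm{per}(A)$ within a factor of roughly $e^n$. The function name below is illustrative.

```python
import numpy as np

def log_permanent_bounds(A, iters=5000):
    """Return (lo, hi) with lo <= log per(A) <= hi, up to the residual
    scaling error. Uses per(XAY) = per(A) * prod(x) * prod(y) together
    with e^{-n} <= per(B) <= 1 for (nearly) doubly stochastic B."""
    n = A.shape[0]
    y = np.ones(n)
    for _ in range(iters):
        x = 1.0 / (A @ y)
        y = 1.0 / (A.T @ x)
    log_scale = np.log(x).sum() + np.log(y).sum()
    return -n - log_scale, -log_scale
```

By the high-density bound above, for a dense $0$–$1$ matrix the scaling phase costs only $\widetilde{O}(n^2)$ operations.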

In summary, the Sinkhorn–Knopp algorithm exhibits a phase transition in convergence rate, governed by the matrix density parameter $\gamma$, and is both theoretically and practically optimal in the high-density regime prevalent in real-world applications (He, 13 Jul 2025).

References

1. He. "Phase transition of the Sinkhorn-Knopp algorithm." 13 Jul 2025.