Papers
Topics
Authors
Recent
Search
2000 character limit reached

Sinkhorn–Knopp Matrix Scaling Algorithm

Updated 21 April 2026
  • The Sinkhorn–Knopp algorithm is an iterative method that scales nonnegative matrices to prescribed row and column sums by alternately normalizing rows and columns.
  • It guarantees convergence under proper support conditions, achieving a unique scaling solution through contraction in Hilbert’s projective metric.
  • The algorithm underpins efficient solvers in optimal transport, combinatorial optimization, and numerical linear algebra, with proven iteration bounds.

The Sinkhorn–Knopp matrix scaling algorithm, also known as the RAS algorithm, is a fundamental iterative procedure for scaling a nonnegative matrix so that it achieves prescribed row and column sums. Its wide impact spans combinatorial optimization, numerical linear algebra, statistics, and optimal transport. The method alternates normalizing the rows and columns of a matrix, producing a sequence converging—under suitable conditions—to a matrix which satisfies the specified marginals exactly. Beyond its original use case for doubly stochastic scaling, the Sinkhorn–Knopp algorithm underlies fast solvers for entropic regularized optimal transport and is deeply connected to geometric programming, Bregman projections, quantum algorithms, and structural properties in combinatorial matrix theory.

1. Problem Statement and Classical Algorithm

Given a nonnegative matrix AR0n×mA \in \mathbb{R}^{n \times m}_{\ge 0}, and target row and column sum vectors rR+nr \in \mathbb{R}^n_{+}, cR+mc \in \mathbb{R}^m_{+} such that iri=jcj=h\sum_i r_i = \sum_j c_j = h, the matrix scaling problem is to find diagonal matrices RR+n×nR \in \mathbb{R}^{n \times n}_+ and SR+m×mS \in \mathbb{R}^{m \times m}_+ so that X:=RASX := R A S has row sums rr and column sums cc: jXij=rii,iXij=cjj\sum_j X_{ij} = r_i \quad \forall i, \qquad \sum_i X_{ij} = c_j \quad \forall j A particularly important special case is doubly stochastic scaling: rR+nr \in \mathbb{R}^n_{+}0, rR+nr \in \mathbb{R}^n_{+}1 for all rR+nr \in \mathbb{R}^n_{+}2.

The canonical Sinkhorn–Knopp algorithm alternates between row-normalization and column-normalization:

  • Initialize rR+nr \in \mathbb{R}^n_{+}3, rR+nr \in \mathbb{R}^n_{+}4
  • For rR+nr \in \mathbb{R}^n_{+}5:

rR+nr \in \mathbb{R}^n_{+}6

Or, equivalently, alternately scale the matrix by the relevant diagonal matrices so that in each step either the rows or the columns match the target sums (Chakrabarty et al., 2018, Idel, 2016, Cuturi, 2013, Nathanson, 2019).

The iteration is stopped when both the maximum deviation in row sums and column sums falls below a prescribed tolerance rR+nr \in \mathbb{R}^n_{+}7 (Nathanson, 2019).

2. Theoretical Guarantees: Existence, Uniqueness, and Convergence

Existence & Uniqueness:

The algorithm produces positive diagonal matrices rR+nr \in \mathbb{R}^n_{+}8, rR+nr \in \mathbb{R}^n_{+}9 such that cR+mc \in \mathbb{R}^m_{+}0 has prescribed marginals if and only if cR+mc \in \mathbb{R}^m_{+}1 has a "doubly stochastic pattern"—there exists a matrix with the same support as cR+mc \in \mathbb{R}^m_{+}2 that is already scalable to the desired marginals (Idel, 2016). For strictly positive cR+mc \in \mathbb{R}^m_{+}3, the diagonal scalings are unique up to a common scalar factor.

Convergence:

  • Each scaling step is a contraction in Hilbert’s projective metric.
  • Alternation of two such contractions yields linear (geometric) convergence to the unique fixed point in the positive orthant (Nathanson, 2019, Idel, 2016).
  • For matrices with total support (no row/column of all zeros) and connected support graph, convergence to the unique solution (or Sinkhorn limit cR+mc \in \mathbb{R}^m_{+}4) is always achieved.

Finiteness Results:

In the generic case, the algorithm converges asymptotically. However, for certain block-structured matrices which are already row- or column-stochastic, the process may terminate in one or two iterations (Nathanson, 2019, Cohen et al., 2019). Cohen–Nathanson proved that no instance terminates exactly in more than two steps (Cohen et al., 2019).

3. Quantitative Error Analysis and Iteration Bounds

The convergence rate of the Sinkhorn–Knopp algorithm is governed by the structure of cR+mc \in \mathbb{R}^m_{+}5 and the target tolerance cR+mc \in \mathbb{R}^m_{+}6. Key results (Chakrabarty et al., 2018):

  • KL-divergence Tracking:

The decrease in the KL-divergence between the current row sum distribution and the target is closely tracked at each iteration. Pinsker’s inequality and a new KL vs cR+mc \in \mathbb{R}^m_{+}7 bound connect KL divergence to cR+mc \in \mathbb{R}^m_{+}8 and cR+mc \in \mathbb{R}^m_{+}9 errors.

  • Iteration Bounds:

To achieve iri=jcj=h\sum_i r_i = \sum_j c_j = h0 error iri=jcj=h\sum_i r_i = \sum_j c_j = h1,

iri=jcj=h\sum_i r_i = \sum_j c_j = h2

steps suffice, where iri=jcj=h\sum_i r_i = \sum_j c_j = h3 (min nonzero entry)/(max entry), iri=jcj=h\sum_i r_i = \sum_j c_j = h4, and iri=jcj=h\sum_i r_i = \sum_j c_j = h5.

For iri=jcj=h\sum_i r_i = \sum_j c_j = h6 error iri=jcj=h\sum_i r_i = \sum_j c_j = h7,

iri=jcj=h\sum_i r_i = \sum_j c_j = h8

suffices.

  • Comparisons to Prior Bounds:

Earlier analyses had iri=jcj=h\sum_i r_i = \sum_j c_j = h9 dependence for both errors; improved results replace this with RR+n×nR \in \mathbb{R}^{n \times n}_+0 for RR+n×nR \in \mathbb{R}^{n \times n}_+1 and log-scale dependence on the matrix parameters due to the strengthened KL-related inequalities.

  • Exponential and Dimension-Free Convergence:

For matrices whose normalized version satisfies an explicit density condition (bulk mass), iteration complexity is RR+n×nR \in \mathbb{R}^{n \times n}_+2, independent of RR+n×nR \in \mathbb{R}^{n \times n}_+3 and the maximum regularized cost (He, 4 Apr 2026, He, 13 Jul 2025). Pre-scaling by marginals enables a dimension-free RR+n×nR \in \mathbb{R}^{n \times n}_+4 iteration complexity (He, 4 Apr 2026).

  • Phase Transition in Complexity:

There exists a sharp phase transition at matrix density RR+n×nR \in \mathbb{R}^{n \times n}_+5: above this, convergence is logarithmic in RR+n×nR \in \mathbb{R}^{n \times n}_+6; below, dependence becomes polynomial in RR+n×nR \in \mathbb{R}^{n \times n}_+7 and RR+n×nR \in \mathbb{R}^{n \times n}_+8 (He, 13 Jul 2025, He, 4 Apr 2026).

4. Interpretations and Connections to Optimization

Variational Views:

  • Bregman Projection / KL Minimization:

Alternating row and column scaling is an instance of Bregman projection (specifically, KL-projection) onto affine subspaces of fixed row and column sums. The algorithm minimizes the KL-divergence RR+n×nR \in \mathbb{R}^{n \times n}_+9 under these constraints (Corless et al., 2024, Chakrabarty et al., 2018, Cuturi, 2013).

  • Geometric Programming (Capacity Maximization):

The process alternates minimization of a geometric log-capacity objective, equivalent to maximizing the permanent over certain normalized classes under fixed marginals (Hayashi et al., 2022). The decrease in capacity is directly linked to progress toward feasibility via an exact formula involving KL divergence.

In entropic OT, diagonal scaling finds the potentials (SR+m×mS \in \mathbb{R}^{m \times m}_+0) in SR+m×mS \in \mathbb{R}^{m \times m}_+1 solving SR+m×mS \in \mathbb{R}^{m \times m}_+2, with SR+m×mS \in \mathbb{R}^{m \times m}_+3.

5. Algorithmic Enhancements, Quantum Algorithms, and Practical Extensions

Algorithmic Accelerations:

  • Regularization scheduling (doubling the entropic penalty periodically) and scaling reductions (cost, capacity) yield exponential convergence and optimal SR+m×mS \in \mathbb{R}^{m \times m}_+4 complexity for entropic OT with integer marginals (Chen et al., 2022).
  • Hierarchical low-rank and Kronecker-structured representations allow SR+m×mS \in \mathbb{R}^{m \times m}_+5 complexity for large-scale regularized OT when the cost matrix decomposes favorably (Motamed, 2020).

Quantum Analogues:

  • Quantum implementations of Sinkhorn (using amplitude estimation) achieve SR+m×mS \in \mathbb{R}^{m \times m}_+6 scaling, a polynomial speedup in SR+m×mS \in \mathbb{R}^{m \times m}_+7 at a cost in SR+m×mS \in \mathbb{R}^{m \times m}_+8-dependence, and are provably optimal for constant-error scaling (Apeldoorn et al., 2020).

Extensions:

  • The SK scheme generalizes to operator scaling (positive linear maps), where iteration on positive semidefinite matrices yields trace-preserving, unital normal forms with connections to quantum information theory (Idel, 2016).
  • Constrained scaling with prescribed zeros ("forbidden entries") is handled by zeroing out respective entries in SR+m×mS \in \mathbb{R}^{m \times m}_+9 and restricting scaling support, with convergence proven under mild support assumptions (Corless et al., 2024).

6. Combinatorial and Structural Implications

  • Perfect Matching and Blockers:

For the case X:=RASX := R A S0 is a X:=RASX := R A S1-X:=RASX := R A S2 matrix, the existence of a doubly stochastic scaling is equivalent to the support graph admitting a perfect matching (via Hall's condition) (Hayashi et al., 2022). When not scalable, the limiting matrix encodes all Hall blockers (deficiency subsets), accessible from the plateau structure of the limiting marginals.

  • Block Decompositions:

The Sinkhorn–Knopp limit in the non-scalable case is block upper-triangular, precisely reflecting the principal partition, polymatroid base-polytope structure, and Dulmage–Mendelsohn decomposition in combinatorial optimization.

  • Explicit Formulae and Arithmetic:

For symmetric X:=RASX := R A S3 matrices with two distinct entries, closed-form doubly stochastic scalings ("Sinkhorn limits") can be calculated, sometimes involving roots of low-degree equations. Finite-term scaling is structurally characterized and rare (Nathanson, 2019, Nathanson, 2019).

7. Applications and Practical Considerations

Applications:

  • Preconditioning Linear Systems:

Matrix balancing as a preprocessing step increases numerical stability in large-scale linear algebra (Apeldoorn et al., 2020).

  • Optimal Transport and Learning:

The SK algorithm is the workhorse for entropic OT solvers in machine learning, enabling scalable computation of Wasserstein distances for images, signals, and distributions (Cuturi, 2013, Motamed, 2020).

  • Combinatorial Optimization:

Used for perfect matching, permanent approximation, and biproportional allocation in graphs and networks (Hayashi et al., 2022, He, 13 Jul 2025).

Practical Issues and Recommendations:

  • Scaling matrices with extreme values (small X:=RASX := R A S4 or massive entries in X:=RASX := R A S5) can produce poor theoretical bounds, but practical performance is governed by the well-boundedness (bulk density) of the cost/mass, robust to outliers (He, 4 Apr 2026).
  • Pre-scaling by the target marginals is recommended to improve both numerical stability and convergence rates.
  • Stopping criteria based on threshold error in marginal sums or primal feasibility are standard; log-domain stabilization and exploitation of sparsity are often necessary in large-scale problems (Cuturi, 2013, Corless et al., 2024).

For foundational results, optimal iteration bounds, structural implications, and advanced algorithmic enhancements, see (Chakrabarty et al., 2018, He, 13 Jul 2025, He, 4 Apr 2026, Idel, 2016, Motamed, 2020), and (Apeldoorn et al., 2020).

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Sinkhorn–Knopp Matrix Scaling Algorithm.