Sinkhorn–Knopp Matrix Scaling Algorithm
- The Sinkhorn–Knopp algorithm is an iterative method that scales nonnegative matrices to prescribed row and column sums by alternately normalizing rows and columns.
- It guarantees convergence under proper support conditions, achieving a unique scaling solution through contraction in Hilbert’s projective metric.
- The algorithm underpins efficient solvers in optimal transport, combinatorial optimization, and numerical linear algebra, with proven iteration bounds.
The Sinkhorn–Knopp matrix scaling algorithm, also known as the RAS algorithm, is a fundamental iterative procedure for scaling a nonnegative matrix so that it achieves prescribed row and column sums. Its wide impact spans combinatorial optimization, numerical linear algebra, statistics, and optimal transport. The method alternates normalizing the rows and columns of a matrix, producing a sequence converging—under suitable conditions—to a matrix which satisfies the specified marginals exactly. Beyond its original use case for doubly stochastic scaling, the Sinkhorn–Knopp algorithm underlies fast solvers for entropic regularized optimal transport and is deeply connected to geometric programming, Bregman projections, quantum algorithms, and structural properties in combinatorial matrix theory.
1. Problem Statement and Classical Algorithm
Given a nonnegative matrix , and target row and column sum vectors , such that , the matrix scaling problem is to find diagonal matrices and so that has row sums and column sums : A particularly important special case is doubly stochastic scaling: 0, 1 for all 2.
The canonical Sinkhorn–Knopp algorithm alternates between row-normalization and column-normalization:
- Initialize 3, 4
- For 5:
6
Or, equivalently, alternately scale the matrix by the relevant diagonal matrices so that in each step either the rows or the columns match the target sums (Chakrabarty et al., 2018, Idel, 2016, Cuturi, 2013, Nathanson, 2019).
The iteration is stopped when both the maximum deviation in row sums and column sums falls below a prescribed tolerance 7 (Nathanson, 2019).
2. Theoretical Guarantees: Existence, Uniqueness, and Convergence
Existence & Uniqueness:
The algorithm produces positive diagonal matrices 8, 9 such that 0 has prescribed marginals if and only if 1 has a "doubly stochastic pattern"—there exists a matrix with the same support as 2 that is already scalable to the desired marginals (Idel, 2016). For strictly positive 3, the diagonal scalings are unique up to a common scalar factor.
Convergence:
- Each scaling step is a contraction in Hilbert’s projective metric.
- Alternation of two such contractions yields linear (geometric) convergence to the unique fixed point in the positive orthant (Nathanson, 2019, Idel, 2016).
- For matrices with total support (no row/column of all zeros) and connected support graph, convergence to the unique solution (or Sinkhorn limit 4) is always achieved.
Finiteness Results:
In the generic case, the algorithm converges asymptotically. However, for certain block-structured matrices which are already row- or column-stochastic, the process may terminate in one or two iterations (Nathanson, 2019, Cohen et al., 2019). Cohen–Nathanson proved that no instance terminates exactly in more than two steps (Cohen et al., 2019).
3. Quantitative Error Analysis and Iteration Bounds
The convergence rate of the Sinkhorn–Knopp algorithm is governed by the structure of 5 and the target tolerance 6. Key results (Chakrabarty et al., 2018):
- KL-divergence Tracking:
The decrease in the KL-divergence between the current row sum distribution and the target is closely tracked at each iteration. Pinsker’s inequality and a new KL vs 7 bound connect KL divergence to 8 and 9 errors.
- Iteration Bounds:
To achieve 0 error 1,
2
steps suffice, where 3 (min nonzero entry)/(max entry), 4, and 5.
For 6 error 7,
8
suffices.
- Comparisons to Prior Bounds:
Earlier analyses had 9 dependence for both errors; improved results replace this with 0 for 1 and log-scale dependence on the matrix parameters due to the strengthened KL-related inequalities.
- Exponential and Dimension-Free Convergence:
For matrices whose normalized version satisfies an explicit density condition (bulk mass), iteration complexity is 2, independent of 3 and the maximum regularized cost (He, 4 Apr 2026, He, 13 Jul 2025). Pre-scaling by marginals enables a dimension-free 4 iteration complexity (He, 4 Apr 2026).
- Phase Transition in Complexity:
There exists a sharp phase transition at matrix density 5: above this, convergence is logarithmic in 6; below, dependence becomes polynomial in 7 and 8 (He, 13 Jul 2025, He, 4 Apr 2026).
4. Interpretations and Connections to Optimization
Variational Views:
- Bregman Projection / KL Minimization:
Alternating row and column scaling is an instance of Bregman projection (specifically, KL-projection) onto affine subspaces of fixed row and column sums. The algorithm minimizes the KL-divergence 9 under these constraints (Corless et al., 2024, Chakrabarty et al., 2018, Cuturi, 2013).
- Geometric Programming (Capacity Maximization):
The process alternates minimization of a geometric log-capacity objective, equivalent to maximizing the permanent over certain normalized classes under fixed marginals (Hayashi et al., 2022). The decrease in capacity is directly linked to progress toward feasibility via an exact formula involving KL divergence.
- Duality in Entropic Regularized Optimal Transport:
In entropic OT, diagonal scaling finds the potentials (0) in 1 solving 2, with 3.
5. Algorithmic Enhancements, Quantum Algorithms, and Practical Extensions
Algorithmic Accelerations:
- Regularization scheduling (doubling the entropic penalty periodically) and scaling reductions (cost, capacity) yield exponential convergence and optimal 4 complexity for entropic OT with integer marginals (Chen et al., 2022).
- Hierarchical low-rank and Kronecker-structured representations allow 5 complexity for large-scale regularized OT when the cost matrix decomposes favorably (Motamed, 2020).
Quantum Analogues:
- Quantum implementations of Sinkhorn (using amplitude estimation) achieve 6 scaling, a polynomial speedup in 7 at a cost in 8-dependence, and are provably optimal for constant-error scaling (Apeldoorn et al., 2020).
Extensions:
- The SK scheme generalizes to operator scaling (positive linear maps), where iteration on positive semidefinite matrices yields trace-preserving, unital normal forms with connections to quantum information theory (Idel, 2016).
- Constrained scaling with prescribed zeros ("forbidden entries") is handled by zeroing out respective entries in 9 and restricting scaling support, with convergence proven under mild support assumptions (Corless et al., 2024).
6. Combinatorial and Structural Implications
- Perfect Matching and Blockers:
For the case 0 is a 1-2 matrix, the existence of a doubly stochastic scaling is equivalent to the support graph admitting a perfect matching (via Hall's condition) (Hayashi et al., 2022). When not scalable, the limiting matrix encodes all Hall blockers (deficiency subsets), accessible from the plateau structure of the limiting marginals.
- Block Decompositions:
The Sinkhorn–Knopp limit in the non-scalable case is block upper-triangular, precisely reflecting the principal partition, polymatroid base-polytope structure, and Dulmage–Mendelsohn decomposition in combinatorial optimization.
- Explicit Formulae and Arithmetic:
For symmetric 3 matrices with two distinct entries, closed-form doubly stochastic scalings ("Sinkhorn limits") can be calculated, sometimes involving roots of low-degree equations. Finite-term scaling is structurally characterized and rare (Nathanson, 2019, Nathanson, 2019).
7. Applications and Practical Considerations
Applications:
- Preconditioning Linear Systems:
Matrix balancing as a preprocessing step increases numerical stability in large-scale linear algebra (Apeldoorn et al., 2020).
- Optimal Transport and Learning:
The SK algorithm is the workhorse for entropic OT solvers in machine learning, enabling scalable computation of Wasserstein distances for images, signals, and distributions (Cuturi, 2013, Motamed, 2020).
- Combinatorial Optimization:
Used for perfect matching, permanent approximation, and biproportional allocation in graphs and networks (Hayashi et al., 2022, He, 13 Jul 2025).
Practical Issues and Recommendations:
- Scaling matrices with extreme values (small 4 or massive entries in 5) can produce poor theoretical bounds, but practical performance is governed by the well-boundedness (bulk density) of the cost/mass, robust to outliers (He, 4 Apr 2026).
- Pre-scaling by the target marginals is recommended to improve both numerical stability and convergence rates.
- Stopping criteria based on threshold error in marginal sums or primal feasibility are standard; log-domain stabilization and exploitation of sparsity are often necessary in large-scale problems (Cuturi, 2013, Corless et al., 2024).
For foundational results, optimal iteration bounds, structural implications, and advanced algorithmic enhancements, see (Chakrabarty et al., 2018, He, 13 Jul 2025, He, 4 Apr 2026, Idel, 2016, Motamed, 2020), and (Apeldoorn et al., 2020).