Sinkhorn's Theorem & Matrix Scaling

Updated 11 April 2026

Sinkhorn's Theorem establishes when a nonnegative square matrix can be scaled by positive diagonal matrices into doubly stochastic form.
The Sinkhorn–Knopp algorithm alternately normalizes rows and columns and provably converges under the total support condition; one standard proof uses contraction in the Hilbert projective metric.
Extensions of the theorem reach quantum information, entropic optimal transport, and combinatorial optimization.

A square matrix is called doubly stochastic if all of its entries are nonnegative and every row and every column sums to one. Sinkhorn's Theorem characterizes when a nonnegative square matrix can be scaled by positive diagonal matrices into doubly stochastic form and establishes the convergence of a natural iterative scaling algorithm. The theorem plays a central role in matrix scaling, combinatorial optimization, theoretical computer science, statistics, numerical linear algebra, and quantum information. The following sections detail the theorem's statement, algorithmic aspects, extensions, finite termination properties, and connections to broader areas.

1. Classical Statement: The Sinkhorn–Knopp Theorem

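Let $A \in \mathbb{R}^{n \times n}_{\ge 0}$ be a nonnegative square matrix. The matrix $A$ is said to have support (also called possessing a nonzero σ-diagonal) if there exists a permutation $\sigma \in S_n$ such that $\prod_{i=1}^n a_{i, \sigma(i)} > 0$. It has total support if $A \ne 0$ and every positive entry of $A$ lies on such a positive diagonal, that is, for every $a_{ij} > 0$ there is a permutation $\sigma$ with $\sigma(i) = j$ and $\prod_{k=1}^n a_{k, \sigma(k)} > 0$. Total support is equivalent to the bipartite support graph of $A$ containing a perfect matching through every edge.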
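The two conditions can be checked directly for small matrices. The following is a minimal brute-force sketch, assuming the usual definitions (support: some permutation picks only positive entries; total support: every positive entry lies on such a permutation). The function names are hypothetical, and enumerating all permutations is only practical for small $n$.

```python
from itertools import permutations

def positive_diagonals(A):
    """All permutations sigma with A[i][sigma(i)] > 0 for every row i."""
    n = len(A)
    return [p for p in permutations(range(n))
            if all(A[i][p[i]] > 0 for i in range(n))]

def has_support(A):
    """True if at least one positive diagonal exists."""
    return len(positive_diagonals(A)) > 0

def has_total_support(A):
    """True if A is nonzero and every positive entry lies on a positive diagonal."""
    n = len(A)
    covered = {(i, p[i]) for p in positive_diagonals(A) for i in range(n)}
    positive = {(i, j) for i in range(n) for j in range(n) if A[i][j] > 0}
    return bool(positive) and positive <= covered

# Upper-triangular example: the identity permutation is a positive diagonal,
# but the entry in position (0, 1) lies on no positive diagonal.
T = [[1, 1],
     [0, 1]]
print(has_support(T), has_total_support(T))
```

For larger matrices a bipartite matching routine would replace the enumeration; the brute-force version is kept here only because it mirrors the definition.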

Theorem (Sinkhorn–Knopp):

If $A \in \mathbb{R}^{n \times n}_{\ge 0}$ has total support, then there exist positive diagonal matrices $D_r, D_c$ (the product $D_r A D_c$ is unique, and the pair $D_r, D_c$ is unique up to reciprocal positive scalars when $A$ is fully indecomposable) such that

$S = D_r A D_c$

is doubly stochastic, i.e.,

$S_{ij} \ge 0,\quad \sum_{j=1}^n S_{ij} = 1\ \forall i,\quad \sum_{i=1}^n S_{ij} = 1\ \forall j.$

Moreover, the iterative process of alternately normalizing the rows and columns of $A$ converges entrywise to $S$ (Cohen et al., 2019, Idel, 2016, Nathanson, 2018, Nathanson, 2019).

If $A$ is strictly positive ($a_{ij} > 0$ for all $i, j$), the doubly stochastic scaling is always possible and unique up to scaling.

2. Alternate Scaling Algorithm and Convergence

The classical scaling procedure, now known as the Sinkhorn–Knopp algorithm, operates as follows for an $n \times n$ matrix $A$:

Initialization: Let $A^{(0)} = A$.
Column Scaling: For even $k$, set

$a^{(k+1)}_{ij} = a^{(k)}_{ij} / c^{(k)}_j$

where $c^{(k)}_j = \sum_{i=1}^n a^{(k)}_{ij}$ is the $j$-th column sum of $A^{(k)}$.

Row Scaling: For odd $k$, set

$a^{(k+1)}_{ij} = a^{(k)}_{ij} / r^{(k)}_i$

where $r^{(k)}_i = \sum_{j=1}^n a^{(k)}_{ij}$ is the $i$-th row sum of $A^{(k)}$.

This iteration alternately normalizes columns and rows. Under the total support condition, the sequence $A^{(k)}$ converges entrywise to the unique doubly stochastic limit $S = D_r A D_c$.
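The alternating normalization described above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not a reference one: the function name, the stopping tolerance `tol`, the iteration cap `max_iter`, and the choice to start with a column scaling are all choices made here. It also accumulates the diagonal factors so that the output can be compared with $D_r A D_c$.

```python
import numpy as np

def sinkhorn_knopp(A, max_iter=10_000, tol=1e-10):
    """Alternately normalize columns (even steps) and rows (odd steps).

    Returns (S, r, c, n_iter) with S = diag(r) @ A @ diag(c).
    Convergence to a doubly stochastic S is only expected when A has
    total support; this sketch does not check that condition.
    """
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("A must be a square matrix")
    if (A < 0).any() or (A.sum(axis=0) == 0).any() or (A.sum(axis=1) == 0).any():
        raise ValueError("A must be nonnegative with positive row and column sums")

    n = A.shape[0]
    r = np.ones(n)          # diagonal of D_r
    c = np.ones(n)          # diagonal of D_c
    M = A.copy()
    n_iter = 0
    for k in range(max_iter):
        if k % 2 == 0:
            c = c / M.sum(axis=0)   # column scaling
        else:
            r = r / M.sum(axis=1)   # row scaling
        M = r[:, None] * A * c[None, :]
        n_iter = k + 1
        err = max(np.abs(M.sum(axis=0) - 1).max(),
                  np.abs(M.sum(axis=1) - 1).max())
        if err < tol:
            break
    return M, r, c, n_iter

S, r, c, n_iter = sinkhorn_knopp([[1.0, 2.0], [3.0, 4.0]])
print(S, n_iter)
```

Recomputing `M` from `A`, `r`, and `c` at every step keeps the returned factors consistent with the returned matrix, at the cost of a few extra multiplications.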

Convergence Mechanisms: Several proof strategies are employed:

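Hilbert Projective Metric Contraction: Each row-column sweep is shown to contract the Hilbert projective metric with a coefficient strictly less than one (for strictly positive $A$ this follows from Birkhoff's contraction theorem), ensuring linear convergence (Idel, 2016, Nathanson, 2019).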
Convexity and Barrier Methods: The process minimizes the Kullback–Leibler divergence to the original matrix over the set of doubly stochastic matrices, with each scaling step yielding a strict decrease unless at a fixed point (Nathanson, 2019).
Fixed-Point Theory (Nonlinear Perron–Frobenius): The scaling process defines a nonlinear operator on the cone of positive vectors, for which Brouwer’s theorem guarantees a fixed point.
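The contraction argument can be observed numerically. The sketch below assumes a strictly positive matrix and writes one full sweep as a map on the column scaling vector $v$, namely $v \mapsto 1/(A^T (1/(A v)))$ with entrywise reciprocals; this formulation and the helper names are introduced here for illustration. Entrywise inversion is an isometry of the Hilbert metric and multiplication by a positive matrix contracts it by at most the Birkhoff coefficient $\tanh(\Delta(A)/4)$, so one sweep contracts by at most its square.

```python
import numpy as np

def hilbert_metric(x, y):
    """Hilbert projective metric between two positive vectors."""
    ratio = np.asarray(x, dtype=float) / np.asarray(y, dtype=float)
    return float(np.log(ratio.max() / ratio.min()))

def birkhoff_coefficient(A):
    """tanh(Delta/4), where Delta is the projective diameter of a positive A:
    Delta = max over i, j, k, l of log(a_ik * a_jl / (a_jk * a_il))."""
    L = np.log(np.asarray(A, dtype=float))
    cross = (L[:, None, :, None] + L[None, :, None, :]
             - L[None, :, :, None] - L[:, None, None, :])
    return float(np.tanh(cross.max() / 4.0))

def sweep(A, v):
    """One row-then-column update of the column scaling vector v."""
    u = 1.0 / (A @ v)          # row scaling factors
    return 1.0 / (A.T @ u)     # column scaling factors

A = np.array([[1.0, 2.0], [3.0, 4.0]])
kappa = birkhoff_coefficient(A)
v0 = np.ones(2)
v1 = sweep(A, v0)
v2 = sweep(A, v1)
print(kappa ** 2, hilbert_metric(v1, v2) / hilbert_metric(v0, v1))
```

The printed ratio of successive distances should not exceed the printed bound $\kappa^2$, which is the linear rate the contraction proof delivers.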

3. Generalizations and Extensions

3.1 Positive Maps and Quantum Information

Sinkhorn's theorem extends from nonnegative matrices to positive linear maps between operator algebras. Let $T: M_k \to M_m$ be a positive linear map, where $M_k$ denotes the algebra of $k \times k$ complex matrices. The extension employs the analogous notions of support and total support, generalized via trace evaluations on rank-one projections. Under suitable invertibility assumptions, $T$ is equivalent to a doubly stochastic map (under invertible pre- and post-conjugations) if and only if the appropriate notion of total support holds (Cariello, 2016, Idel, 2016).

When $k$ and $m$ are coprime, support suffices for equivalence. This operator-level generalization finds applications in quantum information theory, particularly in characterizing the filter normal form for bipartite quantum states.
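To make the operator-level analogue concrete, the following sketch treats a narrower case than the text: a completely positive map on square matrices given by Kraus operators, $X \mapsto \sum_i K_i X K_i^\dagger$. That restriction, the function names, and the fixed iteration count are assumptions made here, not the construction used in the cited papers. The map is doubly stochastic when $\sum_i K_i K_i^\dagger = I$ and $\sum_i K_i^\dagger K_i = I$, and the loop alternately enforces each condition, mirroring row and column normalization.

```python
import numpy as np

def inv_sqrt(P):
    """Inverse square root of a Hermitian positive definite matrix."""
    w, V = np.linalg.eigh(P)
    return (V / np.sqrt(w)) @ V.conj().T

def operator_sinkhorn(kraus, n_iter=500):
    """Alternately normalize T(I) and T*(I) for T(X) = sum_i K_i X K_i^dagger."""
    K = [np.array(k, dtype=complex) for k in kraus]
    for _ in range(n_iter):
        P = sum(k @ k.conj().T for k in K)   # T(I)
        left = inv_sqrt(P)
        K = [left @ k for k in K]            # now T(I) = I
        Q = sum(k.conj().T @ k for k in K)   # T*(I)
        right = inv_sqrt(Q)
        K = [k @ right for k in K]           # now T*(I) = I
    return K

rng = np.random.default_rng(0)
kraus = [rng.normal(size=(2, 2)) for _ in range(6)]
scaled = operator_sinkhorn(kraus)
print(np.round(sum(k @ k.conj().T for k in scaled).real, 6))
```

With several generic Kraus operators the map sends nonzero positive semidefinite matrices to positive definite ones, which is the setting where this iteration is expected to converge; degenerate inputs may not.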

3.2 Entropic Optimal Transport and Continuous Analogues

In the continuous setting, Sinkhorn's theorem underpins the iterative proportional fitting procedure (IPFP) for coupling probability measures with fixed marginals under entropic regularization. For Polish spaces $X$ and $Y$ carrying the two marginals, the IPFP alternately enforces each marginal constraint while minimizing relative entropy, converging to the unique entropic optimal transport plan (Nutz et al., 2022). Convergence in total variation can be obtained for continuous costs satisfying exponential integrability conditions.

The Schrödinger potentials associated with the limiting coupling satisfy certain functional equations and play a role analogous to scaling factors in the discrete case.
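A finite, discrete version of this procedure shows how the potentials relate to scaling factors. The sketch below assumes finitely supported marginals `a` and `b`, a cost matrix `C`, and a regularization strength `eps`; these names, the fixed iteration count, and the sign convention for the potentials are choices made here, and the code does not address the continuous setting treated in the cited work.

```python
import numpy as np

def entropic_ot(a, b, C, eps=0.5, n_iter=500):
    """Sinkhorn iterations for discrete entropic optimal transport.

    Returns the coupling P and potentials (f, g) with
    P[i, j] = exp((f[i] + g[j] - C[i, j]) / eps).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = np.exp(-np.asarray(C, dtype=float) / eps)   # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)       # enforce the first marginal
        v = b / (K.T @ u)     # enforce the second marginal
    P = u[:, None] * K * v[None, :]
    f = eps * np.log(u)       # discrete Schrodinger potentials
    g = eps * np.log(v)
    return P, f, g

a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P, f, g = entropic_ot(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))
```

Here `u` and `v` play the role of the diagonal scaling factors of the matrix case, and `f`, `g` are their logarithms scaled by `eps`. For very small `eps` the kernel underflows, and a log-domain variant would be needed.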

4. Finite-Step Termination and Structural Results

While generic matrices require infinitely many alternations to reach the doubly stochastic limit, certain special matrices terminate after finitely many steps. Cohen and Nathanson proved a sharp result:

Theorem (Cohen–Nathanson):

If the alternate scaling algorithm for a nonnegative matrix with positive row and column sums attains the doubly stochastic form in finitely many steps, it does so after at most two steps, that is, after either a row-then-column or a column-then-row pair of normalizations. No matrix whose scaling terminates finitely requires three or more steps (Cohen et al., 2019, Nathanson, 2018).

For $2 \times 2$ and certain $3 \times 3$ matrices, explicit structure theorems characterize those terminating in one or two steps. Degenerate, block-constant (row-stochastic but not column-stochastic) families requiring exactly one column scaling are constructed, exhibiting the limiting phenomenon (Nathanson, 2019, Nathanson, 2018).
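Finite termination can be tested exactly with rational arithmetic. The sketch below counts how many scalings are needed before a matrix becomes exactly doubly stochastic; the function names, the `rows_first` flag, and the cap `max_steps` are hypothetical choices made here, and the example matrices are illustrations rather than the families constructed in the cited papers.

```python
from fractions import Fraction

def row_scale(M):
    """Divide every entry by its row sum."""
    return [[x / sum(row) for x in row] for row in M]

def col_scale(M):
    """Divide every entry by its column sum."""
    sums = [sum(col) for col in zip(*M)]
    return [[x / s for x, s in zip(row, sums)] for row in M]

def is_doubly_stochastic(M):
    return (all(sum(row) == 1 for row in M)
            and all(sum(col) == 1 for col in zip(*M)))

def steps_to_terminate(A, max_steps=10, rows_first=True):
    """Number of scalings until exactly doubly stochastic, or None if
    that does not happen within max_steps scalings."""
    M = [[Fraction(x) for x in row] for row in A]
    for step in range(max_steps + 1):
        if is_doubly_stochastic(M):
            return step
        do_rows = (step % 2 == 0) == rows_first
        M = row_scale(M) if do_rows else col_scale(M)
    return None

# A rank-one matrix terminates after a row scaling and a column scaling.
print(steps_to_terminate([[1, 2], [2, 4]]))
# A row-stochastic matrix with equal rows needs only one column scaling.
third = Fraction(1, 3)
print(steps_to_terminate([[third, 2 * third], [third, 2 * third]], rows_first=False))
```

The matrix with rows $(1, 2)$ and $(3, 4)$ never terminates: its iterates stay rational while its doubly stochastic limit has irrational entries, so the function returns `None` for any cap.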

5. Explicit Computations and Arithmetic Structure

For certain structured matrices, especially low-dimensional and two-value symmetric $3 \times 3$ matrices, explicit closed-form expressions exist for the Sinkhorn limit. These limits are computed via algebraic elimination or solution of polynomial systems, yielding explicit dependence on the original matrix parameters (Nathanson, 2019, Nathanson, 2019). In the rational case, the entries of the scaling iterates remain rational, but limits may be irrational algebraic numbers, generating sequences with applications in Diophantine approximation.
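The simplest instance of such a closed form is the positive $2 \times 2$ case, worked here as an illustration (a standard computation rather than one quoted from the cited papers; the symbols $a, b, c, d, s$ are introduced for this example). Write the matrix with rows $(a, b)$ and $(c, d)$. Diagonal scaling leaves the cross-ratio $ad/(bc)$ unchanged, and every $2 \times 2$ doubly stochastic matrix has diagonal entries $s$ and off-diagonal entries $1 - s$. Equating cross-ratios gives

$\frac{s^2}{(1-s)^2} = \frac{ad}{bc} \quad\Longrightarrow\quad s = \frac{\sqrt{ad}}{\sqrt{ad} + \sqrt{bc}}.$

For the rational matrix with rows $(1, 2)$ and $(3, 4)$ this yields $s = 2/(2 + \sqrt{6}) = \sqrt{6} - 2 \approx 0.4495$, an irrational algebraic limit of rational iterates.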

6. Algorithmic Perspectives and Optimization Viewpoint

The Sinkhorn–Knopp algorithm is interpretable as Bregman (mirror) descent on the Kullback–Leibler divergence relative to the set of transportation matrices with prescribed marginals (arXiv:2002.03758). The alternating projections viewpoint aligns each step with an I-projection (iterative proportional fitting) onto row- or column-margin affine spaces. Sublinear $O(1/k)$ convergence rate bounds, with $k$ the iteration count, are obtained in terms of the divergence to optimality, and robustness to sparsity and ill-conditioning is characterized in the Bregman framework. This optimization perspective unifies classical analyses and generalizes to online and measure-theoretic settings.
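The projection viewpoint can be checked numerically through the Pythagorean identity for I-projections. The sketch below is an illustration under assumptions made here: it uses the generalized Kullback–Leibler divergence $\sum_{ij} (P_{ij} \log(P_{ij}/Q_{ij}) - P_{ij} + Q_{ij})$, a strictly positive starting matrix, and an arbitrary doubly stochastic comparison matrix `S`; the function names are hypothetical. For each normalization step from `M` to `M_next`, the divergence from `S` drops by exactly the divergence between `M_next` and `M`.

```python
import numpy as np

def gen_kl(P, Q):
    """Generalized KL divergence, with the convention 0 * log 0 = 0."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])) - P.sum() + Q.sum())

def kl_trace(A, S, n_steps=20):
    """Track gen_kl(S, A_k) and the per-step progress gen_kl(A_{k+1}, A_k)."""
    M = np.array(A, dtype=float)
    to_target = [gen_kl(S, M)]
    progress = []
    for k in range(n_steps):
        axis = 0 if k % 2 == 0 else 1        # columns on even steps, rows on odd
        M_next = M / M.sum(axis=axis, keepdims=True)
        progress.append(gen_kl(M_next, M))
        to_target.append(gen_kl(S, M_next))
        M = M_next
    return to_target, progress

A = np.array([[1.0, 2.0], [3.0, 4.0]])
S = np.full((2, 2), 0.5)                     # any doubly stochastic matrix works
to_target, progress = kl_trace(A, S)
print(to_target[:4])
```

Because the per-step progress terms are nonnegative and sum to at most the initial divergence, the smallest of the first $K$ progress terms is at most `to_target[0] / K`, which is the elementary mechanism behind sublinear rate bounds of this kind.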

7. Further Developments and Applications

Sinkhorn's theorem and the associated matrix scaling arise in multiple domains:

Statistical Data Fitting: Iterative proportional fitting in contingency tables.
Numerical Linear Algebra: Diagonal equilibration to improve conditioning.
Transportation Problems: Balancing of flow matrices.
Quantum Information: Simplification of density operators under invertible SLOCC.
Combinatorial Optimization: Edmonds' problem and operator capacity.

The matrix scaling and operator scaling frameworks, built upon the convergence and uniqueness guarantees of Sinkhorn’s theorem, have become indispensable tools in both theoretical and applied mathematical disciplines (Idel, 2016).