
Sinkhorn’s Theorem and Matrix Scaling

Updated 9 April 2026
  • Sinkhorn’s theorem is a fundamental result that guarantees the existence and uniqueness (up to scalar multiplication) of diagonal matrices that scale a positive matrix to prescribed row and column sums.
  • The associated Sinkhorn algorithm employs alternating normalization steps, providing a unique minimizer for the entropy-regularized optimal transport problem with provable convergence.
  • Its applications span optimal transport, quantum state normalization, and matrix preconditioning, illustrating its broad impact across computational mathematics and data science.

Sinkhorn’s theorem characterizes the diagonal scaling of positive matrices to prescribed marginals, unifying perspectives from matrix balancing, information geometry, entropic optimal transport, and quantum state normalization. The theorem underpins the Sinkhorn algorithm, a pivotal tool in computational mathematics, statistics, quantum theory, and machine learning.

1. Formal Statement and Matrix Scaling Principle

Let $K \in \mathbb{R}_{>0}^{n \times n}$ be a strictly positive matrix, and let $r, s \in \mathbb{R}_{>0}^n$ satisfy $\sum_i r_i = \sum_j s_j$. Sinkhorn’s theorem asserts the existence of diagonal matrices $D_1 = \mathrm{diag}(a_1, \ldots, a_n)$ and $D_2 = \mathrm{diag}(b_1, \ldots, b_n)$, unique up to the positive scalar rescaling $(D_1, D_2) \mapsto (\lambda D_1, \lambda^{-1} D_2)$, such that

$$M = D_1 K D_2$$

has row sums $r$ and column sums $s$, i.e., $M\mathbf{1} = r$ and $M^{\top}\mathbf{1} = s$. Uniqueness holds up to the scaling mentioned above.

Equivalently, the theorem yields a unique minimizer of the strictly convex entropy-regularized optimal transport problem
$$\min_{M\mathbf{1} = r,\; M^{\top}\mathbf{1} = s} \mathrm{KL}(M \,\|\, K), \qquad \mathrm{KL}(M \,\|\, K) = \sum_{ij} \Bigl( M_{ij} \log \tfrac{M_{ij}}{K_{ij}} - M_{ij} + K_{ij} \Bigr),$$
where $\mathrm{KL}$ denotes the generalized Kullback–Leibler divergence. The solution is $M = D_1 K D_2$.
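The product form of the minimizer follows from the first-order optimality conditions; a brief sketch of this standard derivation (the multipliers $\alpha, \beta$ are introduced here purely for illustration):

```latex
% Lagrangian for the entropy-regularized problem, with multipliers alpha, beta
% for the row-sum and column-sum constraints:
\mathcal{L}(M, \alpha, \beta)
  = \mathrm{KL}(M \,\|\, K)
  - \alpha^{\top}(M\mathbf{1} - r)
  - \beta^{\top}(M^{\top}\mathbf{1} - s).
% Stationarity in each entry M_{ij}:
\partial_{M_{ij}} \mathcal{L}
  = \log\frac{M_{ij}}{K_{ij}} - \alpha_i - \beta_j = 0
\quad\Longrightarrow\quad
M_{ij} = e^{\alpha_i} \, K_{ij} \, e^{\beta_j},
% i.e. M = D_1 K D_2 with a_i = e^{\alpha_i} and b_j = e^{\beta_j}.
```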

Algorithmically, this scaling is achieved through alternating renormalizations
$$a^{(k+1)} = \frac{r}{K b^{(k)}}, \qquad b^{(k+1)} = \frac{s}{K^{\top} a^{(k+1)}} \quad (\text{elementwise division}),$$
which converge to the unique pair $(D_1, D_2) = (\mathrm{diag}(a), \mathrm{diag}(b))$ realizing the target marginals (Modin, 2023, 2002.03758, Nathanson, 2018, Nathanson, 2019).
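A minimal NumPy sketch of these alternating renormalizations (the function name, stopping rule, and tolerance are illustrative choices, not taken from the cited papers):

```python
import numpy as np

def sinkhorn_scaling(K, r, s, max_iter=10_000, tol=1e-12):
    """Alternately rescale rows and columns of the positive matrix K until
    diag(a) @ K @ diag(b) has row sums r and column sums s."""
    a = np.ones_like(r, dtype=float)
    b = np.ones_like(s, dtype=float)
    for _ in range(max_iter):
        a = r / (K @ b)        # row renormalization:    enforce M 1 = r
        b = s / (K.T @ a)      # column renormalization: enforce M^T 1 = s
        M = a[:, None] * K * b[None, :]
        if np.abs(M.sum(axis=1) - r).max() < tol:   # column sums are exact by construction
            break
    return a, b, M

# Example: scale a random positive 4x4 matrix to doubly stochastic form.
rng = np.random.default_rng(0)
K = rng.uniform(0.5, 2.0, size=(4, 4))
r = s = np.ones(4)
a, b, M = sinkhorn_scaling(K, r, s)
print(M.sum(axis=1), M.sum(axis=0))   # both approximately [1, 1, 1, 1]
```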

2. Geometric, Variational, and Dynamical Formulations

Sinkhorn’s theorem admits a geometric interpretation: the scaling algorithm can be seen as an alternating projection scheme (Bregman projection) in the space of coupling matrices or measures, equipped with KL divergence as the Bregman divergence.

In the discrete setting, Sinkhorn iteration alternately projects a coupling matrix onto the affine sets defined by the marginals. In the continuous setting—central to entropic optimal transport—this alternation is made in the infinite-dimensional space of product measures or densities endowed with the Fisher–Rao product metric.
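Concretely, in the discrete case the KL (Bregman) projection onto a single marginal constraint is exactly a diagonal rescaling, which is why each half-step of the iteration has closed form; a short statement of this standard fact:

```latex
% KL projection of a positive matrix M onto the row-marginal affine set {P : P 1 = r}:
\operatorname*{arg\,min}_{P\mathbf{1} = r} \, \mathrm{KL}(P \,\|\, M)
  \;=\; \mathrm{diag}\!\left(\frac{r}{M\mathbf{1}}\right) M
  \qquad (\text{elementwise division}),
% and analogously the projection onto {P : P^T 1 = s} is  M diag(s / (M^T 1)).
% Alternating these two projections is precisely the Sinkhorn iteration.
```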

The geometric view further shows that the Sinkhorn iteration is a Lie–Trotter splitting (forward-Euler discretization) of a coupled, nonlinear integral equation governing the evolution of the two Schrödinger potentials. The stationary points of these flows correspond to solutions of the scaled marginal equations, and each Sinkhorn update corresponds exactly to a time-discretized step of these nonlinear flows (Modin, 2023).

3. Generalizations: Beurling’s Theorem and Quantum/Rectangular Cases

The continuous analog, due to Beurling, considers a compact space $X$, a continuous, strictly positive kernel $K(x, y)$ on $X \times X$, and prescribed marginal measures $\mu, \nu$ on $X$ of equal total mass. There exist positive densities $a$ and $b$ such that the measure $a(x)\,b(y)\,K(x, y)$ has marginals $\mu$ and $\nu$; the pair $(a, b)$ is unique up to the co-scaling $(a, b) \mapsto (\lambda a, \lambda^{-1} b)$.

For quantum state normalization, the Sinkhorn–Knopp theorem generalizes to a characterization of positive maps between matrix algebras: such a map is equivalent, under local scalings, to a doubly stochastic map if and only if it has total support. In quantum information, this governs the existence of filter normal forms for bipartite states (Cariello, 2018, Cariello, 2016).

In the case of unitary matrices, any unitary $U \in \mathrm{U}(n)$ admits a scaling $D_1 U D_2$ by diagonal unitary matrices to a matrix whose row and column sums are all 1, via a symplectic topology argument predicated on the non-displaceability of the Clifford torus in complex projective space (Idel et al., 2014).

4. Algorithmic Properties, Convergence, and Rates

The alternating-scaling (Sinkhorn) algorithm produces a sequence of matrices converging to the unique doubly stochastic scaling (the Sinkhorn limit). Each row scaling is a projection onto the set of matrices with prescribed row sums; each column scaling, onto those with prescribed column sums. The process is strictly contractive in the Hilbert projective metric and monotonically decreases the KL divergence.
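For reference, the contraction statement is the classical Birkhoff–Hopf bound for positive matrices (a standard fact, not specific to any one paper cited here): multiplication by $K$ contracts Hilbert's projective metric by a factor depending only on the projective diameter of $K$.

```latex
% Hilbert projective metric on positive vectors, and Birkhoff's contraction bound:
d_H(x, y) = \log \max_{i, j} \frac{x_i \, y_j}{x_j \, y_i},
\qquad
d_H(Kx, Ky) \le \tanh\!\left(\tfrac{\Delta(K)}{4}\right) d_H(x, y),
\qquad
\Delta(K) = \max_{i, j, k, l} \log \frac{K_{ik} K_{jl}}{K_{il} K_{jk}}.
```

The factor $\tanh(\Delta(K)/4)$ approaches 1 when the entries of $K$ are very unbalanced, which is the entry-ratio dependence contrasted with the mirror-descent rate discussed next.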

Recently, the Sinkhorn process has been shown to be a block-coordinate (mirror) descent in the space of marginal densities, with explicit sublinear convergence rates. Given the “1-smoothness” of the entropy functional, one obtains a global bound of order $O(1/k)$ after $k$ iterations on the suboptimality relative to the minimum relative entropy of the scaled problem. This rate does not degrade with sparsity or small matrix entries, in contrast with classical Hilbert-metric-based rates, which depend explicitly on ratios of matrix entries (2002.03758).
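An illustrative check (purely a sketch; the kernel, sizes, and iteration counts are arbitrary choices, not taken from 2002.03758): tracking the generalized KL divergence of the current row marginal from its target exhibits this sublinear decay in practice.

```python
import numpy as np

# Illustrative sketch: monitor how fast the row marginal approaches its target r.
rng = np.random.default_rng(1)
n = 50
K = rng.uniform(1e-3, 1.0, size=(n, n))   # a generic positive kernel
r = s = np.full(n, 1.0 / n)               # uniform target marginals
a, b = np.ones(n), np.ones(n)

for k in range(1, 201):
    a = r / (K @ b)
    b = s / (K.T @ a)
    row = (a[:, None] * K * b[None, :]).sum(axis=1)   # current row marginal
    kl = np.sum(row * np.log(row / r) - row + r)      # generalized KL to r
    if k in (10, 50, 100, 200):
        print(f"iteration {k:3d}: KL(row marginal || r) = {kl:.3e}")
```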

In special cases (e.g., certain symmetric matrices whose entries take only two values), the Sinkhorn limit can be given in closed form. For two-by-two matrices, whenever the algorithm converges in finitely many steps it does so in at most two, and this bound of two iterations for finite convergence holds regardless of matrix size (Cohen et al., 2019, Nathanson, 2018, Nathanson, 2019).

5. Explicit Scalings, Arithmetic, and Diophantine Structure

For classes of matrices with small dimensions or symmetries, explicit expressions for the Sinkhorn limit are attainable. For small symmetric matrices whose entries take only two values, algebraic formulas involving roots are available, and the process reveals ties to Diophantine approximation and to the rationality of the Sinkhorn limits. In the $2 \times 2$ case, the Sinkhorn scaling yields doubly stochastic limits of the form

$$\frac{1}{\sqrt{ad} + \sqrt{bc}} \begin{pmatrix} \sqrt{ad} & \sqrt{bc} \\ \sqrt{bc} & \sqrt{ad} \end{pmatrix}$$

for any positive $2 \times 2$ matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ (Nathanson, 2018, Nathanson, 2019, Nathanson, 2019).
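A quick numerical sanity check of the $2 \times 2$ closed form against the iteration (random entries, iteration count, and tolerance chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c, d = rng.uniform(0.1, 5.0, size=4)
A = np.array([[a, b], [c, d]])

# Closed-form Sinkhorn limit for a positive 2x2 matrix.
p, q = np.sqrt(a * d), np.sqrt(b * c)
S_closed = np.array([[p, q], [q, p]]) / (p + q)

# Alternating row/column normalization toward the doubly stochastic limit.
M = A.copy()
for _ in range(2000):
    M = M / M.sum(axis=1, keepdims=True)   # row normalization
    M = M / M.sum(axis=0, keepdims=True)   # column normalization

print(np.allclose(M, S_closed, atol=1e-10))   # expected: True
```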

Connections to number theory appear: the algorithm terminates in finitely many steps precisely when the Sinkhorn limit has rational entries, and, for certain parameter values, the convergence exhibits relations to quadratic and higher degree equations over the rationals.

6. Broader Extensions and Applications

The reach of Sinkhorn’s theorem encompasses:

  • Entropic regularization in optimal transport: The theorem underpins fast approximate solutions to transport problems via the Sinkhorn algorithm, with broad impact in image processing, statistics, and statistical physics (Modin, 2023).
  • Quantum information theory: Filter normal forms and characterization of entangled states via scaling of completely positive maps rely on quantum analogs of Sinkhorn’s theorem (Cariello, 2018, Cariello, 2016).
  • Linear optics and unitaries: Unitary Sinkhorn normal forms, with explicit Fourier-based decompositions, are foundational in the design of universal multiport interferometers for quantum circuits (Idel et al., 2014).
  • Matrix balancing and preconditioning: The Sinkhorn iteration is widely used in numerical linear algebra to precondition matrices and solve inverse problems where row and column normalization is desired.

These diverse applications flow from the deep intertwining of convex analysis, geometry, matrix theory, and optimization encoded in Sinkhorn’s theorem and its generalizations.
