Sinkhorn-Knopp Centering

Updated 25 November 2025

Sinkhorn-Knopp centering is a matrix balancing technique that transforms a nonnegative matrix into a doubly stochastic matrix using alternating row and column normalizations.
The method connects to entropically regularized optimal transport, Bregman projections, and nonlinear eigenproblems; each iteration costs only matrix-vector products, and convergence is linear.
Extensions like overrelaxation, log-domain stabilization, and accelerated schemes enable practical applications in large-scale and constrained settings.

The Sinkhorn–Knopp centering process is a matrix balancing procedure that transforms any nonnegative matrix, typically with total support, into a doubly stochastic matrix via alternating row and column normalizations. The method has profound connections with entropy-regularized optimal transport (OT), regularized assignment problems, Bregman projection theory, and nonlinear eigenproblems. Its computational efficiency, convergence guarantees, and extensions to constrained and accelerated formulations have made it a standard tool in modern computational mathematics.

1. Mathematical Formulation and Origins

Sinkhorn–Knopp centering addresses the following canonical problem: given a nonnegative matrix $A \in \mathbb{R}^{n \times n}$ with total support, find diagonal scaling matrices $D_r = \mathrm{diag}(r_1,\ldots,r_n)$ and $D_c = \mathrm{diag}(c_1,\ldots,c_n)$ with positive diagonal entries such that $X = D_r\, A\, D_c$ is doubly stochastic, i.e., all row and column sums equal one. Sinkhorn’s theorem asserts that such scalings exist, that $X$ is unique, and that $D_r$ and $D_c$ are themselves unique up to a scalar factor whenever $A$ is fully indecomposable (Sharify et al., 2011, Cuturi, 2013).
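
As a small worked illustration (the matrix is chosen here for convenience and does not come from the cited works), consider a positive $2 \times 2$ matrix. Diagonal scaling leaves the cross-ratio $x_{11}x_{22}/(x_{12}x_{21})$ unchanged, which fixes the doubly stochastic target:

$A = \begin{pmatrix} 4 & 1 \\ 1 & 1 \end{pmatrix}, \qquad X = \begin{pmatrix} t & 1-t \\ 1-t & t \end{pmatrix}, \qquad \frac{t^2}{(1-t)^2} = \frac{4 \cdot 1}{1 \cdot 1} \ \Rightarrow\ t = \tfrac{2}{3}$

$D_r = \mathrm{diag}\left(\tfrac{1}{6}, \tfrac{1}{3}\right), \quad D_c = \mathrm{diag}(1, 2), \quad D_r\, A\, D_c = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix}$

Replacing $(D_r, D_c)$ by $(s D_r, D_c / s)$ for any $s > 0$ gives the same $X$, which is the scalar ambiguity mentioned above.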

This centering procedure arose from foundational work of Sinkhorn & Knopp (1967), further underpinned by Birkhoff’s contraction theorem in Hilbert’s projective metric, which enables linear convergence guarantees. In modern contexts, the centering operation is tightly linked to regularized optimal transport—specifically, the solution of entropically penalized Kantorovich problems where the transport matrix assumes a Gibbs-like scaling form (Cuturi, 2013, Modin, 2023).

2. Entropically Regularized Optimal Transport Connection

The classical optimal transport problem seeks the minimal-cost plan $\min_{P \geq 0}\ \langle P, M \rangle$ for histograms $a, b \in \Sigma_d$ and cost matrix $M \in \mathbb{R}^{d \times d}_+$, subject to the marginal constraints $P1 = a,\ P^\top 1 = b$; write $U(a,b)$ for the set of plans satisfying them. Its entropically regularized variant subtracts a multiple of the entropy $h(P) = -\sum_{ij} P_{ij} \log P_{ij}$, yielding a strictly convex program:

$\min_{P \in U(a,b)} \langle P, M \rangle - \frac{1}{\lambda} h(P)$

The unique optimal plan has the factorized form $P^\lambda = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K_{ij} = \exp(-\lambda M_{ij})$. The scaling vectors $u,v$ enforce the desired marginals via the coupled equations:

$u_{i} = \frac{a_{i}}{(K v)_{i}}, \quad v_{j} = \frac{b_{j}}{(K^\top u)_{j}}$

This recovers the direct link between matrix scaling and entropy-regularized OT (Cuturi, 2013, Modin, 2023).
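
The coupled equations can be iterated directly. The following is a minimal NumPy sketch, not a reference implementation: the function name `sinkhorn_ot`, the fixed iteration count, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_ot(a, b, M, lam, n_iter=500):
    """Entropic OT plan via the coupled scaling equations (illustrative sketch)."""
    K = np.exp(-lam * M)              # Gibbs kernel K_ij = exp(-lambda * M_ij)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)               # enforce the row marginals
        v = b / (K.T @ u)             # enforce the column marginals
    return u[:, None] * K * v[None, :]    # diag(u) K diag(v)

# Toy problem (made-up data): three points on a line, squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
M = (x[:, None] - x[None, :]) ** 2
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])

P = sinkhorn_ot(a, b, M, lam=1.0)
row_err = np.abs(P.sum(axis=1) - a).max()
col_err = np.abs(P.sum(axis=0) - b).max()
print("marginal errors:", row_err, col_err)
print("transport cost <P, M>:", (P * M).sum())
```

Because the last update is the one for $v$, the column marginals hold to rounding error, and the row error is the quantity to monitor.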

3. Iterative Alternating Matrix-Scaling Algorithm

The Sinkhorn–Knopp algorithm alternates row and column normalizations to enforce prescribed sums. The core recurrences for the scaling vectors are:

$u^{(k+1)} = a / (K v^{(k)}), \quad v^{(k+1)} = b / (K^\top u^{(k+1)})$

where the divisions are performed elementwise. In the nonnegative matrix centering context, for any $A$ with total support, the equivalent recursions on $A^p$ (a deformation of $A$, read here as the entrywise power $(a_{ij}^p)$, with regularization parameter $p$) are:

$u^{(k+1)} = 1 / (A^p v^{(k)}), \quad v^{(k+1)} = 1 / ((A^p)^\top u^{(k+1)})$

The iterates converge to scalings defining a doubly stochastic matrix $X^* = D_r^*\,A^p\,D_c^*$ (Sharify et al., 2011, Cuturi, 2013). The same diagonal-scaling (Gibbs) structure carries over to non-square matrices, where the algorithm enforces prescribed row and column sums (Cuturi, 2013, Corless et al., 16 Feb 2024).
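
For the centering case with unit marginals, a short sketch follows. It assumes that $A^p$ denotes the entrywise power, and the function name `balance`, the tolerance, and the test matrix are illustrative choices rather than anything prescribed by the cited works.

```python
import numpy as np

def balance(A, p=1.0, tol=1e-10, max_iter=10_000):
    """Scale the entrywise power A**p to a doubly stochastic matrix (sketch)."""
    Ap = np.power(A, p)               # assumption: A^p is an entrywise power
    n = Ap.shape[0]
    r = np.ones(n)
    c = np.ones(n)
    for _ in range(max_iter):
        r = 1.0 / (Ap @ c)            # row normalization
        c = 1.0 / (Ap.T @ r)          # column normalization
        X = r[:, None] * Ap * c[None, :]
        # Column sums are exact after the c-update, so only rows are checked.
        if np.abs(X.sum(axis=1) - 1.0).max() < tol:
            break
    return r, c, X

A = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 4.0]])       # arbitrary positive example
r, c, X = balance(A, p=2.0)
print(X.sum(axis=1), X.sum(axis=0))
```

The returned vectors `r` and `c` are the diagonals of $D_r$ and $D_c$, so `np.diag(r) @ A**2 @ np.diag(c)` reproduces `X`.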

4. Geometric, Bregman, and Dynamical System Perspectives

Recent developments view Sinkhorn–Knopp centering as a sequence of alternating Bregman (relative entropy/KL divergence) projections onto affine sets defined by the marginal constraints (Modin, 2023, Corless et al., 16 Feb 2024). At each iteration, the projection updates optimize

$\min_{\pi\in \mathrm{affine~set}} \mathrm{KL}(\pi \| \text{reference}),$

where the affine set encodes one of the two marginal constraints and the reference is the current iterate; this is exactly the iterative proportional fitting procedure. Dynamically, the scheme can be derived as a time-discretization (splitting) of a continuous Fisher–Rao gradient flow, with the scaling vectors interpreted as log-potentials in an integral-equation system. Trotter–Euler splitting and stability analysis of the underlying ODE shed light on both the linear convergence and extensions such as overrelaxation and acceleration (Modin, 2023).
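
To make the projection step concrete, the row-marginal projection has a closed form. The following is the standard Lagrangian argument, written for a generic reference matrix $\rho$ and multipliers $f_i$ (symbols introduced here for illustration) and using the generalized KL divergence:

$\min_{\pi}\ \sum_{ij} \left( \pi_{ij}\log\frac{\pi_{ij}}{\rho_{ij}} - \pi_{ij} + \rho_{ij} \right) \quad \text{s.t.} \quad \pi 1 = a$

$\log\frac{\pi_{ij}}{\rho_{ij}} = f_i \ \Rightarrow\ \pi_{ij} = e^{f_i}\rho_{ij}, \qquad \pi 1 = a \ \Rightarrow\ e^{f_i} = \frac{a_i}{(\rho 1)_i}$

$\pi = \mathrm{diag}\!\left(\frac{a}{\rho 1}\right)\rho$

The column projection is symmetric, $\pi = \rho\,\mathrm{diag}\!\left(b / (\rho^\top 1)\right)$, so alternating the two projections starting from $\rho = K$ reproduces the $u$ and $v$ updates of the scaling iteration.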

5. Acceleration and Large-scale Implementation Techniques

Standard Sinkhorn–Knopp iteration converges at a geometric rate governed by the second-largest singular value $\sigma_2(X^*)$ of the limiting matrix: asymptotically, the error shrinks by a factor of about $\sigma_2^2$ per iteration (Sharify et al., 2011, Aristodemo et al., 2018). When the dominant eigenvalues are clustered or the matrix is nearly decomposable, $\sigma_2$ is close to one and convergence can degrade drastically. Because the fixed-point condition can be recast as a nonlinear eigenvalue problem involving the Jacobian of the iteration map $T$, Arnoldi-type and power-method-inspired accelerations have been developed (Aristodemo et al., 2018). In these schemes, an inner loop approximates the dominant eigenvector, which can improve convergence substantially, especially in the large sparse problems typical of graph balancing or contingency table fitting.
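
The rate statement can be probed numerically. The sketch below uses an arbitrary random positive matrix (an assumption of this example) and compares the observed ratio of successive row-sum errors with $\sigma_2(X^*)^2$. Agreement is only expected asymptotically, so the two numbers should be of similar size rather than identical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(6, 6))    # arbitrary positive test matrix

u = np.ones(6)
v = np.ones(6)
errors = []
for _ in range(200):
    u = 1.0 / (A @ v)
    v = 1.0 / (A.T @ u)
    X = u[:, None] * A * v[None, :]
    errors.append(np.abs(X.sum(axis=1) - 1.0).sum())   # row-sum violation

# After 200 sweeps X is taken as the limit X* (up to rounding).
sigma = np.linalg.svd(X, compute_uv=False)
predicted = sigma[1] ** 2                 # sigma_2(X*)^2
observed = errors[5] / errors[4]          # empirical contraction in an early sweep
print("predicted rate:", predicted, "observed rate:", observed)
```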

For large $\lambda$ (weak regularization) or large powers $p$, the entries of $K$ or $A^p$ can overflow or underflow in floating point, which stalls or breaks the scaling. Stabilization in the log-domain (log-sum-exp), prescaling, and GPU-parallelization are standard remedies (Cuturi, 2013, Sharify et al., 2011). The matrix–vector structure of row/column normalizations is well suited to parallel hardware, supporting scaling to millions of entries.
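
A log-domain variant can be sketched as follows. It iterates on the log-potentials $f = \log u$ and $g = \log v$ (names chosen here for illustration) and never forms $K$ explicitly. In the toy problem below, $\exp(-\lambda M)$ underflows to the identity matrix in double precision, so the plain iteration cannot move mass between the two points, while the log-domain version still reaches the prescribed marginals.

```python
import numpy as np

def logsumexp(Z, axis):
    """Numerically stable log(sum(exp(Z))) along one axis."""
    m = Z.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(Z - m).sum(axis=axis))

def sinkhorn_log(a, b, M, lam, n_iter=1000):
    """Sinkhorn on log-potentials f = log u, g = log v (illustrative sketch)."""
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    for _ in range(n_iter):
        # log (K v)_i = logsumexp_j(-lam * M_ij + g_j)
        f = log_a - logsumexp(-lam * M + g[None, :], axis=1)
        # log (K^T u)_j = logsumexp_i(-lam * M_ij + f_i)
        g = log_b - logsumexp(-lam * M + f[:, None], axis=0)
    return np.exp(f[:, None] - lam * M + g[None, :])

# Made-up two-point problem with a very large lambda.
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
a = np.array([0.9, 0.1])
b = np.array([0.1, 0.9])
lam = 1000.0

K_naive = np.exp(-lam * M)        # off-diagonal entries underflow to exactly 0
P = sinkhorn_log(a, b, M, lam)
print(K_naive)
print(P, P.sum(axis=1), P.sum(axis=0))
```

The number of sweeps needed grows with $\lambda$ in this example, so the iteration count of 1000 is an assumption tied to this particular problem, not a general recommendation.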

6. Overrelaxation and Parameter Optimization

Overrelaxed Sinkhorn–Knopp updates introduce a relaxation parameter $\omega > 0$:

$u_{\ell+1} = u_{\ell}^{1-\omega} \odot \left(\frac{a}{K v_\ell} \right)^{\omega}, \quad v_{\ell+1} = v_{\ell}^{1-\omega} \odot \left( \frac{b}{K^\top u_{\ell+1}} \right)^{\omega}$

where $\odot$ denotes the Hadamard product and powers are taken elementwise; $\omega = 1$ recovers the standard iteration. Global convergence is guaranteed for $0<\omega<2/(1+\Lambda(K))$, where $\Lambda(K)$ is the Birkhoff contraction ratio. Spectral analysis shows that for moderate $\omega > 1$ the local convergence rate is strictly improved. The locally optimal value is $\omega^\mathrm{opt} = 2/(1+\sqrt{1-\mu_2})$, where $\mu_2$ is the second eigenvalue of the linearized block operator (Lehmann et al., 2020).
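
A direct transcription of the overrelaxed update is sketched below. The function name `sinkhorn_sor`, the stopping rule, and the test problem are assumptions of this example, and $\omega = 1.2$ is an arbitrary mild choice rather than a tuned value.

```python
import numpy as np

def sinkhorn_sor(a, b, K, omega=1.0, tol=1e-10, max_iter=10_000):
    """Overrelaxed Sinkhorn; omega = 1 is the standard iteration (sketch)."""
    u = np.ones_like(a)
    v = np.ones_like(b)
    for it in range(1, max_iter + 1):
        u = u ** (1.0 - omega) * (a / (K @ v)) ** omega
        v = v ** (1.0 - omega) * (b / (K.T @ u)) ** omega
        P = u[:, None] * K * v[None, :]
        err = np.abs(P.sum(axis=1) - a).sum() + np.abs(P.sum(axis=0) - b).sum()
        if err < tol:
            return P, it
    return P, max_iter

# Made-up problem: five points on [0, 1], squared-distance cost, lambda = 5.
x = np.linspace(0.0, 1.0, 5)
K = np.exp(-5.0 * (x[:, None] - x[None, :]) ** 2)
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0]) / 15.0
b = a[::-1].copy()

P_std, it_std = sinkhorn_sor(a, b, K, omega=1.0)
P_sor, it_sor = sinkhorn_sor(a, b, K, omega=1.2)
print("sweeps with omega=1.0:", it_std, " with omega=1.2:", it_sor)
```

Both runs target the same plan, so comparing `it_std` and `it_sor` gives a rough sense of the effect of $\omega$ on a given problem.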

A practical, low-overhead heuristic for selecting $\omega$ involves running a few baseline Sinkhorn iterations, empirically estimating the contraction rate from the decay of the marginal error, and setting $\omega$ according to the spectral formula. This can reduce the iteration count considerably in ill-conditioned or high-dimensional scenarios.
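
One way to realize this heuristic is sketched below. It assumes that $\mu_2$ can be approximated by the observed per-sweep contraction of the row-marginal error under plain Sinkhorn; that identification, the warm-up length, and the name `estimate_omega` are assumptions of this sketch, not details taken from the cited paper.

```python
import numpy as np

def estimate_omega(a, b, K, n_warmup=20):
    """Estimate a contraction rate theta from plain Sinkhorn sweeps and return
    omega = 2 / (1 + sqrt(1 - theta)) together with theta (illustrative sketch)."""
    u = np.ones_like(a)
    v = np.ones_like(b)
    errs = []
    for _ in range(n_warmup):
        u = a / (K @ v)
        v = b / (K.T @ u)
        errs.append(np.abs(u * (K @ v) - a).sum())    # row-marginal violation
    theta = errs[-1] / errs[-2] if errs[-2] > 0 else 0.0
    theta = min(max(theta, 0.0), 1.0 - 1e-12)         # keep the square root real
    omega = 2.0 / (1.0 + np.sqrt(1.0 - theta))
    return omega, theta

# Made-up problem: six points on [0, 1], squared-distance cost, lambda = 20.
x = np.linspace(0.0, 1.0, 6)
K = np.exp(-20.0 * (x[:, None] - x[None, :]) ** 2)
a = np.full(6, 1.0 / 6.0)
b = np.array([1.0, 2.0, 3.0, 3.0, 2.0, 1.0]) / 12.0

omega, theta = estimate_omega(a, b, K)
print("estimated rate:", theta, "suggested omega:", omega)
```

By construction the returned value lies in $[1, 2)$: a rate near zero gives $\omega \approx 1$, and a rate near one pushes $\omega$ toward two.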

7. Extensions: Constraints, Assignment, and Applications

Sinkhorn–Knopp centering admits natural extensions to constrained optimal transport with prescribed zero patterns. For applications such as electric vehicle charging, where specific assignments are disallowed, the iterative scaling sets $K_{ij}=0$ for forbidden pairs $(i,j)$, so those entries stay at zero throughout. Provided a feasible plan with the prescribed zero pattern exists, the alternating Bregman projection argument still guarantees global convergence to the entropic optimum (Corless et al., 16 Feb 2024). In unbalanced OT, the marginal constraints are softened into KL penalties, and the scaling exponent is correspondingly adjusted.
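
The zero-pattern mechanism can be sketched in a few lines. The cost matrix and the forbidden pairs below are made up for illustration, and the pattern was chosen so that a feasible plan exists.

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 2.0],
              [3.0, 2.0, 1.0]])          # made-up costs
forbidden = np.zeros(M.shape, dtype=bool)
forbidden[0, 0] = True                   # hypothetical disallowed pairings
forbidden[2, 1] = True

lam = 1.0
K = np.exp(-lam * M)
K[forbidden] = 0.0                       # K_ij = 0 keeps P_ij = 0 in every iterate

a = np.full(3, 1.0 / 3.0)
b = np.full(3, 1.0 / 3.0)
u = np.ones(3)
v = np.ones(3)
for _ in range(2000):
    u = a / (K @ v)
    v = b / (K.T @ u)
P = u[:, None] * K * v[None, :]
print(np.round(P, 4))
```

Since $P_{ij} = u_i K_{ij} v_j$, a zero in $K$ is inherited by the plan regardless of the scalings, which is why no extra constraint handling is needed.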

In the limit $p \to \infty$ for assignment problems, the entropy term vanishes, and the scaled matrix approaches the optimal permutation matrix when the optimal assignment is unique. This behavior underpins effective preprocessing: aggressive thresholding of suboptimal entries yields a reduced assignment instance while preserving approximate optimality (Sharify et al., 2011). Applications include statistical contingency-table fitting, preconditioning in graph algorithms, and balancing of large sparse matrices in numerical analysis (Cuturi, 2013).
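
The large-$p$ behavior can be illustrated on a small instance. The sketch assumes that $A^p$ is the entrywise power and that the assignment objective is the product of the selected entries. The weight matrix is made up so that the greedy row maxima collide, and `balance_power` is a hypothetical helper name.

```python
import itertools
import numpy as np

def balance_power(A, p, n_iter=500):
    """Run a fixed number of Sinkhorn sweeps on the entrywise power (sketch)."""
    Ap = (A / A.max()) ** p           # rescaling by the max entry avoids overflow
    r = np.ones(A.shape[0])
    c = np.ones(A.shape[1])
    for _ in range(n_iter):
        r = 1.0 / (Ap @ c)
        c = 1.0 / (Ap.T @ r)
    return r[:, None] * Ap * c[None, :]

A = np.array([[9.0, 8.0, 1.0],
              [8.0, 1.0, 1.0],
              [1.0, 1.0, 5.0]])       # rows 0 and 1 both prefer column 0

X = balance_power(A, p=12)
sinkhorn_assignment = tuple(int(j) for j in X.argmax(axis=1))

# Brute-force reference: permutation maximizing the product of chosen entries.
best = max(itertools.permutations(range(3)),
           key=lambda s: np.prod([A[i, s[i]] for i in range(3)]))
print(sinkhorn_assignment, best)
```

With a fixed number of sweeps the result is only approximately doubly stochastic, since convergence slows down as the limit approaches a permutation matrix, but the dominant entry in each row already identifies the assignment in this example.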

Sinkhorn–Knopp centering thus defines a unifying algorithmic and geometric framework for matrix balancing, entropic regularization, scalable assignment, and constrained transport. Its foundations in convexity, alternating projections, and nonlinear dynamics provide deep theoretical guarantees and rich flexibility in practice.