Sinkhorn-Knopp Centering

Updated 25 November 2025
  • Sinkhorn-Knopp centering is a matrix balancing technique that transforms a nonnegative matrix into a doubly stochastic matrix using alternating row and column normalizations.
  • The method connects to entropically regularized optimal transport, Bregman projections, and nonlinear eigenproblems; these connections underpin its computational efficiency and linear convergence guarantees.
  • Extensions like overrelaxation, log-domain stabilization, and accelerated schemes enable practical applications in large-scale and constrained settings.

The Sinkhorn–Knopp centering process is a matrix balancing procedure that transforms any nonnegative matrix, typically with total support, into a doubly stochastic matrix via alternating row and column normalizations. The method has profound connections with entropy-regularized optimal transport (OT), regularized assignment problems, Bregman projection theory, and nonlinear eigenproblems. Its computational efficiency, convergence guarantees, and extensions to constrained and accelerated formulations have made it a standard tool in modern computational mathematics.

1. Mathematical Formulation and Origins

Sinkhorn–Knopp centering addresses the following canonical problem: Given a nonnegative matrix $A \in \mathbb{R}^{n \times n}$ with total support, find diagonal scaling matrices $D_r = \mathrm{diag}(r_1,\ldots,r_n) > 0$ and $D_c = \mathrm{diag}(c_1,\ldots,c_n) > 0$ such that $X = D_r\, A\, D_c$ is doubly stochastic, i.e., all row and column sums equal one. Sinkhorn’s theorem asserts the existence and uniqueness (up to a scalar) of such scalings whenever $A$ is fully indecomposable (Sharify et al., 2011, Cuturi, 2013).
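As a small worked instance of the scaling problem, consider a $2 \times 2$ matrix chosen here purely for illustration (it does not come from the cited works). Every $2 \times 2$ doubly stochastic matrix has the form shown for $X$, and diagonal scaling leaves the cross-ratio $a_{11}a_{22}/(a_{12}a_{21})$ unchanged, which pins down $x$:

```latex
\[
A = \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix}, \qquad
X = \begin{pmatrix} x & 1-x \\ 1-x & x \end{pmatrix}, \qquad
\frac{x^2}{(1-x)^2} = \frac{a_{11}\,a_{22}}{a_{12}\,a_{21}} = 4
\;\Longrightarrow\; x = \tfrac{2}{3}.
\]
\[
D_r = \mathrm{diag}\!\left(\tfrac{1}{3}, \tfrac{1}{6}\right), \qquad
D_c = \mathrm{diag}(2, 1), \qquad
D_r\, A\, D_c = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix}.
\]
```

Multiplying $D_r$ by a constant and dividing $D_c$ by the same constant gives the same $X$, which is the "up to a scalar" freedom in the theorem.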

This centering procedure arose from foundational work of Sinkhorn & Knopp (1967), further underpinned by Birkhoff’s contraction theorem in Hilbert’s projective metric, which enables linear convergence guarantees. In modern contexts, the centering operation is tightly linked to regularized optimal transport—specifically, the solution of entropically penalized Kantorovich problems where the transport matrix assumes a Gibbs-like scaling form (Cuturi, 2013, Modin, 2023).

2. Entropically Regularized Optimal Transport Connection

The classical optimal transport problem seeks the minimal-cost plan $\min_{P \geq 0}\ \langle P, M \rangle$ for histograms $a, b \in \Sigma_d$ and cost matrix $M \in \mathbb{R}^{d \times d}_+$, subject to the marginal constraints $P\mathbf{1} = a,\ P^\top \mathbf{1} = b$. Write $U(a,b)$ for the set of nonnegative plans satisfying these constraints. The entropically regularized variant rewards plans with high entropy $h(P) = -\sum_{ij} P_{ij} \log P_{ij}$, yielding a strictly convex program:

$$\min_{P \in U(a,b)} \langle P, M \rangle - \frac{1}{\lambda} h(P)$$

The unique optimal plan has the factorized form $P^\lambda = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K_{ij} = \exp(-\lambda M_{ij})$. The scaling vectors $u, v$ enforce the desired marginals via the coupled equations:

$$u_{i} = \frac{a_{i}}{(K v)_{i}}, \quad v_{j} = \frac{b_{j}}{(K^\top u)_{j}}$$

This recovers the direct link between matrix scaling and entropy-regularized OT (Cuturi, 2013, Modin, 2023).
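The coupled equations can be iterated directly. The following is a minimal sketch, not a reference implementation: the function name `sinkhorn_plan`, the fixed iteration count, and the toy histograms and cost matrix are all assumptions made for illustration.

```python
import math

# Plain lists keep the sketch dependency-free.
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def sinkhorn_plan(a, b, M, lam, n_iter=500):
    """Entropic OT plan diag(u) K diag(v) with K = exp(-lam * M)."""
    K = [[math.exp(-lam * m) for m in row] for row in M]
    Kt = [list(col) for col in zip(*K)]  # transpose of K
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        u = [ai / s for ai, s in zip(a, matvec(K, v))]   # u = a / (K v)
        v = [bj / s for bj, s in zip(b, matvec(Kt, u))]  # v = b / (K^T u)
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))]
            for i in range(len(a))]

# Illustrative data (not from the source): two histograms and a 0/1 cost.
a = [0.5, 0.5]
b = [0.25, 0.75]
M = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn_plan(a, b, M, lam=2.0)
row_sums = [sum(row) for row in P]        # should match a
col_sums = [sum(col) for col in zip(*P)]  # should match b
```

Larger `lam` moves the plan closer to the unregularized optimum but makes `K` closer to singular, which slows the iteration.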

3. Iterative Alternating Matrix-Scaling Algorithm

The Sinkhorn–Knopp algorithm alternates row and column normalizations to enforce prescribed sums. The core recurrences for the scaling vectors are:

$$u^{(k+1)} = a / (K v^{(k)}), \quad v^{(k+1)} = b / (K^\top u^{(k+1)})$$

where all operations are performed elementwise. In the nonnegative matrix centering context, for any $A$ with total support, the equivalent recursions on $A^p$ (the entrywise $p$-th power of $A$, a deformation with regularization parameter $p$) are:

$$u^{(k+1)} = 1 / (A^p v^{(k)}), \quad v^{(k+1)} = 1 / ((A^p)^\top u^{(k+1)})$$

The iterates converge to scalings defining a doubly stochastic matrix $X^* = D_r^*\,A^p\,D_c^*$ (Sharify et al., 2011, Cuturi, 2013). The same balanced product structure (Gibbs scaling) carries over to non-square matrices, where the algorithm enforces prescribed row and column sums on a rectangular matrix (Cuturi, 2013, Corless et al., 16 Feb 2024).
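For the doubly stochastic case (all target sums equal to one, $p = 1$), the recurrences reduce to a few lines. This is a minimal sketch under stated assumptions: the name `sinkhorn_knopp`, the stopping rule on row sums, and the example matrix are illustrative choices, and the input is assumed to have no zero row or column.

```python
# Plain lists keep the sketch dependency-free.
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def sinkhorn_knopp(A, tol=1e-12, max_iter=1000):
    """Return scalings r, c and X = diag(r) A diag(c), approximately doubly stochastic."""
    n = len(A)
    At = [list(col) for col in zip(*A)]  # transpose of A
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(max_iter):
        r = [1.0 / s for s in matvec(A, c)]   # normalize rows
        c = [1.0 / s for s in matvec(At, r)]  # normalize columns
        # Columns now sum to one exactly; stop once the rows do too.
        row_sums = [ri * s for ri, s in zip(r, matvec(A, c))]
        if max(abs(s - 1.0) for s in row_sums) < tol:
            break
    X = [[r[i] * A[i][j] * c[j] for j in range(n)] for i in range(n)]
    return r, c, X

# Illustrative positive matrix (an assumption, not from the source).
A = [[1.0, 1.0], [1.0, 4.0]]
r, c, X = sinkhorn_knopp(A)
# X is approximately [[2/3, 1/3], [1/3, 2/3]]
```

The scalings `r` and `c` are determined only up to a reciprocal constant factor, so different starting vectors give different `r`, `c` but the same `X`.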

4. Geometric, Bregman, and Dynamical System Perspectives

Recent developments view Sinkhorn–Knopp centering as a sequence of alternating Bregman (relative entropy/KL divergence) projections onto affine sets defined by the marginal constraints (Modin, 2023, Corless et al., 16 Feb 2024). At each iteration, the projection updates optimize

$$\min_{\pi \in \text{affine set}} \mathrm{KL}(\pi \,\|\, \text{reference}),$$

subject to linear constraints—fundamentally, this is the iterative proportional fitting procedure. Dynamically, the scheme can be derived as a time-discretization (splitting) of a continuous Fisher–Rao gradient flow, with the scaling vectors interpreted as log-potentials in an integral-equation system. Trotter–Euler splitting and stability analysis of the underlying ODE illuminate both the rapid linear convergence and extensions such as overrelaxation and acceleration (Modin, 2023).
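The projection view can be made concrete: the KL projection of a positive matrix onto the set of matrices with prescribed row sums is a row rescaling, and likewise for columns. The sketch below alternates the two; the function names and the toy data are assumptions for illustration only.

```python
# Plain lists keep the sketch dependency-free.
def project_rows(P, a):
    """KL projection onto {P : row sums = a}: rescale each row to sum to a_i."""
    out = []
    for ai, row in zip(a, P):
        s = sum(row)
        out.append([ai * p / s for p in row])
    return out

def project_cols(P, b):
    """KL projection onto {P : column sums = b}: rescale each column to sum to b_j."""
    col_sums = [sum(col) for col in zip(*P)]
    return [[bj * p / s for p, bj, s in zip(row, b, col_sums)] for row in P]

def alternating_projections(K, a, b, n_iter=200):
    """Iterative proportional fitting, started from the reference matrix K."""
    P = [row[:] for row in K]
    for _ in range(n_iter):
        P = project_cols(project_rows(P, a), b)
    return P

# Illustrative data (not from the source).
K = [[1.0, 0.5], [0.2, 1.0]]
a = [0.5, 0.5]
b = [0.25, 0.75]
P = alternating_projections(K, a, b)

# Each projection multiplies rows or columns by positive constants, so P keeps
# the form diag(u) K diag(v); in the 2x2 case this shows up as an unchanged cross-ratio.
cross_K = K[0][0] * K[1][1] / (K[0][1] * K[1][0])
cross_P = P[0][0] * P[1][1] / (P[0][1] * P[1][0])
```

Tracking the accumulated row and column factors instead of the full matrix gives exactly the scaling-vector recurrences for $u$ and $v$.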

5. Acceleration and Large-scale Implementation Techniques

Standard Sinkhorn–Knopp iteration converges geometrically, at a rate governed by the second-largest singular value $\sigma_2(X^*)$ of the limiting matrix: asymptotically, the error contracts by a factor of about $\sigma_2^2$ per iteration (Sharify et al., 2011, Aristodemo et al., 2018). However, for matrices with clustered dominant eigenvalues or nearly decomposable structure, convergence can degrade drastically. Recognizing that the fixed-point condition is equivalent to a nonlinear eigenvalue problem for the Jacobian of the iteration map $T$, Arnoldi-type and power-method-inspired accelerations have been developed (Aristodemo et al., 2018). In these schemes, the inner loop approximates the dominant eigenvector, which can substantially improve convergence, especially in the large sparse problems typical of graph balancing or contingency table fitting.
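The $\sigma_2^2$ rate can be observed numerically on a tiny example. In the sketch below the matrix is an illustrative choice whose limit is $X^* = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix}$, with $\sigma_2(X^*) = 1/3$, so successive error ratios should approach $1/9$. The helper name `row_errors` is hypothetical.

```python
# Plain lists keep the sketch dependency-free.
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def row_errors(A, n_iter):
    """Run plain Sinkhorn-Knopp and record the max row-sum deviation per iteration."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    c = [1.0] * n
    errs = []
    for _ in range(n_iter):
        r = [1.0 / s for s in matvec(A, c)]
        c = [1.0 / s for s in matvec(At, r)]
        errs.append(max(abs(ri * s - 1.0) for ri, s in zip(r, matvec(A, c))))
    return errs

A = [[1.0, 1.0], [1.0, 4.0]]  # illustrative matrix, not from the source
errs = row_errors(A, 8)
ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]
predicted = (1.0 / 3.0) ** 2  # sigma_2(X*)^2 for this particular example
```

Only the early iterations are used here: once the error reaches machine precision, the ratios are dominated by rounding noise and no longer reflect the rate.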

For entropic regularization and high powers, numerical overflow/underflow can impede scaling. Stabilization in the log-domain (log-sum-exp), prescaling, and GPU-parallelization are standard remedies (Cuturi, 2013, Sharify et al., 2011). The matrix–vector structure of row/column normalizations is ideally suited to parallel hardware, supporting scaling to millions of entries.
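A minimal sketch of log-domain stabilization follows. It iterates on the log-potentials $f = \log u$ and $g = \log v$ with a log-sum-exp reduction, so $K$ is never formed explicitly. The function names and the data are assumptions; the cost matrix is deliberately chosen so that every entry of $\exp(-\lambda M)$ underflows to zero in double precision, where the plain recurrences would divide by zero.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_sinkhorn(a, b, M, lam, n_iter=200):
    """Sinkhorn on log-potentials f = log u, g = log v; returns the plan."""
    n, d = len(a), len(b)
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    f = [0.0] * n
    g = [0.0] * d
    for _ in range(n_iter):
        # f_i = log a_i - log sum_j exp(-lam * M_ij + g_j)
        f = [log_a[i] - logsumexp([-lam * M[i][j] + g[j] for j in range(d)])
             for i in range(n)]
        # g_j = log b_j - log sum_i exp(-lam * M_ij + f_i)
        g = [log_b[j] - logsumexp([-lam * M[i][j] + f[i] for i in range(n)])
             for j in range(d)]
    return [[math.exp(f[i] - lam * M[i][j] + g[j]) for j in range(d)]
            for i in range(n)]

# Illustrative data: a large constant offset in the cost makes exp(-M) underflow.
a = [0.5, 0.5]
b = [0.25, 0.75]
M = [[1000.0, 1001.0], [1001.0, 1000.0]]
underflows = math.exp(-1.0 * M[0][0]) == 0.0  # plain K would be all zeros
P = log_sinkhorn(a, b, M, lam=1.0)
```

The trade-off is cost: each update needs exponentials and logarithms per entry, so the log-domain form is usually slower per iteration than plain matrix-vector products.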

6. Overrelaxation and Parameter Optimization

Overrelaxed Sinkhorn–Knopp updates introduce a relaxation parameter $\omega > 0$:

$$u_{\ell+1} = u_{\ell}^{1-\omega} \odot \left(\frac{a}{K v_\ell}\right)^{\omega}, \quad v_{\ell+1} = v_{\ell}^{1-\omega} \odot \left(\frac{b}{K^\top u_{\ell+1}}\right)^{\omega}$$

where $\odot$ denotes the Hadamard product. The optimal parameter range is $0 < \omega < 2/(1+\Lambda(K))$, where $\Lambda(K)$ is the Birkhoff contraction ratio. Spectral analysis reveals that for moderate $\omega > 1$, the local convergence rate is strictly improved. The optimal value is $\omega^\mathrm{opt} = 2/(1+\sqrt{1-\mu_2})$, where $\mu_2$ is the second eigenvalue of the linearized block operator (Lehmann et al., 2020).

A practical, zero-cost heuristic for selecting $\omega$ involves running baseline Sinkhorn iterations, empirically estimating the contraction rate, and updating $\omega$ according to the spectral formula. This yields significant speedup in ill-conditioned or high-dimensional scenarios.
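The overrelaxed update and the heuristic can be sketched together. Everything named here is hypothetical (`relaxed_sinkhorn`, `estimate_omega`, the probe length, the clamp on the estimated rate, the toy data); the sketch assumes the ratio of successive marginal errors under plain iterations is a usable estimate of $\mu_2$.

```python
import math

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def marginal_error(a, b, K, Kt, u, v):
    """Largest deviation of the row and column sums of diag(u) K diag(v)."""
    rows = [ui * s for ui, s in zip(u, matvec(K, v))]
    cols = [vj * s for vj, s in zip(v, matvec(Kt, u))]
    return max(max(abs(r - ai) for r, ai in zip(rows, a)),
               max(abs(c - bj) for c, bj in zip(cols, b)))

def relaxed_sinkhorn(a, b, K, omega=1.0, tol=1e-10, max_iter=10000):
    """Overrelaxed Sinkhorn; omega = 1 recovers the plain iteration."""
    Kt = [list(col) for col in zip(*K)]
    u, v = [1.0] * len(a), [1.0] * len(b)
    errs = []
    for _ in range(max_iter):
        u = [ui ** (1 - omega) * (ai / s) ** omega
             for ui, ai, s in zip(u, a, matvec(K, v))]
        v = [vj ** (1 - omega) * (bj / s) ** omega
             for vj, bj, s in zip(v, b, matvec(Kt, u))]
        errs.append(marginal_error(a, b, K, Kt, u, v))
        if errs[-1] < tol:
            break
    return u, v, errs

def estimate_omega(a, b, K, n_probe=10):
    """Estimate mu_2 from plain iterations, then apply 2 / (1 + sqrt(1 - mu_2))."""
    _, _, errs = relaxed_sinkhorn(a, b, K, omega=1.0, tol=0.0, max_iter=n_probe)
    mu = min(errs[-1] / errs[-2], 0.999)  # clamp: an assumption for robustness
    return 2.0 / (1.0 + math.sqrt(1.0 - mu))

# Illustrative data (not from the source).
a = [0.5, 0.5]
b = [0.25, 0.75]
K = [[1.0, 0.1], [0.1, 1.0]]
omega = estimate_omega(a, b, K)
_, _, errs_plain = relaxed_sinkhorn(a, b, K, omega=1.0)
_, _, errs_over = relaxed_sinkhorn(a, b, K, omega=omega)
iters_plain, iters_over = len(errs_plain), len(errs_over)
```

On this small example the overrelaxed run should need noticeably fewer iterations than the plain one; how much is gained in general depends on how close $\mu_2$ is to one.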

7. Extensions: Constraints, Assignment, and Applications

Sinkhorn–Knopp centering admits natural extensions to constrained optimal transport with prescribed zero patterns. For applications such as electric vehicle charging, where specific assignments are disallowed, the iterative scaling simply ignores the constrained zero entries (by setting $K_{ij}=0$ for forbidden $(i,j)$), and the alternating Bregman projection argument still guarantees global convergence to the entropic optimum (Corless et al., 16 Feb 2024). In unbalanced OT, softened constraints are enforced via KL penalties, and the scaling exponent is correspondingly adjusted.
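A minimal sketch of the zero-pattern variant: forbidden pairs get a kernel entry of exactly zero, and the scaling loop is otherwise unchanged. The function name, the `forbidden` set, and the data are illustrative assumptions, and the allowed pattern is assumed to admit a feasible plan with the requested marginals.

```python
import math

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def constrained_sinkhorn(a, b, M, forbidden, lam=1.0, n_iter=500):
    """Entropic plan with K_ij = 0 on forbidden pairs (assumes feasibility)."""
    n, d = len(a), len(b)
    K = [[0.0 if (i, j) in forbidden else math.exp(-lam * M[i][j])
          for j in range(d)] for i in range(n)]
    Kt = [list(col) for col in zip(*K)]
    u, v = [1.0] * n, [1.0] * d
    for _ in range(n_iter):
        u = [ai / s for ai, s in zip(a, matvec(K, v))]
        v = [bj / s for bj, s in zip(b, matvec(Kt, u))]
    # Zero kernel entries stay zero in the plan, whatever u and v are.
    return [[u[i] * K[i][j] * v[j] for j in range(d)] for i in range(n)]

# Illustrative data: three sources, three sinks, two disallowed assignments.
a = [1 / 3, 1 / 3, 1 / 3]
b = [1 / 3, 1 / 3, 1 / 3]
M = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
forbidden = {(0, 0), (2, 2)}
P = constrained_sinkhorn(a, b, M, forbidden)
```

If the forbidden set makes the marginals infeasible (for example, by zeroing out an entire row), the iteration divides by zero or fails to converge, so feasibility should be checked beforehand.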

In the limit $p \to \infty$ for assignment problems, the entropy term vanishes, and the scaled matrix rapidly approaches the optimal permutation matrix. This behavior underpins effective preprocessing: aggressive thresholding of suboptimal entries yields a reduced assignment instance while preserving approximate optimality (Sharify et al., 2011). Applications include statistical contingency-table fitting, preconditioning in graph algorithms, and balancing of large sparse matrices in numerical analysis (Cuturi, 2013).
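The concentration on a permutation can be seen on a toy instance. The sketch below assumes the deformation is the entrywise power $a_{ij}^p$ and uses an illustrative $2 \times 2$ matrix whose best assignment (largest product of selected entries) is the anti-diagonal. The function name and iteration budget are assumptions.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def balance_power(A, p, n_iter=20000):
    """Sinkhorn-Knopp applied to the entrywise power A**p (assumed deformation)."""
    n = len(A)
    Ap = [[x ** p for x in row] for row in A]
    Apt = [list(col) for col in zip(*Ap)]
    r, c = [1.0] * n, [1.0] * n
    for _ in range(n_iter):
        r = [1.0 / s for s in matvec(Ap, c)]
        c = [1.0 / s for s in matvec(Apt, r)]
    return [[r[i] * Ap[i][j] * c[j] for j in range(n)] for i in range(n)]

# Illustrative matrix: anti-diagonal product 3 * 2 = 6 beats diagonal product 1 * 1 = 1.
A = [[1.0, 3.0], [2.0, 1.0]]
# Mass on the optimal (anti-diagonal) entry X[0][1] for increasing p.
mass = {p: balance_power(A, p)[0][1] for p in (1, 2, 4, 8)}
# For this matrix the cross-ratio argument gives mass[2] = 6/7 exactly.
```

The iteration budget is deliberately generous: as $p$ grows the powered matrix becomes nearly decomposable, and plain Sinkhorn–Knopp slows down accordingly.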


Sinkhorn–Knopp centering thus defines a unifying algorithmic and geometric framework for matrix balancing, entropic regularization, scalable assignment, and constrained transport. Its foundations in convexity, alternated projections, and nonlinear dynamics provide deep theoretical guarantees and rich flexibility in practice.
