
Sinkhorn-Knopp Projection

Updated 1 January 2026
  • Sinkhorn-Knopp Projection is an iterative matrix scaling procedure that normalizes nonnegative matrices to achieve targeted row and column sums, widely used in optimal transport and network analysis.
  • It operates by alternating normalization steps viewed as KL-divergence projections, offering provable convergence rates and allowing acceleration via overrelaxation and Krylov techniques.
  • The method extends to rectangular matrices and constrained supports, making it versatile for applications in entropically-regularized transport, quantum information, and large-scale optimization.

The Sinkhorn-Knopp Projection is an iterative matrix scaling procedure that projects an arbitrary nonnegative matrix onto the set of matrices with prescribed row and column sums or, equivalently, onto the transportation polytope. It is foundational in areas such as entropically-regularized optimal transport, matrix balancing, and constrained optimal transport, where it provides a closed-form, scalable, and provably convergent method for finding doubly stochastic or targeted-marginal matrices. The projection operates via alternating normalization of rows and columns and can be understood through the lens of KL-divergence minimization, convex optimization, and duality theory. The method's convergence and complexity are well-characterized, with exact rates depending on matrix structure, and the algorithm admits generalizations including rectangular maps, overrelaxation schemes, and acceleration by Krylov subspace techniques.

1. Mathematical Foundation and Problem Formulation

The Sinkhorn-Knopp Projection seeks, for a given nonnegative matrix $A \in \mathbb{R}_{\geq 0}^{n \times m}$ and prescribed positive marginals $r \in \mathbb{R}^n_{>0}$, $c \in \mathbb{R}^m_{>0}$ with $\sum_i r_i = \sum_j c_j$, a matrix $X$ of the form $X = \operatorname{diag}(u)\, A \operatorname{diag}(v)$ such that $X \mathbf{1}_m = r$ and $X^\top \mathbf{1}_n = c$.

In entropically-regularized optimal transport, the associated minimization is

\min_{P \in U(a,b)} \langle P, C \rangle - \varepsilon H(P), \qquad H(P) = -\sum_{i,j} P_{ij} \left( \log P_{ij} - 1 \right)

where $U(a,b)$ is the set of nonnegative matrices with marginals $a, b$ and $C$ is a cost matrix. The minimizer $P^*$ has the closed form

P^* = \operatorname{diag}(u) K \operatorname{diag}(v), \qquad K_{ij} = \exp(-C_{ij} / \varepsilon)

where $u, v$ are positive vectors enforcing the marginal constraints (Cuturi, 2013).
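
A brief sketch of the standard Lagrangian argument shows where this closed form comes from: introducing multipliers $\alpha, \beta$ for the row and column constraints and setting the gradient of the Lagrangian with respect to $P_{ij}$ to zero gives

C_{ij} + \varepsilon \log P_{ij} - \alpha_i - \beta_j = 0 \quad\Longrightarrow\quad P_{ij} = e^{\alpha_i/\varepsilon}\, e^{-C_{ij}/\varepsilon}\, e^{\beta_j/\varepsilon} = u_i\, K_{ij}\, v_j,

so the minimizer is necessarily a diagonal rescaling of the Gibbs kernel $K$, and only the scaling vectors $u, v$ remain to be determined from the marginal constraints.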

The general approach extends to rectangular maps $T: M_k \to M_m$ in the setting of positive operators, where one seeks scaling by invertible $D_1, D_2$ such that $D_2 T(D_1 X D_1^*) D_2^*$ is doubly stochastic (Cariello, 2016). Support and total-support conditions on $A$ or $T$ are both necessary and sufficient for solvability.

2. Algorithmic Structure and Fixed-Point Updates

The core Sinkhorn-Knopp iteration alternates between row and column normalization:

u^{(t+1)} = r \,./\, (A v^{(t)}), \qquad v^{(t+1)} = c \,./\, (A^\top u^{(t+1)})

where './' denotes componentwise division. Starting from an initial $v^{(0)} > 0$, these updates are repeated until convergence. The process can be expressed more generally as alternating KL-divergence projections onto affine sets of matrices with fixed row or column sums (Chakrabarty et al., 2018, Mazzilli et al., 2022):

X_{ij}^{(\ell+1)} = X_{ij}^{(\ell)} \cdot \frac{r_i}{\sum_{k} X_{ik}^{(\ell)}}, \qquad X_{ij}^{(\ell+1)} = X_{ij}^{(\ell)} \cdot \frac{c_j}{\sum_{i} X_{ij}^{(\ell)}}

for row and column normalization, respectively.
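
The matrix-level projection view can be written out directly; the following minimal NumPy sketch (function and variable names are illustrative, not taken from the cited papers) alternates exact row and column rescalings of a full matrix:

import numpy as np

def alternating_kl_projections(A, r, c, n_iter=500):
    """Alternate KL-projections onto the row-sum and column-sum affine sets.
    Assumes A has total support, so no row or column sum vanishes."""
    X = A.astype(float)
    for _ in range(n_iter):
        X *= (r / X.sum(axis=1))[:, None]   # row projection: rows now sum to r
        X *= (c / X.sum(axis=0))[None, :]   # column projection: columns now sum to c
    return X

In practice one stores only the scaling vectors $u, v$ rather than the full matrix, which is exactly what the vector-form pseudocode below does.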

For matrices $A$ with prescribed zero patterns or additional constraints, the projection iterates over the support-restricted submatrix using masked or zeroed entries (Corless et al., 2024).
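
Because the updates are multiplicative, any entry that starts at zero stays zero, so a prescribed support can be enforced simply by zeroing the forbidden entries of the prior before iterating. A small sketch of this idea (the mask and cost matrix below are illustrative only):

import numpy as np

C = np.random.rand(3, 3)                  # hypothetical 3x3 cost matrix
support_mask = ~np.eye(3, dtype=bool)     # forbid the diagonal entries
epsilon = 0.1

# Zeros placed in the prior are preserved by every multiplicative update,
# so all iterates remain inside the prescribed support.
K_masked = np.where(support_mask, np.exp(-C / epsilon), 0.0)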

A simplified pseudocode for Sinkhorn-Knopp in the setting of entropic OT:

import numpy as np

def sinkhorn_knopp(C, a, b, epsilon, tol=1e-9, max_iter=10_000):
    """Entropic OT plan: returns diag(u) @ K @ diag(v) with marginals (a, b)."""
    K = np.exp(-C / epsilon)                       # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(max_iter):
        u = a / (K @ v)                            # enforce row marginals
        v = b / (K.T @ u)                          # enforce column marginals
        if np.abs(u * (K @ v) - a).sum() < tol:    # l1 error in row marginals
            break
    return np.diag(u) @ K @ np.diag(v)
(Cuturi, 2013, Kushinsky et al., 2017)
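
As a quick usage sketch (the toy cost matrix and uniform marginals below are illustrative only), the plan returned by the function above reproduces the requested marginals up to the tolerance:

import numpy as np

n = 4
C = np.random.rand(n, n)                  # toy cost matrix
a = np.full(n, 1.0 / n)                   # uniform source marginal
b = np.full(n, 1.0 / n)                   # uniform target marginal

P = sinkhorn_knopp(C, a, b, epsilon=0.1, tol=1e-9)
print(np.abs(P.sum(axis=1) - a).sum())    # row-marginal error, below tol
print(np.abs(P.sum(axis=0) - b).sum())    # column-marginal error, ~machine precision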

3. Convergence, Complexity, and Phase Transitions

The Sinkhorn-Knopp algorithm converges to a pair $(u^*, v^*)$ such that $\operatorname{diag}(u^*) A \operatorname{diag}(v^*)$ achieves the prescribed marginals, provided $A$ has total support; the scaling is unique up to a common factor when $A$ is fully indecomposable (Chan et al., 2016, Adams et al., 2011).

Convergence is geometric in Hilbert's projective metric. Explicit iteration bounds are given by (Chakrabarty et al., 2018):

  • To reach KL-divergence below $\delta$, $T = \lceil (\ln(1+2\Delta\rho/\nu))/\delta \rceil$ steps suffice;
  • To reach $\ell_2$-error at most $\varepsilon$, $O(\rho h \ln(\Delta\rho/\nu)(1/\varepsilon + 1/\varepsilon^2))$ iterations suffice, where $\rho$, $h$, $\nu$, $\Delta$ are element-magnitude and sparsity parameters of $A$.

A sharp phase transition in iteration complexity occurs at the density threshold $\gamma = 1/2$: for normalized matrices with $\gamma > 1/2$, only $O(\log n - \log\varepsilon)$ iterations are required to reach $\ell_1$-error at most $\varepsilon$, whereas for $\gamma < 1/2$ scaling needs $\Omega(\sqrt{n}/\varepsilon)$ iterations even for strictly positive matrices (He, 13 Jul 2025).
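
The fast regime is easy to observe empirically. The sketch below (ours, not from the cited paper) tracks the $\ell_1$ row-marginal error of the vector-form iteration on a strictly positive random matrix, which typically drops to high accuracy within a handful of sweeps:

import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.random((n, n)) + 0.1              # strictly positive (dense) matrix
r = np.full(n, 1.0)                       # target row sums
c = np.full(n, 1.0)                       # target column sums

u, v = np.ones(n), np.ones(n)
for t in range(50):
    u = r / (A @ v)
    v = c / (A.T @ u)
    err = np.abs(u * (A @ v) - r).sum()   # l1 error in the row marginals
    if t % 10 == 0:
        print(t, err)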

For overrelaxed variants, the iteration may be accelerated by incorporating an overrelaxation parameter $\omega$, achieving faster local rates. Beyond overrelaxation, Arnoldi-type methods leveraging the nonlinear eigenvector formulation exhibit further reductions in iteration count, especially when the matrix $A$ has clustered singular values (Aristodemo et al., 2018, 1711.01851).
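
One common form of the overrelaxed update (a sketch of one variant; the cited works analyze several) replaces each scaling step by a geometric interpolation between the old scaling and the newly computed one, i.e. overrelaxation in the log domain:

import numpy as np

def overrelaxed_sweep(u, v, K, a, b, omega=1.5):
    """One overrelaxed Sinkhorn sweep with mixing parameter omega.
    omega = 1 recovers the standard update; 1 < omega < 2 can accelerate locally."""
    u = u**(1.0 - omega) * (a / (K @ v))**omega
    v = v**(1.0 - omega) * (b / (K.T @ u))**omega
    return u, v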

4. Convex Optimization, Duality, and KL-Projections

Sinkhorn-Knopp projection arises as the solution of strictly convex separable programs involving KL-divergence. For OT, this is the unique minimizer of

\min_{P \in U(a, b)} \langle P, C \rangle - \varepsilon H(P)

and the dual is

\max_{\alpha, \beta}\ -\sum_{i,j} K_{ij} \exp(\alpha_i + \beta_j) + \langle \alpha, a \rangle + \langle \beta, b \rangle

with primal solution $P^*_{ij} = u_i\, K_{ij}\, v_j$, where $u_i = e^{\alpha_i}$ and $v_j = e^{\beta_j}$ (Cuturi, 2013).
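
In this dual picture (written with the $\varepsilon$-scaling absorbed into $\alpha, \beta$), each Sinkhorn sweep is an exact block-coordinate ascent step: holding $\beta$, and hence $v$, fixed and maximizing over $\alpha_i$ gives

\frac{\partial}{\partial \alpha_i}\Big( -\sum_{j} K_{ij}\, e^{\alpha_i + \beta_j} + \alpha_i a_i \Big) = 0 \quad\Longrightarrow\quad u_i = e^{\alpha_i} = \frac{a_i}{(K v)_i},

which is exactly the row update $u = a \,./\, (Kv)$; the column update follows symmetrically by maximizing over $\beta$.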

The scaling steps compute alternating Bregman projections (specifically, KL) onto affine constraint sets. In the operator setting of positive-map scaling, these appear as generalized alternating projections with diagonal pre- and postmultiplication (Cariello, 2016).

The KL-divergence acts as a natural potential function: it decreases at each projection and underpins the error estimates, Pinsker-inequality lower bounds, and convergence rates (Chakrabarty et al., 2018).

5. Applications in Optimal Transport, Learning, and Network Science

Sinkhorn-Knopp projection is now ubiquitous in large-scale optimal transport as an efficient way to compute entropic Wasserstein distances, which in turn underpins fast distributional data analysis and generative modeling (Cuturi, 2013, Kushinsky et al., 2017). The algorithm is also central for differentiable ranking via doubly-stochastic normalization in neural sequencing and retrieval (Adams et al., 2011), for matrix balancing with scalable computation, and for the analysis of network nestedness (Mazzilli et al., 2022).

In quantum information, the rectangular version yields necessary and sufficient conditions for normal forms in entanglement detection and filter normalization (Cariello, 2016). Extension to image denoising, smoothing filter symmetrization, and clustering is achieved via the interpretation of SK steps as alternating expectation-maximization for Gaussian mixture models (Chan et al., 2016).

In large optimization relaxations, such as the Johnson–Adams LP for the quadratic assignment problem, KL-projection-based generalizations of Sinkhorn-Knopp provide scalable, numerically stable solvers that outperform classic methods on high-dimensional polytopes (Kushinsky et al., 2017).

6. Generalizations and Extensions

The Sinkhorn-Knopp procedure is extensible to:

  • Rectangular matrices: generalized to non-square and operator-valued settings, enabling the balancing of positive maps with prescribed left/right "marginals" (Cariello, 2016).
  • Masked or constrained supports: zero patterns and forbidden entries are incorporated directly by setting the prior KK and maintaining zeros throughout iteration (Corless et al., 2024).
  • Overrelaxation and acceleration: Post-processing projections with an overrelaxation parameter in log-space enables SOR-type speedups with guaranteed Lyapunov decay for global convergence, and optimal local rates are achieved with heuristic eigenvalue estimates (1711.01851).
  • Arnoldi-type acceleration: Recasting the iteration as a nonlinear eigenproblem enables Krylov subspace methods and block Arnoldi techniques, producing significant empirical speedup, particularly in sparse or nearly decomposable matrices (Aristodemo et al., 2018).

7. Practical Considerations, Implementation, and Empirical Insights

Each iteration of the canonical method involves two matrix-vector multiplications and two componentwise divisions, with $O(n^2)$ computational complexity per step for dense $n \times n$ matrices. In practice, typical "dense" inputs converge in very few steps, justifying the method's popularity in large-scale pipelines. For structured or sparse matrices, the recommended approach is to assess density in order to anticipate potential slowdowns (He, 13 Jul 2025).

Convergence can be monitored via $\ell_1$ or $\ell_2$ errors in the marginals, or via the reduction in KL-divergence. For differentiable learning, forward and backward propagation through multiple iterations is routine, enabling end-to-end gradient-based updates (Adams et al., 2011, Lu et al., 11 May 2025).
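
A minimal pattern for such differentiable use (a sketch under the assumption that the surrounding model runs in an autodiff framework such as PyTorch or JAX; the NumPy operations below translate directly) unrolls a fixed number of iterations rather than iterating to a tolerance, here for the uniform-marginal, doubly stochastic case used in differentiable ranking:

import numpy as np
from scipy.special import logsumexp

def sinkhorn_normalize(scores, n_iter=20, epsilon=0.1):
    """Unroll a fixed number of Sinkhorn iterations in the log domain to map a
    score matrix to an (approximately) doubly stochastic matrix; a fixed
    iteration count keeps the whole computation differentiable when expressed
    in an autodiff framework."""
    log_P = scores / epsilon
    for _ in range(n_iter):
        log_P = log_P - logsumexp(log_P, axis=1, keepdims=True)   # rows sum to 1
        log_P = log_P - logsumexp(log_P, axis=0, keepdims=True)   # columns sum to 1
    return np.exp(log_P)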

Representative pseudocode, algorithmic details, and practical observations on numerics and stopping criteria are detailed in (Cuturi, 2013, Chan et al., 2016, Lu et al., 11 May 2025).

Empirical studies confirm that the Sinkhorn-Knopp Projection is robust and efficient for massive-scale problems in optimal transport, assignment, economic metrics, object detection, and quantum state normalization, with provable guarantees provided the relevant support or total-support conditions are satisfied.
