Papers
Topics
Authors
Recent
Search
2000 character limit reached

Sinkhorn–Knopp Centering Process

Updated 6 May 2026
  • The Sinkhorn–Knopp centering process is an iterative algorithm that scales nonnegative matrices to prescribed row and column sums using diagonal normalization.
  • It leverages convex optimization and Bregman projections to achieve rapid convergence and ensure unique scaling under broad regularity conditions.
  • The process underpins applications in entropic optimal transport, matrix balancing, and machine learning, and can be adapted for sparse or high-dimensional data.

The Sinkhorn–Knopp centering process—also known as matrix balancing, matrix scaling, or row/column normalization—is an iterative algorithm to compute positive diagonal scalings of a nonnegative matrix so that the resulting matrix achieves prescribed row and column marginals. It is fundamental to entropic regularization for discrete optimal transport, matrix balancing, and doubly stochasticization problems. The process combines principles from convex optimization, geometric analysis, and dynamical systems, and exhibits rapid convergence under broad regularity conditions (Modin, 2023).

1. Problem Statement and Matrix-Scaling Formulation

Given a strictly positive matrix KR+n×nK \in \mathbb{R}_+^{n \times n} and two strictly positive vectors a,bR++na, b \in \mathbb{R}^n_{++}, the Sinkhorn–Knopp problem is to find diagonal scaling matrices diag(u)\operatorname{diag}(u) and diag(v)\operatorname{diag}(v) such that

P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)

satisfies

P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b

that is, the row sums of PP equal aa and the column sums equal bb (Modin, 2023, Brauer et al., 2017, Cuturi, 2013). In component form,

uiKijvju_i K_{ij} v_j

must sum to a,bR++na, b \in \mathbb{R}^n_{++}0 over a,bR++na, b \in \mathbb{R}^n_{++}1 and to a,bR++na, b \in \mathbb{R}^n_{++}2 over a,bR++na, b \in \mathbb{R}^n_{++}3. This formulation includes the classic doubly stochastic scaling as the special case a,bR++na, b \in \mathbb{R}^n_{++}4.

The process also arises in entropic optimal transport, where the problem is to minimize the cost

a,bR++na, b \in \mathbb{R}^n_{++}5

subject to prescribed marginals, with a,bR++na, b \in \mathbb{R}^n_{++}6 as the Gibbs kernel (Brauer et al., 2017, Cuturi, 2013).

2. Iterative Centering Algorithm

The Sinkhorn–Knopp iteration alternates between normalizing to enforce the row and column marginal constraints. In vector form, the updates are

a,bR++na, b \in \mathbb{R}^n_{++}7

where division is entrywise (Modin, 2023, Brauer et al., 2017, Chakrabarty et al., 2018). The initial vectors a,bR++na, b \in \mathbb{R}^n_{++}8 and a,bR++na, b \in \mathbb{R}^n_{++}9 can be set to all-ones or any strictly positive values, which ensures positivity is preserved throughout the iterations.

Each iterate diag(u)\operatorname{diag}(u)0 alternately matches the row or column marginals, and the iteration converges to a unique scaling (up to rescaling) when diag(u)\operatorname{diag}(u)1 and diag(u)\operatorname{diag}(u)2.

Pseudocode of the classical Sinkhorn–Knopp process is as follows:

P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b6 (Modin, 2023, Brauer et al., 2017)

3. Geometric and Dynamical Interpretation

The Sinkhorn–Knopp algorithm can be viewed geometrically as alternating Bregman projections (specifically, Kullback–Leibler projections) onto the affine sets of matrices with prescribed row and then column sums within the transport polytope diag(u)\operatorname{diag}(u)3 (Modin, 2023, 1711.01851).

From a continuous perspective, the Sinkhorn iteration discretizes a coupled nonlinear gradient flow on densities—specifically, a time-splitting of the integral equations

diag(u)\operatorname{diag}(u)4

with diag(u)\operatorname{diag}(u)5 representing heat kernel convolution and diag(u)\operatorname{diag}(u)6 the prescribed marginals in continuous space (Modin, 2023).

Under the Benamou–Brenier entropic regularization, a dynamical (continuous-time ODE) formulation yields

diag(u)\operatorname{diag}(u)7

and Lie–Trotter splitting (with explicit Euler discretization) matches the discrete SK updates (Modin, 2023).

The projection viewpoint connects the SK process to alternating minimization of KL divergence and underpins interpretability in terms of metric contraction and convex geometry (Chakrabarty et al., 2018, 1711.01851).

4. Convergence Analysis and Rate

Sinkhorn and Knopp originally proved that, for any positive diag(u)\operatorname{diag}(u)8 and positive marginals, the alternated diagonal scaling converges to a unique solution (Modin, 2023, Nathanson, 2019). The convergence is geometric: the error in the marginals (as measured by, e.g., diag(u)\operatorname{diag}(u)9, diag(v)\operatorname{diag}(v)0, or Kullback–Leibler divergence) contracts at a rate determined by the Hilbert projective metric contraction ratio for the scaling map (Modin, 2023, Chakrabarty et al., 2018).

The precise iteration complexity depends on structure:

  • For dense, well-bounded, or high-density matrices (diag(v)\operatorname{diag}(v)1), the iteration count is diag(v)\operatorname{diag}(v)2 to reach error diag(v)\operatorname{diag}(v)3 (He, 13 Jul 2025, He, 4 Apr 2026).
  • For positive matrices with lower density or poorly bounded entries, lower bounds of diag(v)\operatorname{diag}(v)4 or even diag(v)\operatorname{diag}(v)5 can be attained (He, 13 Jul 2025).
  • The contraction is monotonic for a suitable potential function—either a Lyapunov energy or the KL divergence of the current marginal (Modin, 2023, Chakrabarty et al., 2018).

A summary of bounds appears below:

Error Norm Iteration Bound (dense) Reference
diag(v)\operatorname{diag}(v)6 diag(v)\operatorname{diag}(v)7 (Chakrabarty et al., 2018)
diag(v)\operatorname{diag}(v)8 diag(v)\operatorname{diag}(v)9 (Chakrabarty et al., 2018)
General P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)0 P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)1 (He, 13 Jul 2025)
Sparse/low-density P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)2 (He, 13 Jul 2025)

Over-relaxation techniques, including adaptive selection of the relaxation parameter, yield further iteration count reductions, particularly in the regime where the spectral gap of the associated operator is small (1711.01851).

5. Applications and Extensions

The Sinkhorn–Knopp centering process has broad applications:

  • Entropic Optimal Transport: Directly computes regularized transport plans with prescribed marginals, minimizing a combined cost and entropy objective (Cuturi, 2013, Brauer et al., 2017).
  • Preconditioning and Matrix Balancing: Used to transform matrices into forms suitable for fast solvers or numerical stability, e.g., producing a doubly stochastic matrix from a positive input (Chakrabarty et al., 2018, Nathanson, 2019).
  • Machine Learning and Data Analysis: Fundamental for efficient computation of Sinkhorn distances between histograms or feature vectors (Cuturi, 2013).
  • Image Processing and Denoising: Underpins normalization in Gaussian mixture model-based smoothing filters; alternates EM (E-step = column normalization, M-step = row normalization) for statistical interpretation (Chan et al., 2016).
  • Quantum Computing: The conjectured analog involves scaling unitary matrices via phase correction to achieve prescribed line sums; with implications for quantum gate decompositions (Vos et al., 2014).
  • Constrained Transport and Prior Zeros: Extensions handle sparse plans with enforced zeros by setting forbidden entries in P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)3 to zero, preserving Bregman projection structure and convergence (Corless et al., 2024).
  • Acceleration and High-Performance Computing: Arnoldi-type and Newton accelerations exploit the underlying nonlinear eigenproblems for further speedup in challenging regimes (Aristodemo et al., 2018, Brauer et al., 2017).

6. Measure-Theoretic and Infinite-Dimensional Generalizations

Beurling’s theorem gives a measure-theoretic generalization of the Sinkhorn–Knopp process, establishing uniqueness of scaling solutions for Radon measures on continuous spaces P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)4 via positive integral kernels (Modin, 2023). The scaling of measures (as opposed to finitely supported vectors) generalizes the algorithm to infinite dimensions with well-posedness and stability results.

Under appropriate kernel positivity, the map from arbitrary nonnegative measures to product measures is a homeomorphism, and the scaling process inherits uniqueness and continuity properties with respect to data perturbations.

7. Practical Considerations

Each Sinkhorn–Knopp iteration has computational cost P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)5 for dense P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)6 matrices, dominated by matrix-vector multiplication P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)7 and P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)8. For kernel matrices arising from structured cost (e.g., Gaussian or Toeplitz), FFT-based or fast transform techniques can reduce per-iteration complexity (Modin, 2023).

Stability is a concern when the regularization parameter P=diag(u)Kdiag(v)P = \operatorname{diag}(u) K \operatorname{diag}(v)9 is small, as the kernel P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b0 can underflow to numerically zero. Log-domain stabilizations (operating on P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b1, P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b2) and log-sum-exp formulas are standard practice (Brauer et al., 2017).

Stopping criteria are typically based on violation of the marginal constraints in P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b3 or relative terms, with tolerances of P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b4–P1=a,P1=bP \mathbb{1} = a, \qquad P^\top \mathbb{1} = b5 common in applications (Modin, 2023, Cuturi, 2013). When mass cannot be matched (e.g., in the presence of prior zeros), unbalanced variants add penalty terms with competitive convergence properties (Corless et al., 2024).

Overrelaxation and acceleration are crucial for ill-conditioned or nearly decomposable matrices, and recent advances establish adaptive step selection schemes with global Lyapunov guarantees (1711.01851). Arnoldi and Newton-type eigen-acceleration realizes dramatic gains for hard instances with clustered spectrum (Aristodemo et al., 2018, Brauer et al., 2017).

References

  • "On the geometry and dynamical formulation of the Sinkhorn algorithm for optimal transport" (Modin, 2023)
  • "A Sinkhorn-Newton method for entropic optimal transport" (Brauer et al., 2017)
  • "Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling" (Chakrabarty et al., 2018)
  • "Overrelaxed Sinkhorn-Knopp Algorithm for Regularized Optimal Transport" (1711.01851)
  • "Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances" (Cuturi, 2013)
  • "Understanding Symmetric Smoothing Filters: A Gaussian Mixture Model Perspective" (Chan et al., 2016)
  • "On the Efficiency of Sinkhorn-Knopp for Entropically Regularized Optimal Transport" (He, 4 Apr 2026)
  • "Accelerating the Sinkhorn-Knopp iteration by Arnoldi-type methods" (Aristodemo et al., 2018)
  • "Scaling a unitary matrix" (Vos et al., 2014)
  • "Phase transition of the Sinkhorn-Knopp algorithm" (He, 13 Jul 2025)
  • "Matrix scaling, explicit Sinkhorn limits, and arithmetic" (Nathanson, 2019)
  • "Algorithms for constrained optimal transport" (Corless et al., 2024)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Sinkhorn–Knopp Centering Process.