Sinkhorn-Knopp Projection
- Sinkhorn-Knopp Projection is an iterative matrix scaling procedure that normalizes nonnegative matrices to achieve targeted row and column sums, widely used in optimal transport and network analysis.
- It operates by alternating normalization steps viewed as KL-divergence projections, offering provable convergence rates and allowing acceleration via overrelaxation and Krylov techniques.
- The method extends to rectangular matrices and constrained supports, making it versatile for applications in entropically-regularized transport, quantum information, and large-scale optimization.
The Sinkhorn-Knopp Projection is an iterative matrix scaling procedure that projects an arbitrary nonnegative matrix onto the set of matrices with prescribed row and column sums or, equivalently, onto the transportation polytope. It is foundational in areas such as entropically-regularized optimal transport, matrix balancing, and constrained optimal transport, where it provides a closed-form, scalable, and provably convergent method for finding doubly stochastic or targeted-marginal matrices. The projection operates via alternating normalization of rows and columns and can be understood through the lens of KL-divergence minimization, convex optimization, and duality theory. The method's convergence and complexity are well-characterized, with exact rates depending on matrix structure, and the algorithm admits generalizations including rectangular maps, overrelaxation schemes, and acceleration by Krylov subspace techniques.
1. Mathematical Foundation and Problem Formulation
The Sinkhorn-Knopp Projection seeks, for a given nonnegative matrix $K \in \mathbb{R}_{\ge 0}^{n \times m}$ and prescribed positive marginals $a \in \mathbb{R}_{>0}^{n}$, $b \in \mathbb{R}_{>0}^{m}$ with $\sum_i a_i = \sum_j b_j$, a matrix of the form $P = \operatorname{diag}(u)\, K \operatorname{diag}(v)$ such that $P \mathbf{1}_m = a$ and $P^{\top} \mathbf{1}_n = b$.
In entropically-regularized optimal transport, the associated minimization is
$$\min_{P \in U(a,b)} \;\langle C, P\rangle \;-\; \varepsilon H(P), \qquad H(P) = -\sum_{ij} P_{ij} \log P_{ij},$$
where $U(a,b)$ is the set of nonnegative matrices with marginals $(a, b)$ and $C$ is a cost matrix. The minimizer has the closed form
$$P^{\star} = \operatorname{diag}(u)\, e^{-C/\varepsilon}\, \operatorname{diag}(v),$$
where $u$ and $v$ are positive vectors enforcing the marginal constraints (Cuturi, 2013).
The general approach extends to rectangular maps in the setting of positive operators, where one seeks scaling by invertible matrices such that the rescaled map is doubly stochastic (Cariello, 2016). Support and total-support conditions on the underlying matrix (or map) are both necessary and sufficient for solvability.
2. Algorithmic Structure and Fixed-Point Updates
The core Sinkhorn-Knopp iteration alternates between row and column normalization,
$$u \leftarrow a \;./\; (K v), \qquad v \leftarrow b \;./\; (K^{\top} u),$$
where './' is componentwise division. Starting with an initial $v^{(0)} = \mathbf{1}$, these updates are repeated until convergence. The process can be expressed more generally as alternating KL-divergence projections onto the affine sets of matrices with fixed row or column sums (Chakrabarty et al., 2018, Mazzilli et al., 2022):
$$P \leftarrow \operatorname{diag}\!\big(a \;./\; P\mathbf{1}\big)\, P, \qquad P \leftarrow P\, \operatorname{diag}\!\big(b \;./\; P^{\top}\mathbf{1}\big),$$
for row and column normalization, respectively.
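The two views coincide step for step; a minimal NumPy sketch (illustrative names, not tied to any reference implementation) verifies that alternately rescaling the rows and columns of the full matrix reproduces the scaling-vector iterates:

```python
import numpy as np

def kl_row_projection(P, a):
    # KL projection onto {P : P @ 1 = a}: rescale each row to its target sum.
    return P * (a / P.sum(axis=1))[:, None]

def kl_col_projection(P, b):
    # KL projection onto {P : P.T @ 1 = b}: rescale each column to its target sum.
    return P * (b / P.sum(axis=0))[None, :]

rng = np.random.default_rng(0)
K = rng.random((4, 3)) + 0.1
a, b = np.full(4, 1 / 4), np.full(3, 1 / 3)

P = K.copy()
u, v = np.ones(4), np.ones(3)
for _ in range(50):
    P = kl_col_projection(kl_row_projection(P, a), b)   # matrix-level KL projections
    u = a / (K @ v); v = b / (K.T @ u)                  # scaling-vector updates
assert np.allclose(P, np.diag(u) @ K @ np.diag(v))      # identical iterates
```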
For matrices with prescribed zero patterns or additional constraints, the projection is carried out on the support-restricted entries, with forbidden entries masked or set to zero throughout the iteration (Corless et al., 2024).
A simplified NumPy implementation of Sinkhorn-Knopp in the setting of entropic OT:
```python
import numpy as np

def sinkhorn_knopp(C, a, b, epsilon, tol=1e-9, max_iter=10_000):
    K = np.exp(-C / epsilon)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(max_iter):
        u = a / (K @ v)                 # enforce row marginals
        v = b / (K.T @ u)               # enforce column marginals
        # stop once the row marginals match a to within tol
        if np.abs(u * (K @ v) - a).sum() < tol:
            break
    return np.diag(u) @ K @ np.diag(v)
```
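A quick usage check (illustrative values only): the returned coupling should reproduce the prescribed marginals up to the tolerance.

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.random((5, 4))                       # cost matrix
a = np.full(5, 1 / 5)                        # source marginal
b = np.full(4, 1 / 4)                        # target marginal

P = sinkhorn_knopp(C, a, b, epsilon=0.1)
print(np.abs(P.sum(axis=1) - a).max())       # row-marginal error, ~0
print(np.abs(P.sum(axis=0) - b).max())       # column-marginal error, ~0
```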
3. Convergence, Complexity, and Phase Transitions
The Sinkhorn-Knopp algorithm converges to a pair of positive vectors $(u, v)$ such that $\operatorname{diag}(u)\, K \operatorname{diag}(v)$ achieves the prescribed marginals, provided $K$ has total support; the pair is unique up to a scalar factor when $K$ is fully indecomposable (Chan et al., 2016, Adams et al., 2011).
Convergence is geometric in Hilbert's projective metric. Explicit iteration bounds are given by (Chakrabarty et al., 2018):
- To drive the KL-divergence between target and current marginals below $\delta$, $O\!\big(\ln(s/\ell)/\delta\big)$ steps suffice,
- For $\ell_1$-error $\varepsilon$, $O\!\big(\ln(s/\ell)/\varepsilon^{2}\big)$ iterations suffice (the Pinsker step linking the two bounds is sketched below), where $s$ (the total entry sum) and $\ell$ (the smallest nonzero entry) are element and sparsity parameters of $K$.
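For the row marginals $r(P) = P\mathbf{1}$ (columns are handled identically), Pinsker's inequality gives
$$\|a - r(P)\|_{1} \;\le\; \sqrt{2\,\operatorname{KL}\!\big(a \,\|\, r(P)\big)},$$
so driving the KL-divergence below $\varepsilon^{2}/2$ guarantees an $\ell_1$-marginal error of at most $\varepsilon$, which converts the first bound into the second.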
A sharp phase transition in iteration complexity occurs at a critical density threshold: above it, normalized matrices require only a handful of iterations to reach a prescribed marginal error, whereas below it the iteration count grows substantially even for strictly positive matrices (He, 13 Jul 2025).
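A small illustrative experiment (purely a sketch; the density parameterization and tolerances here are ad hoc and not taken from the cited analysis) counts iterations to a fixed marginal tolerance as the fraction of large entries varies:

```python
import numpy as np

def iterations_to_tol(K, a, b, tol=1e-6, max_iter=100_000):
    # Count Sinkhorn iterations until the l1 row-marginal error drops below tol.
    u, v = np.ones_like(a), np.ones_like(b)
    for it in range(1, max_iter + 1):
        u = a / (K @ v)
        v = b / (K.T @ u)
        if np.abs(u * (K @ v) - a).sum() < tol:
            return it
    return max_iter

rng = np.random.default_rng(2)
n = 200
a = b = np.full(n, 1 / n)
for density in (0.9, 0.5, 0.1):
    # Strictly positive matrix: a varying fraction of large entries on a small background.
    mask = rng.random((n, n)) < density
    K = np.where(mask, 1.0, 1e-3)
    print(density, iterations_to_tol(K, a, b))
```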
For overrelaxed variants, the iteration may be accelerated by incorporating an overrelaxation parameter $\omega \in (1, 2)$, achieving faster local rates. In addition, Arnoldi-type methods leveraging the nonlinear eigenvector formulation exhibit further reductions in iteration count, especially when the matrix has clustered singular values (Aristodemo et al., 2018, 1711.01851).
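A minimal sketch of the multiplicative overrelaxed update (the choice of a safe $\omega$ and the Lyapunov analysis of (1711.01851) are not reproduced here; $\omega = 1$ recovers the plain iteration):

```python
import numpy as np

def overrelaxed_sinkhorn(C, a, b, epsilon, omega=1.5, n_iter=500):
    # Overrelaxed update: u <- u**(1 - omega) * (a / (K v))**omega, and similarly for v.
    K = np.exp(-C / epsilon)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = u ** (1.0 - omega) * (a / (K @ v)) ** omega
        v = v ** (1.0 - omega) * (b / (K.T @ u)) ** omega
    return np.diag(u) @ K @ np.diag(v)
```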
4. Convex Optimization, Duality, and KL-Projections
Sinkhorn-Knopp projection arises as the solution of strictly convex separable programs involving KL-divergence. For OT, the scaled matrix is the unique minimizer of
$$\min_{P \in U(a,b)} \operatorname{KL}\!\big(P \,\|\, K\big), \qquad K = e^{-C/\varepsilon},$$
and the dual is
$$\max_{f,\,g} \;\langle f, a\rangle + \langle g, b\rangle \;-\; \varepsilon \sum_{ij} e^{(f_i + g_j - C_{ij})/\varepsilon},$$
with primal solution $P^{\star}_{ij} = e^{f_i/\varepsilon}\, e^{-C_{ij}/\varepsilon}\, e^{g_j/\varepsilon}$ (Cuturi, 2013).
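The dual potentials are just a change of variables from the scaling vectors, $f = \varepsilon \log u$ and $g = \varepsilon \log v$; a short NumPy sketch (illustrative names and values) evaluates the dual objective from a converged pair:

```python
import numpy as np

def dual_objective(f, g, C, a, b, epsilon):
    # Unconstrained entropic-OT dual: <f, a> + <g, b> - eps * sum_ij exp((f_i + g_j - C_ij) / eps).
    return f @ a + g @ b - epsilon * np.exp((f[:, None] + g[None, :] - C) / epsilon).sum()

rng = np.random.default_rng(3)
C = rng.random((6, 6))
a = b = np.full(6, 1 / 6)
epsilon = 0.2

K = np.exp(-C / epsilon)
u, v = np.ones(6), np.ones(6)
for _ in range(500):                         # plain Sinkhorn updates
    u = a / (K @ v)
    v = b / (K.T @ u)

f, g = epsilon * np.log(u), epsilon * np.log(v)          # potentials from scalings
P = np.exp((f[:, None] + g[None, :] - C) / epsilon)      # equals diag(u) @ K @ diag(v)
print(np.allclose(P, np.diag(u) @ K @ np.diag(v)))       # True
print(dual_objective(f, g, C, a, b, epsilon))            # dual value at (f, g)
```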
The scaling steps compute alternating Bregman projections (specifically, KL) onto affine constraint sets. In the infinite-dimensional setting of operator scaling, these appear as generalized alternating projections with diagonal premultiplication and postmultiplication (Cariello, 2016).
The KL-divergence serves as a natural potential function, decreasing with each projection and providing the basis for error estimates, Pinsker-type bounds, and rates of convergence (Chakrabarty et al., 2018).
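A minimal sketch of this potential-function view (variable names are illustrative): track the KL-divergence between the target and current row marginals across iterations and watch it shrink toward zero.

```python
import numpy as np

def kl(p, q):
    # KL divergence between positive vectors of equal total mass.
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(4)
K = rng.random((8, 8)) + 0.05
a = b = np.full(8, 1 / 8)

u, v = np.ones(8), np.ones(8)
for it in range(20):
    u = a / (K @ v)
    v = b / (K.T @ u)
    row = u * (K @ v)          # current row marginals (columns are exact after the v-update)
    print(it, kl(a, row))      # KL potential of the row-marginal error
```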
5. Applications in Optimal Transport, Learning, and Network Science
Sinkhorn-Knopp projection is now ubiquitous in large-scale optimal transport as an efficient way to compute entropic Wasserstein distances, which in turn underpins fast distributional data analysis and generative modeling (Cuturi, 2013, Kushinsky et al., 2017). The algorithm is also central for differentiable ranking via doubly-stochastic normalization in neural sequencing and retrieval (Adams et al., 2011), for matrix balancing with scalable computation, and for the analysis of network nestedness (Mazzilli et al., 2022).
In quantum information, the rectangular version yields necessary and sufficient conditions for normal forms in entanglement detection and filter normalization (Cariello, 2016). Extension to image denoising, smoothing filter symmetrization, and clustering is achieved via the interpretation of SK steps as alternating expectation-maximization for Gaussian mixture models (Chan et al., 2016).
In large optimization relaxations, such as the Johnson–Adams LP for the quadratic assignment problem, KL-projection-based generalizations of Sinkhorn-Knopp provide scalable, numerically stable solvers that outperform classic methods on high-dimensional polytopes (Kushinsky et al., 2017).
6. Generalizations and Extensions
The Sinkhorn-Knopp procedure is extensible to:
- Rectangular matrices: generalized to non-square and operator-valued settings, enabling the balancing of positive maps with prescribed left/right "marginals" (Cariello, 2016).
- Masked or constrained supports: zero patterns and forbidden entries are incorporated directly by zeroing the corresponding entries of the prior (kernel) matrix and maintaining those zeros throughout the iteration (see the sketch after this list) (Corless et al., 2024).
- Overrelaxation and acceleration: composing the projections with an overrelaxation step in log-space enables SOR-type speedups with guaranteed Lyapunov decay for global convergence, and optimal local rates are achieved with heuristic eigenvalue estimates (1711.01851).
- Arnoldi-type acceleration: Recasting the iteration as a nonlinear eigenproblem enables Krylov subspace methods and block Arnoldi techniques, producing significant empirical speedup, particularly in sparse or nearly decomposable matrices (Aristodemo et al., 2018).
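A minimal sketch of the masked-support variant (a straightforward adaptation for illustration, not the specific procedure of (Corless et al., 2024)): entries outside the allowed support are zeroed in the kernel and remain zero under diagonal scaling, so the plain updates apply unchanged whenever the support pattern admits a feasible scaling.

```python
import numpy as np

def masked_sinkhorn(K, support, a, b, n_iter=1000):
    # Zero out forbidden entries; diagonal scaling preserves the zero pattern.
    # Convergence additionally requires the (total-)support condition of Section 3.
    K = np.where(support, K, 0.0)
    assert K.sum(axis=1).all() and K.sum(axis=0).all(), "empty row or column in support"
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)
```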
7. Practical Considerations, Implementation, and Empirical Insights
Each iteration of the canonical method involves two matrix-vector multiplications and two componentwise divisions, at $O(nm)$ cost per step for dense $n \times m$ matrices. In practice, typical dense inputs converge in very few steps, which accounts for the method's popularity in large-scale pipelines. For structured or sparse matrices, it is advisable to assess the density in advance to anticipate potential slowdowns (He, 13 Jul 2025).
Convergence can be monitored via $\ell_1$ or $\ell_\infty$ errors in the marginals, or via the reduction in KL-divergence. For differentiable learning, forward and backward propagation through multiple iterations is routine, enabling end-to-end gradient-based updates (Adams et al., 2011, Lu et al., 11 May 2025).
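A minimal sketch of such an unrolled, differentiable layer (assuming PyTorch is available; the function name and loss are illustrative, not taken from the cited works): gradients flow through a fixed number of log-domain scaling updates by ordinary backpropagation.

```python
import torch

def sinkhorn_layer(C, a, b, epsilon=0.1, n_iter=50):
    # Unrolled log-domain Sinkhorn updates; every op is differentiable, so autograd
    # backpropagates through the loop to the cost matrix C (and to a, b if desired).
    f = torch.zeros_like(a)
    g = torch.zeros_like(b)
    for _ in range(n_iter):
        f = epsilon * torch.log(a) - epsilon * torch.logsumexp((g[None, :] - C) / epsilon, dim=1)
        g = epsilon * torch.log(b) - epsilon * torch.logsumexp((f[:, None] - C) / epsilon, dim=0)
    return torch.exp((f[:, None] + g[None, :] - C) / epsilon)

C = torch.rand(5, 4, requires_grad=True)        # e.g. a learned cost / score matrix
a, b = torch.full((5,), 0.2), torch.full((4,), 0.25)
P = sinkhorn_layer(C, a, b)
loss = (P * C).sum()                            # transport cost as an example loss
loss.backward()                                 # gradients w.r.t. C land in C.grad
```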
Representative pseudocode, algorithmic details, and practical observations on numerics and stopping criteria are detailed in (Cuturi, 2013, Chan et al., 2016, Lu et al., 11 May 2025).
Empirical studies confirm that the Sinkhorn-Knopp Projection is robust and efficient for massive-scale problems in optimal transport, assignment, economic metrics, object detection, and quantum state normalization, with provable guarantees provided the relevant support or total-support conditions are satisfied.