
Matrix Scaling Problem

Updated 9 April 2026
  • Matrix scaling is the problem of adjusting a nonnegative matrix using diagonal matrices so that it achieves prescribed row and column sums, often forming a doubly stochastic matrix.
  • The Sinkhorn-Knopp algorithm, through alternating row and column normalization, offers a simple iterative method with geometric convergence under appropriate support and spectral conditions.
  • Advanced methods, including convex optimization and quantum algorithms, provide significant speedups and rigorous complexity bounds, enhancing applications in combinatorics, network flows, and preconditioning.

The matrix scaling problem concerns determining whether a given nonnegative matrix can be transformed, via multiplication by diagonal scaling matrices, into a matrix with prescribed row and column sums—most classically into a doubly stochastic matrix. This problem is foundational in linear algebra, mathematical optimization, theoretical computer science, quantum information, combinatorics, and numerical analysis. It has catalyzed a range of algorithmic and theoretical developments involving fixed-point iterations, convex optimization, spectral methods, second-order and quantum algorithms, as well as connections to network flows, permanent estimation, and operator theory.

1. Mathematical Formulation and Feasibility Conditions

Given a matrix $A \in \mathbb{R}_{\ge 0}^{n \times n}$ and nonnegative target marginals $r, c \in \mathbb{R}^n_{\ge 0}$ with $\|r\|_1 = \|c\|_1$, the matrix scaling problem seeks positive diagonal matrices $X = \operatorname{diag}(e^{x})$, $Y = \operatorname{diag}(e^{y})$ so that the scaled matrix

$$B = X A Y, \quad B_{ij} = A_{ij} e^{x_i + y_j}$$

satisfies

$$B \mathbf{1} = r, \qquad B^\top \mathbf{1} = c.$$

The classical doubly stochastic case corresponds to $r_i = c_j = 1/n$ for all $i, j$.

A fundamental characterization (the Sinkhorn-Knopp theorem and its generalizations) asserts that $A$ is scalable to the prescribed marginals if and only if every subset $I$ of rows and subset $J$ of columns with $A_{ij} = 0$ for all $i \in I,\ j \notin J$ satisfies the pattern-inequality condition

$$\sum_{i \in I} r_i \le \sum_{j \in J} c_j,$$

with equality only if $A_{ij} = 0$ for all $i \notin I,\ j \in J$ (Idel, 2016). In the doubly stochastic setting, a necessary and sufficient condition is that $A$ have total support.

2. Classical Algorithms: Sinkhorn's Method and Optimization Approaches

The canonical algorithm for matrix scaling is the Sinkhorn-Knopp procedure (also known as alternating projections, the RAS method, or iterative proportional fitting, IPFP):

  • Alternate between normalizing rows and columns, scaling each so that its sums match the target marginals;
  • Each iteration, in the doubly stochastic case, performs:

    1. $A \leftarrow \operatorname{diag}(A\mathbf{1})^{-1} A$ (divide each row by its row sum),
    2. $A \leftarrow A\,\operatorname{diag}(A^\top\mathbf{1})^{-1}$ (divide each column by its column sum).
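
A minimal dense NumPy sketch of this procedure for general marginals $r, c$ follows; the function name, tolerance, and iteration cap are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def sinkhorn_knopp(A, r, c, tol=1e-9, max_iter=10_000):
    """Seek positive x, y such that diag(x) @ A @ diag(y) has row sums r
    and column sums c. Assumes A is nonnegative, has no zero row or
    column, and satisfies the support conditions for scalability."""
    x = np.ones(A.shape[0])
    y = np.ones(A.shape[1])
    for _ in range(max_iter):
        x = r / (A @ y)                      # make row sums equal r
        y = c / (A.T @ x)                    # make column sums equal c
        B = x[:, None] * A * y[None, :]      # currently scaled matrix
        if np.abs(B.sum(axis=1) - r).sum() < tol:   # l1 row-sum error
            break
    return x, y
```

For the doubly stochastic case, call `sinkhorn_knopp(A, np.ones(n) / n, np.ones(n) / n)`. After the column update the column sums are exact, so only the row-sum error needs monitoring.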

Convergence is guaranteed and geometric (i.e., linear convergence of the log-error) when the support conditions are met, yielding a unique scaling up to an overall scalar (Idel, 2016, Nathanson, 2019, Apeldoorn et al., 2020, Chakrabarty et al., 2018). Explicit iteration bounds for achieving an $\ell_1$ error of $\varepsilon$ are $\widetilde{O}(1/\varepsilon^2)$ iterations, hence $\widetilde{O}(m/\varepsilon^2)$ total time for a matrix with $m$ nonzero entries, with dependence on entrywise ratios and dimension (Chakrabarty et al., 2018, Apeldoorn et al., 2020). Stronger analyses using KL divergence and Pinsker-type inequalities relate the decrease in error per iteration directly to statistical distances (Chakrabarty et al., 2018).
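
Schematically, with $f$ the convex potential introduced below and $B_t$ the scaled matrix after iteration $t$, the KL-based analysis lower-bounds the per-iteration progress via a Pinsker-type inequality (notation simplified from Chakrabarty et al., 2018):

$$f_t - f_{t+1} = D_{\mathrm{KL}}(r \,\|\, B_t\mathbf{1}) \ge \tfrac{1}{2}\,\|r - B_t\mathbf{1}\|_1^2,$$

so as long as the $\ell_1$ error exceeds $\varepsilon$, each iteration decreases the potential by at least $\varepsilon^2/2$, which yields the $O(1/\varepsilon^2)$ iteration bound.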

An alternative optimization perspective expresses scaling as convex minimization of the potential

$$f(x, y) = \sum_{i,j} A_{ij} e^{x_i + y_j} - \langle r, x\rangle - \langle c, y\rangle,$$

whose gradient conditions are equivalent to the marginal constraints (Cohen et al., 2017, Allen-Zhu et al., 2017). This enables the design of accelerated first-order and second-order (box-constrained Newton, interior-point) algorithms with improved asymptotic complexities: nearly linear in input size for well-conditioned instances, with polylogarithmic dependence on the ratio of the largest to smallest optimal scaling entries (Cohen et al., 2017, Allen-Zhu et al., 2017).
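
Concretely, with $B_{ij} = A_{ij}e^{x_i + y_j}$ as above, the partial derivatives of this potential are exactly the marginal residuals:

$$\frac{\partial f}{\partial x_i} = \sum_j A_{ij}e^{x_i + y_j} - r_i = (B\mathbf{1} - r)_i, \qquad \frac{\partial f}{\partial y_j} = (B^\top\mathbf{1} - c)_j,$$

so $\nabla f(x, y) = 0$ holds precisely when $B\mathbf{1} = r$ and $B^\top\mathbf{1} = c$; convexity is immediate because $f$ is a sum of exponentials of linear functions plus a linear term.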

3. Spectral and Algorithmic Complexity Analysis

Quantitative analysis demonstrates that the classical Sinkhorn algorithm has strongly polynomial convergence in the presence of a spectral gap, and for entrywise-positive or well-conditioned matrices (Kwok et al., 2019). For matrices whose normalization has spectral gap $\lambda$, a continuous-time gradient flow on the scaling potential converges exponentially, with error decaying as $e^{-\Omega(\lambda t)}$; correspondingly, discrete-time gradient steps converge in $O(\lambda^{-1}\log(1/\varepsilon))$ iterations. The spectral gap also yields bounds on the condition number of the scaling matrices and on matrix capacity, relevant to permanent estimation and numerical preconditioning.

Modern approaches leverage these properties to obtain much faster algorithms: box-constrained Newton and interior-point methods run in time $\widetilde{O}(m \log \kappa)$ or $\widetilde{O}(m^{3/2})$, where $\kappa$ is the entrywise condition number of the optimal scaling (Cohen et al., 2017). First- and second-order methods exploiting fast Laplacian system solvers achieve total complexity nearly linear in the number of nonzeros, $\widetilde{O}(m)$, under favorable conditions (Allen-Zhu et al., 2017).
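
As a concrete, unoptimized instance of the first-order viewpoint, plain gradient descent on the potential $f$ can be sketched as below; the step size and iteration count are illustrative, and the fast methods above replace this simple step with box-constrained Newton or Laplacian-solver-based updates.

```python
import numpy as np

def scale_by_gradient_descent(A, r, c, eta=0.5, iters=5_000):
    """Minimize f(x, y) = sum_ij A_ij e^{x_i + y_j} - <r, x> - <c, y>
    by gradient descent. The gradients are the marginal residuals:
    grad_x f = B 1 - r and grad_y f = B^T 1 - c."""
    x = np.zeros(A.shape[0])
    y = np.zeros(A.shape[1])
    for _ in range(iters):
        B = np.exp(x)[:, None] * A * np.exp(y)[None, :]
        x -= eta * (B.sum(axis=1) - r)       # descend in x
        y -= eta * (B.sum(axis=0) - c)       # descend in y
    return x, y
```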

4. Quantum Algorithms and Lower Bounds

Quantum algorithms offer provable polynomial speedups in certain regimes. Quantum amplitude estimation replaces classical row/column sum computation, and quantum graph sparsification accelerates Laplacian solves:

  • For general $A$ with $m$ nonzero entries, quantum Sinkhorn achieves time $\widetilde{O}(\sqrt{mn}/\varepsilon^4)$ for $\ell_1$ error $\varepsilon$,

  • For entrywise-positive $A$, refined quantum analysis improves the runtime to $\widetilde{O}(n^{1.5}/\varepsilon^3)$ (Apeldoorn et al., 2020, Gribling et al., 2021).

Quantum lower bounds establish fundamental limits: in the high-precision regime ($\varepsilon = 1/\mathrm{poly}(n)$), no quantum algorithm can achieve scaling sublinear in the number of nonzeros, matching classical barriers (Gribling et al., 2021). For moderate precision, the best quantum algorithms match or modestly improve the classical exponents in $n$ and $1/\varepsilon$, with increasing cost as the precision tightens; for instance, there is an $\Omega(n^{1.5})$ lower bound for constant-error scaling of dense matrices, and an analogous lower bound for approximating the row sums in $\ell_1$ norm (Gribling et al., 2021).

| Regime / Matrix Type | Classical Complexity | Quantum Complexity |
|---|---|---|
| General $A$, $\ell_1$-error $\varepsilon$ | $\widetilde{O}(m/\varepsilon^2)$ | $\widetilde{O}(\sqrt{mn}/\varepsilon^4)$ |
| Entrywise-positive $A$ | $\widetilde{O}(n^2/\varepsilon^2)$ | $\widetilde{O}(n^{1.5}/\varepsilon^3)$ |

Quantum advances provide practical speedups in moderate-precision regimes, especially for large-scale network problems (optimal transport, matrix balancing, permanent approximation), but are subject to inescapable information-theoretic lower bounds at high precision (Apeldoorn et al., 2020, Gribling et al., 2021).

5. Combinatorial and Structural Aspects

The feasibility and combinatorics of matrix scaling are closely tied to perfect matchings in bipartite graphs and network-flow properties. For a matrix $A$, the bipartite support graph $G(A)$ encodes the positions of nonzeros, and the existence of a perfect matching is equivalent to (approximate) scalability to doubly stochastic form:

  • Hall's theorem: scalability $\iff$ every subset $S$ of rows satisfies $|N(S)| \ge |S|$, where $N(S)$ is the neighborhood of $S$ in $G(A)$ (Hayashi et al., 2022).
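
This matching criterion is directly checkable in practice; a small SciPy-based sketch (the function name is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def support_has_perfect_matching(A):
    """Return True iff the bipartite support graph of A has a perfect
    matching, the combinatorial condition behind scalability."""
    support = csr_matrix((np.abs(A) > 0).astype(np.int8))
    match = maximum_bipartite_matching(support, perm_type='column')
    return bool(np.all(match != -1))         # every row is matched
```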

The Sinkhorn algorithm’s convergence behavior reflects this combinatorial structure: in the absence of a perfect matching, the capacity potential becomes unbounded and the iterates diverge. Algorithmic interpretations via geometric programming and KL-divergence minimization give additional insight into the rate and structure of convergence, and connect it to network-flow duality, polymatroid principal partitions, and the Dulmage-Mendelsohn decomposition in graph theory (Hayashi et al., 2022). The process also identifies “Hall blockers” (maximal subsets violating the Hall condition) in polynomial time.

6. Generalizations, Special Cases and Applications

Matrix scaling generalizes naturally to:

  • Prescribed but nonuniform marginals,
  • Different matrix norms (e.g., prescribed row and column $\ell_2$-norms, leading to projective decomposition),
  • Nonsymmetric, signed, and rectangular matrices,
  • Non-commutative generalizations: scaling of positive linear maps (quantum channels), where operator scaling seeks completely positive maps with prescribed marginal properties (Idel, 2016).

Applications span:

  • Numerical preconditioning for solving linear systems (equilibration, balancing),
  • Permanent approximation for counting perfect matchings (Apeldoorn et al., 2020, Allen-Zhu et al., 2017),
  • Optimal transport, contingency table estimation, and entropy maximization,
  • Quantum state/channel normalization in information theory (Idel, 2016).

Numerical and large-scale settings have motivated matrix-free variants (approximate equilibration using matrix-vector products and stochastic estimation) that reduce memory or computational requirements further (Bradley et al., 2011).
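
As one ingredient of such matrix-free schemes, squared row norms of an implicit matrix can be estimated from matrix-vector products alone. The sketch below assumes only a matvec callback is available; it is written in the spirit of, not copied from, Bradley et al. (2011).

```python
import numpy as np

def estimate_row_norms_sq(matvec, n, num_probes=32, seed=0):
    """Estimate the squared row 2-norms of an implicit n x n matrix A:
    for a Rademacher vector v, E[(A v)_i^2] = ||A_{i,:}||_2^2."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(n)
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=n)  # random +/-1 probe
        acc += matvec(v) ** 2                # accumulate (A v)_i^2
    return acc / num_probes
```

Such estimates drive an equilibration loop that rescales rows (and, using products with $A^\top$, columns) toward equal norms without ever forming the matrix explicitly.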

7. Special Algebraic Phenomena and Explicit Formulas

For specific parametric families (e.g., symmetric $n \times n$ matrices with two entry values), closed-form expressions for the Sinkhorn limit can be derived, often exhibiting explicit algebraic relations. In certain cases the scaling process terminates in finitely many steps (finite-termination phenomena), or the intermediate matrices form sequences of rational approximations converging to algebraic numbers, linking matrix scaling to classical problems in Diophantine approximation (Nathanson, 2019).
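
For instance, in the positive $2 \times 2$ case the Sinkhorn limit of $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is $\begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}$ with $p = \sqrt{ad}/(\sqrt{ad} + \sqrt{bc})$, which follows directly from the doubly stochastic constraints. A quick NumPy verification:

```python
import numpy as np

def sinkhorn_limit_2x2(a, b, c, d):
    """Closed-form doubly stochastic Sinkhorn limit of [[a, b], [c, d]]
    for strictly positive entries."""
    p = np.sqrt(a * d) / (np.sqrt(a * d) + np.sqrt(b * c))
    return np.array([[p, 1.0 - p], [1.0 - p, p]])

# Compare against iterated row/column normalization.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
for _ in range(200):
    A /= A.sum(axis=1, keepdims=True)        # row normalization
    A /= A.sum(axis=0, keepdims=True)        # column normalization
assert np.allclose(A, sinkhorn_limit_2x2(2.0, 1.0, 1.0, 3.0), atol=1e-8)
```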
