Eckart–Young–Mirsky Theorem

Updated 21 April 2026

The Eckart–Young–Mirsky theorem is a fundamental result showing that the truncated singular value decomposition gives an optimal low-rank approximation of a matrix.
It holds for every unitarily invariant norm, and partial analogues exist for tensors and tubal tensors.
Its practical applications include closed-form solutions in subspace clustering, model compression in deep neural networks, and stability analysis under perturbations.

The Eckart–Young–Mirsky theorem characterizes the best low-rank approximation of a matrix under any unitarily invariant norm: truncating the singular value decomposition (SVD) yields an optimal solution, which in the Frobenius norm is unique whenever the retained and discarded singular values are separated by a gap. This result underpins a wide range of applications in numerical linear algebra, data analysis, optimization, and deep learning. Modern research further extends its scope to tensors, tubal tensor frameworks, autoencoders, and the stability of low-rank approximations under perturbations.

1. Classical Statement and Generalization under Unitarily Invariant Norms

Let $A\in\mathbb{R}^{m\times n}$ with SVD $A=U\,\mathrm{diag}(\sigma_1,\dots,\sigma_r,0,\dots,0)\,V^*$, where $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r>0$ and $r=\mathrm{rank}(A)$. The truncated SVD of rank $k\le r$ is defined as $A_k = U\,\mathrm{diag}(\sigma_1,\dots,\sigma_k,0,\dots,0)\,V^*$. The original theorem states:

In the Frobenius norm,

$\|A - A_k\|_F = \min_{\mathrm{rank}(B)\le k} \|A-B\|_F = \left( \sum_{i=k+1}^r \sigma_i^2 \right)^{1/2}.$

In the spectral (operator) norm,

$\|A - A_k\|_2 = \min_{\mathrm{rank}(B)\le k} \|A-B\|_2 = \sigma_{k+1}.$

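In the Frobenius norm, the minimizer is unique if and only if $\sigma_k > \sigma_{k+1}$ (Yu et al., 2012). In the spectral norm, $A_k$ is optimal but generally not unique: for $1\le k<r$, slightly shrinking the leading singular value of $A_k$ gives another rank-$k$ matrix $B$ with $\|A-B\|_2=\sigma_{k+1}$.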
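The two error formulas can be checked numerically. The following is a minimal NumPy sketch on a random matrix; the helper name `truncated_svd`, the matrix size, and the seed are illustrative assumptions, not part of the theorem.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k truncation A_k and all singular values of A."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by sigma_1..sigma_k, then recombine.
    A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]
    return A_k, s

rng = np.random.default_rng(0)          # illustrative seed
A = rng.standard_normal((8, 6))         # illustrative test matrix
k = 2
A_k, s = truncated_svd(A, k)

fro_err = np.linalg.norm(A - A_k, "fro")
spec_err = np.linalg.norm(A - A_k, 2)

# Frobenius error equals the l2 norm of the discarded singular values.
print(np.isclose(fro_err, np.sqrt(np.sum(s[k:] ** 2))))
# Spectral error equals sigma_{k+1} (index k with zero-based indexing).
print(np.isclose(spec_err, s[k]))

# An arbitrary rank-k competitor cannot do better in either norm.
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
print(fro_err <= np.linalg.norm(A - B, "fro"))
print(spec_err <= np.linalg.norm(A - B, 2))
```

A single random competitor is only a sanity check, not a proof of optimality; the equalities against the singular values are the informative part.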

A generalization asserts that for any unitarily invariant norm $\|\cdot\|_{\mathrm{UI}}$,

$\|A - A_k\|_{\mathrm{UI}} = \min_{\mathrm{rank}(B)\le k} \|A-B\|_{\mathrm{UI}},$

meaning the SVD truncation is optimal across all such norms, including Schatten $p$-norms and Ky Fan $k$-norms (Yu et al., 2012). The proof leverages SVD block decomposition, singular value majorization, and monotonicity of unitarily invariant norms.
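As a rough illustration of optimality beyond the Frobenius and spectral norms, the sketch below evaluates two Schatten norms and one Ky Fan norm of the residual and compares the truncated SVD against random rank-$k$ projections of $A$. The helper names (`schatten_norm`, `ky_fan_norm`, `random_rank_k_projection`) and the choice of competitors are assumptions made for this example.

```python
import numpy as np

def singular_values(M):
    return np.linalg.svd(M, compute_uv=False)

def schatten_norm(M, p):
    """Schatten p-norm: the l_p norm of the singular values."""
    return np.sum(singular_values(M) ** p) ** (1.0 / p)

def ky_fan_norm(M, q):
    """Ky Fan q-norm: the sum of the q largest singular values."""
    return np.sum(singular_values(M)[:q])

def random_rank_k_projection(A, k, rng):
    """Project A onto a random k-dimensional column space (rank at most k)."""
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    return Q @ (Q.T @ A)

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 5))
k = 2
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]

norms = {
    "Schatten p=1 (nuclear)": lambda M: schatten_norm(M, 1),
    "Schatten p=3": lambda M: schatten_norm(M, 3),
    "Ky Fan q=2": lambda M: ky_fan_norm(M, 2),
}

competitors = [random_rank_k_projection(A, k, rng) for _ in range(200)]
never_beaten = {
    name: all(norm(A - A_k) <= norm(A - B) + 1e-12 for B in competitors)
    for name, norm in norms.items()
}
print(never_beaten)
```

The residual $A - A_k$ has singular values $\sigma_{k+1},\dots,\sigma_r$, so each of these norms can also be read off directly from the discarded part of the spectrum.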

2. Extensions to Tensors

Efforts to generalize the theorem to tensor settings have revealed deeper geometric and algebraic structures. For order-$d$ (partially symmetric) tensors, best rank-$k$ approximations in the Frobenius norm lack a simple universal SVD truncation analogue. Nonetheless, for a "sufficiently general" tensor $T$ in the ambient tensor space, all critical rank-at-most-$k$ approximations are confined to a fixed critical subspace $H_T$, which is the span of the complex critical rank-one tensors under suitable dimension and symmetry constraints (Draisma et al., 2017). This framework encapsulates geometric features such as secant varieties of Segre–Veronese embeddings and recovers the classical theorem for matrices.

For tubal tensors, as in the t-SVD/tubal algebra setting, an Eckart–Young–Mirsky-type result holds precisely for tubal products induced by block-orthogonal (unitary up to scaling) transforms. Here, optimally truncating the t-SVD yields the best low-rank approximation in the Frobenius norm. Necessary and sufficient conditions on the tubal algebra guarantee this result and unify the classical and tensorial statements (Mor, 30 Dec 2025).
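For one concrete instance, the sketch below uses the discrete Fourier transform along the third mode, which is unitary up to a scaling by $\sqrt{n_3}$. It truncates each Fourier-domain frontal slice to rank $k$ and checks the Frobenius error against the discarded singular values. The function names `tsvd_truncate` and `t_product`, and the tensor sizes, are assumptions for illustration; other transforms covered by the cited conditions are not shown.

```python
import numpy as np

def tsvd_truncate(X, k):
    """Truncate the DFT-based t-SVD of a real 3-way tensor to tubal rank k."""
    n3 = X.shape[2]
    Xh = np.fft.fft(X, axis=2)              # move to the transform domain
    Yh = np.empty_like(Xh)
    tail_energy = 0.0
    for j in range(n3):
        U, s, Vh = np.linalg.svd(Xh[:, :, j], full_matrices=False)
        Yh[:, :, j] = (U[:, :k] * s[:k]) @ Vh[:k, :]
        tail_energy += np.sum(s[k:] ** 2)
    X_k = np.fft.ifft(Yh, axis=2).real      # imaginary part is round-off for real X
    # NumPy's unnormalized FFT scales squared Frobenius norms by n3 (Parseval).
    predicted_err = np.sqrt(tail_energy / n3)
    return X_k, predicted_err

def t_product(A, B):
    """t-product under the DFT: slice-wise matrix products in the Fourier domain."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Ch = np.einsum("ipj,pqj->iqj", Ah, Bh)
    return np.fft.ifft(Ch, axis=2).real

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 5, 4))
k = 2
X_k, predicted_err = tsvd_truncate(X, k)
actual_err = np.linalg.norm(X - X_k)        # Frobenius norm of the flattened tensor

# A random competitor of tubal rank at most k.
competitor = t_product(rng.standard_normal((6, k, 4)), rng.standard_normal((k, 5, 4)))
print(np.isclose(actual_err, predicted_err))
print(actual_err <= np.linalg.norm(X - competitor))
```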

3. Role in Optimization, Closed-Form Solutions, and Subspace Clustering

Because the truncated SVD provides the minimizer for general unitarily invariant norms, many rank-constrained or norm-regularized problems admit closed-form solutions:

For problems of the form $\min_{\mathrm{rank}(X)\le k}\|A-BXC\|_{\mathrm{UI}}$, under suitable conditions on $B$ and $C$, the solution reduces to the SVD truncation of a projected matrix (Yu et al., 2012).
In subspace clustering, the shape-interaction matrix $V_rV_r^*$, built from the top $r$ right singular vectors $V_r$ of the data matrix, remains optimal under any unitarily invariant norm, and in noisy or regularized variants, the solution can be constructed via SVD thresholding (Yu et al., 2012).
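To see why the shape-interaction matrix is useful for clustering, the sketch below builds noise-free data from two independent subspaces and forms the matrix in its usual form $V_rV_r^\top$ from the top $r$ right singular vectors. The dimensions, seed, and variable names are illustrative assumptions, and the block-diagonal structure shown relies on the subspaces being independent and the data being noise-free.

```python
import numpy as np

rng = np.random.default_rng(3)
ambient_dim, sub_dim, n_per = 10, 2, 5   # illustrative sizes

# Two random 2-dimensional subspaces of R^10 (independent with probability one).
basis_1 = rng.standard_normal((ambient_dim, sub_dim))
basis_2 = rng.standard_normal((ambient_dim, sub_dim))

# Columns are data points: the first n_per from subspace 1, the rest from subspace 2.
X = np.hstack([
    basis_1 @ rng.standard_normal((sub_dim, n_per)),
    basis_2 @ rng.standard_normal((sub_dim, n_per)),
])

r = np.linalg.matrix_rank(X)
_, _, Vh = np.linalg.svd(X, full_matrices=False)
V_r = Vh[:r, :].T                         # top-r right singular vectors as columns
Q = V_r @ V_r.T                           # shape-interaction matrix (n x n)

# Entries linking points from different subspaces vanish up to round-off.
cross_block = Q[:n_per, n_per:]
print(r, np.max(np.abs(cross_block)))
```

Here `Q` is the orthogonal projector onto the row space of `X`, which is why it splits into one block per subspace in this idealized setting.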

The table below summarizes classic and generalized statements:

Setting	Best Rank-$k$ Approximant	Norm Optimality	Uniqueness Condition
Matrices, SVD	Truncation $A_k$	Any unitarily invariant norm	$\sigma_k > \sigma_{k+1}$ (Frobenius norm)
Partially symmetric tensor	Linear span in critical subspace $H_T$	Frobenius (in general)	Sufficiently general $T$
Tubal tensors (t-SVD)	Truncated t-SVD (if algebra conditions met)	Frobenius	Algebraic; transform block-unitarity

4. Stability under Perturbations and Spectral Bounds

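Perturbation analysis quantifies the stability of SVD truncation and the resulting low-rank approximants under noise. Writing $(A+E)_k$ for the rank-$k$ truncated SVD of the perturbed matrix, the classical theorem, combined with the triangle inequality and Weyl's inequality $\sigma_{k+1}(A+E)\le\sigma_{k+1}+\|E\|_2$, provides the following bound for any perturbation $E$:

$\|(A+E)_k - A_k\|_2 \le 2\left(\sigma_{k+1} + \|E\|_2\right).$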
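A bound of the form $\|(A+E)_k - A_k\|_2 \le 2(\sigma_{k+1} + \|E\|_2)$ follows from the theorem together with the triangle and Weyl inequalities, and it can be checked numerically. The sketch below does so for a random matrix and a small random perturbation; the helper `best_rank_k`, the sizes, and the noise scale are assumptions chosen for illustration.

```python
import numpy as np

def best_rank_k(M, k):
    """Rank-k truncated SVD of M."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

rng = np.random.default_rng(1)
m, n, k = 30, 20, 3
A = rng.standard_normal((m, n))
E = 0.05 * rng.standard_normal((m, n))   # illustrative perturbation

# Left-hand side: how far the truncation moves under the perturbation.
lhs = np.linalg.norm(best_rank_k(A + E, k) - best_rank_k(A, k), 2)

# Right-hand side: 2 * (sigma_{k+1}(A) + ||E||_2).
sigma = np.linalg.svd(A, compute_uv=False)
rhs = 2.0 * (sigma[k] + np.linalg.norm(E, 2))

print(lhs <= rhs)
```

This worst-case bound does not shrink to zero with $\|E\|_2$ when $\sigma_{k+1}$ is large, which is the motivation for gap-dependent refinements.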

Recent advances yield tighter, high-probability spectral-norm bounds on the error $\|(A+E)_k - A_k\|_2$ between rank-$k$ truncations when $A$ is symmetric, under conditions on its eigengap $\delta_k = \lambda_k - \lambda_{k+1}$ and on the norm of the perturbation $E$ (Tran et al., 29 Oct 2025). If the noise $E$ aligns weakly with the leading eigenspace, an additional gain is possible. These results are crucial for analyzing differentially private PCA and spectral algorithms in high-dimensional regimes (Tran et al., 29 Oct 2025).

5. Applications in Deep Learning and Autoencoders

The theorem underlies model compression in deep neural networks via low-rank decompositions. In such frameworks, each layer's weight tensor is unfolded into a matrix, which is then block-wise compressed by SVD truncation. The optimal error bound for each layer, used by allocation algorithms for global compression targets, is given by the Eckart–Young–Mirsky result (Liebenwein et al., 2021). Similarly, in symmetric autoencoders, the theorem yields explicit layer-wise reconstruction error bounds when the weights are orthonormal, and motivates the EYS (Eckart–Young–Schmidt) initialization: sequentially initializing layers by empirical SVD on latent representations (Brivio et al., 13 Jun 2025).
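The unfold-then-truncate step can be sketched as follows. This is a generic illustration, not the allocation algorithm of the cited work: the helper `compress_layer`, the convolutional weight shape, and the chosen rank are all assumptions.

```python
import numpy as np

def compress_layer(W, k):
    """Unfold a weight tensor to (out, -1) and factor it into two rank-k pieces."""
    W_mat = W.reshape(W.shape[0], -1)
    U, s, Vh = np.linalg.svd(W_mat, full_matrices=False)
    left = U[:, :k] * s[:k]      # shape (out, k)
    right = Vh[:k, :]            # shape (k, in * kh * kw)
    return W_mat, left, right, s

rng = np.random.default_rng(4)
W = rng.standard_normal((64, 32, 3, 3))   # hypothetical conv weight: out, in, kh, kw
k = 8
W_mat, left, right, s = compress_layer(W, k)

params_before = W.size                    # 64 * 32 * 3 * 3 = 18432
params_after = left.size + right.size     # 8 * (64 + 288) = 2816

# By the theorem, no rank-k factorization has a smaller spectral error than s[k].
spectral_err = np.linalg.norm(W_mat - left @ right, 2)
print(params_before, params_after, np.isclose(spectral_err, s[k]))
```

Because the per-layer error is known in closed form from the singular values, a rank budget can be compared across layers without retraining or re-evaluating the network for each candidate rank.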

6. Historical and Contemporary Impact

The Eckart–Young–Mirsky theorem, with roots in the 1930s, has become foundational in matrix approximation, data analysis (PCA), numerical algorithms, and convex optimization. Its extension to tensors is the subject of active research. Modern work in tubal tensor algebra, high-dimensional spectral analysis, and deep learning architectures continues to explore its boundaries and leverage its guarantees for both theory and applications (Yu et al., 2012, Draisma et al., 2017, Mor, 30 Dec 2025, Tran et al., 29 Oct 2025, Brivio et al., 13 Jun 2025, Liebenwein et al., 2021).

7. Connections to Broader Theories and Open Directions

Connections exist to the geometry of secant varieties, module theory in algebra, operator theory in Hilbert spaces, and spectral functionals in matrix analysis. Recent developments address optimality in more general algebras, the ramifications for data privacy, and non-Euclidean approximation problems. Ongoing research investigates sharper perturbation bounds, closed-form solutions in structured settings, and the existence or uniqueness of optimal approximations for higher-order and structured tensors—highlighting both the power and the subtle limitations of the Eckart–Young–Mirsky paradigm across modern computational mathematics.