
Eckart–Young–Mirsky Theorem

Updated 21 April 2026
  • The Eckart–Young–Mirsky theorem is a fundamental result guaranteeing that the truncated singular value decomposition gives an optimal low-rank matrix approximation, which is unique in the Frobenius norm when the singular values at the truncation index are distinct.
  • It extends to any unitarily invariant norm, and analogues hold in tensor and tubal tensor frameworks under additional conditions.
  • Its practical applications include closed-form solutions in subspace clustering, model compression in deep neural networks, and stability analysis under perturbations.

The Eckart–Young–Mirsky theorem precisely characterizes the best low-rank approximation of a matrix under any unitarily invariant norm, establishing that truncation of the singular value decomposition (SVD) yields an optimal solution, which is unique in the Frobenius norm whenever there is a gap between the singular values at the truncation index. This result underpins a wide range of applications in numerical linear algebra, data analysis, optimization, and deep learning. Modern research further extends its scope to tensors, tubal tensor frameworks, autoencoders, and the stability of low-rank approximations under perturbations.

1. Classical Statement and Generalization under Unitarily Invariant Norms

Let $A\in\mathbb{R}^{m\times n}$ with SVD $A=U\,\mathrm{diag}(\sigma_1,\dots,\sigma_r,0,\dots,0)\,V^*$, where $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r>0$ and $r=\mathrm{rank}(A)$. The truncated SVD of rank $k\le r$ is defined as $A_k = U\,\mathrm{diag}(\sigma_1,\dots,\sigma_k,0,\dots,0)\,V^*$. The original theorem states:

  • In the Frobenius norm,

$$\|A - A_k\|_F = \min_{\mathrm{rank}(B)\le k} \|A-B\|_F = \left( \sum_{i=k+1}^r \sigma_i^2 \right)^{1/2}.$$

  • In the spectral (operator) norm,

$$\|A - A_k\|_2 = \min_{\mathrm{rank}(B)\le k} \|A-B\|_2 = \sigma_{k+1}.$$

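In the Frobenius norm, the minimizer is unique if and only if $\sigma_k > \sigma_{k+1}$ (Yu et al., 2012). In the spectral norm, $A_k$ is optimal, but other minimizers of rank at most $k$ generally exist.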
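The two identities above can be checked numerically. The following is a minimal sketch, assuming NumPy and a random test matrix; the helper name `truncated_svd`, the seed, and the matrix sizes are illustrative choices, not part of the theorem.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k truncation A_k and all singular values of A."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # Keep the k leading singular triplets; scale the columns of U by s.
    A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]
    return A_k, s

rng = np.random.default_rng(0)      # arbitrary seed
A = rng.standard_normal((8, 6))     # arbitrary test matrix
k = 2

A_k, s = truncated_svd(A, k)
fro_err = np.linalg.norm(A - A_k, "fro")   # should equal sqrt(sum_{i>k} sigma_i^2)
spec_err = np.linalg.norm(A - A_k, 2)      # should equal sigma_{k+1}, i.e. s[k]

# Any other matrix of rank at most k should do no better in either norm.
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
fro_other = np.linalg.norm(A - B, "fro")
spec_other = np.linalg.norm(A - B, 2)
```

Because NumPy returns singular values in descending order and indexes from zero, `s[k]` plays the role of $\sigma_{k+1}$.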

A generalization asserts that for any unitarily invariant norm $\|\cdot\|_{\mathrm{UI}}$,

$$\|A - A_k\|_{\mathrm{UI}} = \min_{\mathrm{rank}(B)\le k} \|A-B\|_{\mathrm{UI}},$$

meaning the SVD truncation is optimal across all such norms, including Schatten $p$-norms and Ky Fan $k$-norms (Yu et al., 2012). The proof leverages SVD block decomposition, singular value majorization, and monotonicity of unitarily invariant norms.
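As an illustration of the generalized statement, the sketch below evaluates two unitarily invariant norms, the nuclear norm (Schatten norm with $p=1$) and a Ky Fan norm, directly from singular values, and compares the truncation error against random competitors of rank at most $k$. The helper names `schatten_norm` and `ky_fan_norm` are hypothetical, and sampling random competitors only spot-checks optimality; it does not prove it.

```python
import numpy as np

def schatten_norm(M, p):
    """Schatten p-norm: the l_p norm of the singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def ky_fan_norm(M, j):
    """Ky Fan j-norm: the sum of the j largest singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s[:j]))

rng = np.random.default_rng(1)      # arbitrary seed
A = rng.standard_normal((7, 5))
k = 2

U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]
tail = s[k:]                        # singular values of the residual A - A_k

nuclear_err = schatten_norm(A - A_k, 1)   # expected: sum of the tail
ky_fan_err = ky_fan_norm(A - A_k, 2)      # expected: two largest tail values

# Random competitors of rank at most k (a spot check, not a proof).
competitors = [
    rng.standard_normal((7, k)) @ rng.standard_normal((k, 5))
    for _ in range(200)
]
best_nuclear_other = min(schatten_norm(A - B, 1) for B in competitors)
best_ky_fan_other = min(ky_fan_norm(A - B, 2) for B in competitors)
```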

2. Extensions to Tensors

Efforts to generalize the theorem to tensor settings have revealed deeper geometric and algebraic structures. For order-$d$ (partially symmetric) tensors, best rank-$k$ approximations in the Frobenius norm lack a simple universal SVD truncation analogue. Nonetheless, for a "sufficiently general" tensor $f$ in the ambient tensor space, all critical rank-at-most-$k$ approximations are confined to a fixed critical subspace $H_f$, which is the span of the complex critical rank-one tensors under suitable dimension and symmetry constraints (Draisma et al., 2017). This framework encapsulates geometric features such as secant varieties of Segre–Veronese embeddings and recovers the classical theorem for matrices.

For tubal tensors, as in the t-SVD/tubal algebra setting, an Eckart–Young–Mirsky-type result holds precisely for tubal products induced by block-orthogonal (unitary up to scaling) transforms. Here, optimally truncating the t-SVD yields the best low-rank approximation in the Frobenius norm. Necessary and sufficient conditions on the tubal algebra guarantee this result and unify the classical and tensorial statements (Mor, 30 Dec 2025).
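A minimal sketch of t-SVD truncation follows, assuming the most common tubal product, the one induced by the discrete Fourier transform along the third mode. NumPy's unnormalized FFT is unitary up to the factor $\sqrt{n_3}$, so slice-wise optimality in the transform domain carries over to the Frobenius norm of the tensor. The function name `truncated_tsvd` and the tensor sizes are illustrative; other transforms covered by the cited result are not handled here.

```python
import numpy as np

def truncated_tsvd(T, k):
    """Truncate the DFT-based t-SVD of a real n1 x n2 x n3 tensor to tubal rank k.

    Returns the truncated tensor and the Frobenius error predicted from the
    discarded singular values of the transform-domain slices.
    """
    n3 = T.shape[2]
    T_hat = np.fft.fft(T, axis=2)              # transform along the tubes
    out_hat = np.empty_like(T_hat)
    tail_sq = 0.0
    for j in range(n3):
        U, s, Vh = np.linalg.svd(T_hat[:, :, j], full_matrices=False)
        out_hat[:, :, j] = (U[:, :k] * s[:k]) @ Vh[:k, :]
        tail_sq += np.sum(s[k:] ** 2)
    # For real input the result is real up to rounding, so drop the imaginary part.
    T_k = np.fft.ifft(out_hat, axis=2).real
    # Parseval for the unnormalized FFT: ||X||_F^2 = ||fft(X)||_F^2 / n3.
    predicted_err = np.sqrt(tail_sq / n3)
    return T_k, predicted_err

rng = np.random.default_rng(2)      # arbitrary seed
T = rng.standard_normal((6, 5, 4))
T_k, predicted_err = truncated_tsvd(T, k=2)
actual_err = np.linalg.norm(T - T_k)           # Frobenius norm of the residual
```

Here `actual_err` and `predicted_err` should agree to rounding, mirroring the matrix formula $\left(\sum_{i>k}\sigma_i^2\right)^{1/2}$ slice by slice.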

3. Role in Optimization, Closed-Form Solutions, and Subspace Clustering

Because the truncated SVD provides the minimizer for general unitarily invariant norms, many rank-constrained or norm-regularized problems admit closed-form solutions:

  • For a class of rank-constrained and norm-regularized problems, under suitable conditions on the problem data, the solution reduces to the SVD truncation of a projected matrix (Yu et al., 2012).
  • In subspace clustering, the shape-interaction matrix, formed from the leading right singular vectors of the data matrix, remains optimal under any unitarily invariant norm, and in noisy or regularized variants, the solution can be constructed via SVD thresholding (Yu et al., 2012).
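To make the subspace clustering case concrete, the sketch below builds the shape-interaction matrix for noise-free data drawn from two independent subspaces. It assumes the standard construction $V_r V_r^{\top}$ from the $r$ leading right singular vectors of a data matrix whose columns are the points; all sizes and variable names are illustrative, and the noisy, thresholded variants mentioned above are not covered.

```python
import numpy as np

rng = np.random.default_rng(3)      # arbitrary seed
d, n1, n2 = 10, 6, 7                # ambient dimension, points per subspace
r1, r2 = 2, 3                       # subspace dimensions

# Columns of X are noise-free points from two independent random subspaces.
X1 = rng.standard_normal((d, r1)) @ rng.standard_normal((r1, n1))
X2 = rng.standard_normal((d, r2)) @ rng.standard_normal((r2, n2))
X = np.hstack([X1, X2])

r = r1 + r2                         # rank of X for independent subspaces
_, s, Vh = np.linalg.svd(X, full_matrices=False)
V_r = Vh[:r, :].T                   # leading right singular vectors, (n1 + n2) x r
SIM = V_r @ V_r.T                   # shape-interaction matrix

# Affinities between points that belong to different subspaces.
cross_block = SIM[:n1, n1:]
```

In this idealized setting `cross_block` vanishes up to rounding, so the nonzero pattern of `SIM` reveals the cluster membership.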

The table below summarizes classic and generalized statements:

| Setting | Best rank-$k$ approximant | Norm optimality | Uniqueness condition |
| --- | --- | --- | --- |
| Matrices (SVD) | Truncation $A_k$ | Any unitarily invariant norm | $\sigma_k > \sigma_{k+1}$ |
| Partially symmetric tensors | Linear span in the critical subspace $H_f$ | Frobenius (in general) | Sufficiently general tensor $f$ |
| Tubal tensors (t-SVD) | Truncated t-SVD (if algebra conditions met) | Frobenius | Algebraic; transform block-unitarity |

4. Stability under Perturbations and Spectral Bounds

Perturbation analysis quantifies the stability of SVD truncation and the resulting low-rank approximants under noise. The classical theorem provides the bound for any perturbation $E$:

$$\|(A+E)_k - A_k\|_2 \le 2\left(\sigma_{k+1} + \|E\|_2\right).$$
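The bound $\|(A+E)_k - A_k\|_2 \le 2(\sigma_{k+1} + \|E\|_2)$ follows from the theorem in three steps. The sketch below assumes the spectral norm, writes $\tilde A = A + E$ (notation introduced here for convenience), and uses Weyl's inequality $|\sigma_i(\tilde A) - \sigma_i(A)| \le \|E\|_2$, which is a standard fact not stated in the text above:

```latex
\begin{aligned}
\|\tilde A_k - A_k\|_2
  &\le \|\tilde A_k - \tilde A\|_2 + \|\tilde A - A\|_2 + \|A - A_k\|_2
     && \text{(triangle inequality)} \\
  &= \sigma_{k+1}(\tilde A) + \|E\|_2 + \sigma_{k+1}(A)
     && \text{(Eckart--Young--Mirsky, applied twice)} \\
  &\le 2\,\sigma_{k+1}(A) + 2\,\|E\|_2
     && \text{(Weyl's inequality)}.
\end{aligned}
```

The bound is dimension-free but does not vanish as $E \to 0$ unless $\sigma_{k+1}(A) = 0$, which is one reason sharper, gap-dependent bounds are of interest.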

Recent advances yield tighter, high-probability spectral-norm perturbation bounds when $A$ is symmetric and satisfies conditions on its eigengap and on the size of the noise $E$ (Tran et al., 29 Oct 2025). If the noise $E$ aligns weakly with the leading eigenspace, an additional gain is possible. These results are crucial for analyzing differentially private PCA and spectral algorithms in high-dimensional regimes (Tran et al., 29 Oct 2025).

5. Applications in Deep Learning and Autoencoders

The theorem underlies model compression in deep neural networks via low-rank decompositions. In such frameworks, each layer's weight tensor is unfolded into a matrix, which is then block-wise compressed by SVD truncation. The optimal error bound for each layer, used by allocation algorithms for global compression targets, is given by the Eckart–Young–Mirsky result (Liebenwein et al., 2021). Similarly, in symmetric autoencoders, the theorem yields explicit layer-wise reconstruction error bounds when the weights are orthonormal, and motivates the EYS (Eckart–Young–Schmidt) initialization: sequentially initializing layers by empirical SVD on latent representations (Brivio et al., 13 Jun 2025).
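The per-layer step can be sketched as follows, assuming a single dense weight matrix that has already been unfolded, with no block-wise splitting and no rank-allocation logic. This is an illustration of SVD-based factorization in general, not the algorithm of Liebenwein et al.; `compress_linear` and all sizes are hypothetical.

```python
import numpy as np

def compress_linear(W, k):
    """Factor W (out x in) as P @ Q with P (out x k), Q (k x in) via truncated SVD."""
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    P = U[:, :k] * s[:k]            # absorb the singular values into the left factor
    Q = Vh[:k, :]
    # By the theorem, no rank-k factorization has spectral error below sigma_{k+1}.
    optimal_err = float(s[k]) if k < len(s) else 0.0
    return P, Q, optimal_err

rng = np.random.default_rng(4)      # arbitrary seed
W = rng.standard_normal((64, 32))   # hypothetical dense-layer weight
x = rng.standard_normal(32)         # hypothetical input activation
k = 8

P, Q, optimal_err = compress_linear(W, k)
params_before = W.size              # 64 * 32
params_after = P.size + Q.size      # k * (64 + 32)

y_full = W @ x
y_low = P @ (Q @ x)                 # two thin matrix-vector products
```

The output error obeys $\|Wx - PQx\|_2 \le \sigma_{k+1}\|x\|_2$, which is the kind of per-layer bound a global allocation scheme can trade off against the parameter savings.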

6. Historical and Contemporary Impact

The Eckart–Young–Mirsky theorem, with roots in the 1930s, has become foundational in matrix approximation, data analysis (PCA), numerical algorithms, and convex optimization. Its extension to tensors is the subject of active research. Modern work in tubal tensor algebra, high-dimensional spectral analysis, and deep learning architectures continues to explore its boundaries and leverage its guarantees for both theory and applications (Yu et al., 2012, Draisma et al., 2017, Mor, 30 Dec 2025, Tran et al., 29 Oct 2025, Brivio et al., 13 Jun 2025, Liebenwein et al., 2021).

7. Connections to Broader Theories and Open Directions

Connections exist to the geometry of secant varieties, module theory in algebra, operator theory in Hilbert spaces, and spectral functionals in matrix analysis. Recent developments address optimality in more general algebras, the ramifications for data privacy, and non-Euclidean approximation problems. Ongoing research investigates sharper perturbation bounds, closed-form solutions in structured settings, and the existence or uniqueness of optimal approximations for higher-order and structured tensors—highlighting both the power and the subtle limitations of the Eckart–Young–Mirsky paradigm across modern computational mathematics.
