Taylor Expansion for SVD

Updated 19 September 2025
  • Taylor expansion for SVD is a technique that approximates the local behavior of singular values and vectors under perturbations using analytic and geometric methods.
  • It leverages tangent spaces and Terracini’s lemma to perform robust error analysis and guide gradient computations in low-rank approximations.
  • The approach integrates Fréchet derivatives and spectral series to enhance numerical stability in applications like filtering and tensor decomposition.

The Taylor expansion for the singular value decomposition (SVD) is a central analytic technique for understanding the sensitivity, stability, and local geometry of SVD-based approximants under perturbations. Rooted in both algebraic geometry and matrix calculus, Taylor expansions for SVD relate the behavior of singular values and singular vectors to perturbations in the original matrix, enabling precise local error analysis, robust gradient computation, and principled algorithm design. This relationship leverages the structure of low-rank varieties, the geometry of tangent spaces (notably via Terracini’s lemma), and, in modern computational practice, links with recursive Fréchet derivative expansions and spectral series approximations.

1. Geometric Foundations of SVD and Taylor Expansions

The SVD of a real matrix $A$, written $A = U \Sigma V^\top$, provides a global factorization, but from a geometric perspective it also yields a natural local approximation. When considering best rank-$r$ approximations via truncated SVD (i.e., $B = U (\Sigma_1 + \cdots + \Sigma_r) V^\top$), one is effectively projecting $A$ onto the $r$-secant variety of the rank-1 matrix variety. For perturbed families $A(t) = A + tE + O(t^2)$, the best low-rank approximant $B$ varies according to the projection of the perturbation $E$ onto the tangent space of the low-rank variety at $B$.

If $f(X) = \frac{1}{2}\|A - X\|^2$ is the distance function to the variety, the leading (linear) term of its Taylor expansion at a critical point $X$ is governed by the projection of $E$ onto the tangent space $T_X$. In formulas, the necessary condition for $X$ to be a best approximation is:

$$\langle A - X,\, Y \rangle = 0 \quad \text{for all } Y \in T_X$$

This orthogonality condition—arising from the first-order Taylor expansion of the squared norm—is a direct consequence of the underlying geometric structure (Ottaviani et al., 2015).
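
As a concrete check, the following NumPy sketch (a minimal illustration with arbitrary sizes, not code from the cited paper) computes the best rank-$r$ approximation by truncated SVD and verifies that the residual $A - B$ is numerically orthogonal to the tangent directions $x \otimes v_i^\top$ and $u_i \otimes z^\top$ at $B$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
r = 2

# Best rank-r approximation via truncated SVD (Eckart-Young).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
R = A - B  # residual

# <A - B, Y> = 0 for tangent directions Y = x v_i^T and Y = u_i z^T, i <= r.
for i in range(r):
    x, z = rng.standard_normal(6), rng.standard_normal(4)
    Y1 = np.outer(x, Vt[i])       # x (tensor) v_i^T
    Y2 = np.outer(U[:, i], z)     # u_i (tensor) z^T
    print(abs(np.sum(R * Y1)), abs(np.sum(R * Y2)))  # both ~1e-15
```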

2. Tangent Spaces, Secant Varieties, and Terracini’s Lemma

In algebraic geometry, the variety of rank-1 matrices (or decomposable tensors) is termed the Segre variety. Its secant varieties define matrices (or tensors) of rank at most $r$. Terracini's lemma provides a powerful tool to characterize tangent spaces to these secant varieties: for generic points $p_1, \ldots, p_k$ of the variety $X$, the tangent space to the $k$-secant variety at $z \in \mathrm{span}\{p_1, \ldots, p_k\}$ satisfies

$$T_z\, \sigma_k(X) = T_{p_1} X + \cdots + T_{p_k} X$$

In SVD terms, for $B = \sum_{i=1}^r \sigma_i u_i v_i^\top$, the tangent space at $B$ is

$$T_B = \sum_{i=1}^r \left(\mathbb{R}^m \otimes v_i^\top + u_i \otimes \mathbb{R}^n\right)$$

The first-order behavior of the distance function's Taylor expansion is therefore completely determined by projections onto these tangent directions. For the best rank-1 approximation $B = \sigma_1 u_1 v_1^\top$, the tangent space is $T_B = \mathbb{R}^m \otimes v_1^\top + u_1 \otimes \mathbb{R}^n$, setting up the first-order Taylor conditions

$$\langle E,\, u_1 \otimes z^\top \rangle = 0, \quad \langle E,\, x \otimes v_1^\top \rangle = 0, \quad \forall\, x \in \mathbb{R}^m,\ z \in \mathbb{R}^n$$

which control the local variation of singular values and vectors under infinitesimal perturbations (Ottaviani et al., 2015).
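
A standard first-order consequence is the perturbation formula $\frac{d}{dt}\sigma_1(A + tE)\big|_{t=0} = \langle E, u_1 \otimes v_1^\top \rangle = u_1^\top E v_1$ for a simple top singular value. The sketch below checks this against a finite difference (classical perturbation theory; the setup is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
E = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
u1, v1 = U[:, 0], Vt[0]

pred = u1 @ E @ v1   # predicted first-order change <E, u1 (tensor) v1^T>
t = 1e-6             # finite-difference check on sigma_1(A + tE)
fd = (np.linalg.svd(A + t * E, compute_uv=False)[0] - s[0]) / t
print(pred, fd)      # agree to O(t)
```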

3. Fréchet Derivative–Based Taylor Expansions for Matrix Functions

Taylor expansions for matrix-valued functions, such as the principal square root $y(Q) = Q^{1/2}$, can be expressed using Fréchet derivatives. For $A \in S_+$ and a symmetric perturbation $H$:

$$y(A + H) = y(A) + \sum_{k=1}^n \frac{1}{k!}\, V^{(k)} y(A) \cdot H + R_{n+1}[A, H]$$

where the $V^{(k)}$ are the Fréchet derivatives of $y$ at $A$ applied to $H$, and the remainder $R_{n+1}$ is given as an integral over the $(n+1)$-th derivative:

$$R_{n+1}[A, H] = \frac{1}{n!} \int_0^1 (1-\epsilon)^n\, V^{(n+1)} y(A + \epsilon H) \cdot H \, d\epsilon$$

The first Fréchet derivative of $y(A)$ is:

$$V y(A) \cdot H = \int_0^\infty e^{-t\, y(A)}\, H\, e^{-t\, y(A)}\, dt$$

and this integral solves a Sylvester equation: writing $S = y(A)$, the derivative $L = V y(A) \cdot H$ satisfies $S L + L S = H$, as one sees by differentiating $y(A)^2 = A$. Higher-order derivatives follow a recursive summation indexed by lower-order derivatives. This approach enables nonasymptotic, quantitative error control for Taylor approximations, independent of $\|H\|$ as long as $A + H \in S_+$ (Moral et al., 2017).
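
In code, the first derivative can therefore be computed with a Sylvester solver instead of the integral. A minimal SciPy sketch, assuming a symmetric positive definite $A$ (variable names are illustrative), compares the Sylvester solution with a finite-difference approximation:

```python
import numpy as np
from scipy.linalg import sqrtm, solve_sylvester

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                  # symmetric perturbation

S = sqrtm(A)                       # y(A) = A^{1/2}
# First Frechet derivative: L solves S L + L S = H, the closed form of
# the integral  int_0^inf e^{-tS} H e^{-tS} dt.
L = solve_sylvester(S, S, H)

eps = 1e-6                         # finite-difference check
fd = (sqrtm(A + eps * H) - S) / eps
print(np.linalg.norm(L - fd))      # small: agrees to O(eps)
```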

These techniques offer systematic blueprints for Taylor expansion methods in SVD contexts. Differentiation of matrix functions via Fréchet derivatives is closely analogous to SVD perturbation analysis, where derivatives of singular values and singular vectors with respect to $A$ are required. Integral remainder terms in Taylor expansions are central for robust error estimation in spectral functions.

4. Taylor Expansion for SVD Gradients: Numerical Stability and Algorithms

Differentiating singular vectors analytically through the SVD involves terms of the form $1/(\lambda_i - \lambda_j)$, which become numerically unstable as eigenvalues approach one another ($\lambda_i \approx \lambda_j$). To address this, a Taylor expansion approximates the unstable term:

$$\frac{1}{\lambda_i - \lambda_j} = \frac{1}{\lambda_i} \left[\frac{1}{1 - \lambda_j/\lambda_i}\right]$$

Expanding $1/(1-x)$ for $|x| < 1$ as a geometric series yields:

$$\frac{1}{\lambda_i - \lambda_j} \approx \frac{1}{\lambda_i} \sum_{k=0}^{K} \left(\frac{\lambda_j}{\lambda_i}\right)^k$$

which regularizes the denominator and prevents gradient explosion. For the dominant eigenvector, this expansion is mathematically equivalent to performing $K+1$ steps of power iteration, but it avoids iterative deflation and the associated accumulation of roundoff error (Wang et al., 2021). In network training (e.g., decorrelated batch normalization and second-order pooling), SVD-Taylor gradients achieved higher accuracy and stability than both classical SVD gradients and power-iteration-based gradients.
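
A minimal NumPy sketch of the idea (the helper name and truncation order are illustrative, not the paper's implementation): replace each exact coefficient $1/(\lambda_i - \lambda_j)$ by the truncated series, so near-degenerate pairs produce bounded entries of size roughly $(K+1)/\lambda_i$ instead of exploding:

```python
import numpy as np

def taylor_K(lam, K=9):
    """Truncated-series surrogate for K_ij = 1/(lam_i - lam_j).

    Assumes lam is sorted in descending order with lam > 0; the series in
    lam_j/lam_i is used for j > i (where the ratio is <= 1), and
    antisymmetry fills in the rest. A sketch, not the paper's exact code.
    """
    n = len(lam)
    Kt = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            r = lam[j] / lam[i]
            Kt[i, j] = sum(r**k for k in range(K + 1)) / lam[i]
            Kt[j, i] = -Kt[i, j]
    return Kt

lam = np.array([3.0, 1.0, 1.0 - 1e-9])   # near-degenerate pair
print(1.0 / (lam[1] - lam[2]))           # exact term explodes: ~1e9
print(taylor_K(lam)[1, 2])               # bounded: ~(K+1)/lam_i = 10
```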

5. Extension to Tensors and Higher-Order Structures

The geometric and Taylor expansion techniques generalize naturally to tensors. In multiway array settings, one approximates a tensor using sums of rank-1 decomposable tensors (analogous to SVD), with critical vector tuples $(x_1, \ldots, x_d)$ characterized by the multilinear eigenvalue equations (the hat denoting omission of the $i$-th factor):

$$A \cdot (x_1 \otimes \cdots \otimes \hat{x}_i \otimes \cdots \otimes x_d) = \lambda x_i, \quad i = 1, \ldots, d$$

The tangent space structure (via Terracini's lemma) at the best low-rank tensor approximation controls the first-order behavior of the Taylor expansion of the distance function $d(A, X)$, with $X$ belonging to the secant variety of the Segre (rank-1 tensor) variety. The interaction between local geometry and approximation theory is richer here due to tensor decomposition pathologies (non-uniqueness, ill-posedness). Taylor expansions in this context are essential for sensitivity analysis of tensor approximations (Ottaviani et al., 2015).
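
The critical equations suggest an alternating power iteration for a best rank-1 approximation: contract $A$ against all factors but one, normalize, and repeat. Below is a NumPy sketch for a 3-way tensor (an illustrative higher-order power method, not an algorithm from the cited paper):

```python
import numpy as np

def rank1_tensor_approx(A, iters=200, seed=0):
    """Alternating power iteration for a best rank-1 approximation
    lam * x1 (tensor) x2 (tensor) x3 of a 3-way tensor A. At convergence the
    unit vectors satisfy the multilinear eigenvalue equations."""
    rng = np.random.default_rng(seed)
    x = [rng.standard_normal(n) for n in A.shape]
    x = [v / np.linalg.norm(v) for v in x]
    for _ in range(iters):
        x[0] = np.einsum('ijk,j,k->i', A, x[1], x[2]); x[0] /= np.linalg.norm(x[0])
        x[1] = np.einsum('ijk,i,k->j', A, x[0], x[2]); x[1] /= np.linalg.norm(x[1])
        x[2] = np.einsum('ijk,i,j->k', A, x[0], x[1]); x[2] /= np.linalg.norm(x[2])
    lam = np.einsum('ijk,i,j,k->', A, x[0], x[1], x[2])
    return lam, x

A = np.random.default_rng(1).standard_normal((4, 5, 6))
lam, (x1, x2, x3) = rank1_tensor_approx(A)
# Residual of the multilinear eigenvalue equation in the first mode (~0):
print(np.linalg.norm(np.einsum('ijk,j,k->i', A, x2, x3) - lam * x1))
```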

6. SVD-Based Taylor Expansions in Numerical Filtering

Numerical schemes such as cubature Kalman filters (CKFs) have adopted SVD-based Taylor expansion techniques for propagating uncertainty robustly. When discretizing stochastic differential equations via higher-order Itô–Taylor expansions and representing error covariances via the SVD factorization $P_{k|k} = Q_p D_p Q_p^\top$, the square-root factors generated remain numerically stable even when conventional Cholesky methods fail due to loss of positive definiteness in ill-conditioned scenarios. Measurement update steps are likewise performed using SVD-based factorization, enabling reliable state estimation even as roundoff errors accumulate (Kulikova et al., 18 Feb 2024).
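
The robustness claim is easy to illustrate in isolation: for a covariance that roundoff has made numerically rank-deficient, Cholesky factorization fails outright, while an SVD still delivers a usable square-root factor. A toy sketch (not the filter from the paper):

```python
import numpy as np

# A covariance made numerically singular by roundoff: Cholesky rejects it.
P = np.array([[1.0, 1.0],
              [1.0, 1.0]])          # rank-deficient, positive semidefinite
try:
    np.linalg.cholesky(P)
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)

# SVD-based factorization P = Q D Q^T still yields a square-root factor.
Q, d, _ = np.linalg.svd(P)          # for symmetric PSD P this is Q D Q^T
S = Q * np.sqrt(np.clip(d, 0.0, None))   # S = Q D^{1/2}
print(np.linalg.norm(S @ S.T - P))  # ~0: S S^T reconstructs P
```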

7. Summary of Key Formulas and Insights

Selected formulas central to Taylor expansion for SVD and related functions:

| Concept | Formula | Context |
| --- | --- | --- |
| Terracini's lemma (rank-$r$) | $T_{p_1+\cdots+p_r}\,\sigma_r(V) = T_{p_1}V + \cdots + T_{p_r}V$ | Tangent space to secant variety |
| SVD Taylor expansion condition | $\langle A - U \Sigma_r V^\top,\, Y \rangle = 0,\ \forall\, Y \in T_{U \Sigma_r V^\top}$ | First-order criticality |
| Fréchet derivative for square root | $V y(A) \cdot H = \int_0^\infty e^{-t\, y(A)} H e^{-t\, y(A)}\, dt$ | Matrix function expansion |
| Taylor expansion of unstable term | $\frac{1}{\lambda_i - \lambda_j} \approx \frac{1}{\lambda_i}\sum_{k=0}^{K}\left(\frac{\lambda_j}{\lambda_i}\right)^k$ | SVD gradient regularization |

The geometric view, recursive Fréchet derivative approach, and spectral Taylor expansion techniques jointly provide a mathematically structured, numerically robust foundation for SVD perturbation analysis, gradient computation (especially in differentiable deep learning layers), tensor expansions, and high-precision filtering applications. The integral remainder and recursive construction of higher derivatives enable nonasymptotic error bounds, which are critical in real-world, finite perturbation regimes. The Taylor expansion for SVD thus synthesizes geometric, analytic, and computational principles for advanced matrix and tensor approximation theory.
