- The paper introduces adjoint-based and reverse-mode algorithms that compute derivatives of singular values and singular vectors to machine precision.
- It leverages both the Gram Matrix and Symmetric Embedding methods to efficiently handle high-dimensional, complex-valued matrices.
- Numerical experiments validate the methods' scalability and memory efficiency, making the approach effective for gradient-based optimization and reduced-order modeling.
Differentiable Singular Value Decomposition: Adjoint and Reverse-Mode Derivative Algorithms
Introduction and Motivation
Singular Value Decomposition (SVD) is a foundational tool in numerical linear algebra, underpinning a wide range of applications in engineering, data analysis, and scientific computing. Its role in modal analysis, reduced-order modeling (e.g., Proper Orthogonal Decomposition, POD), and optimization is particularly prominent. However, the efficient and accurate computation of derivatives of SVD outputs (singular values and vectors) with respect to input matrices is a persistent challenge, especially for large-scale, high-dimensional, or complex-valued problems. Existing approaches, such as finite differences (FD), complex step (CS), and algorithmic differentiation (AD), suffer from limitations in scalability, accuracy, or memory requirements, particularly when the number of design variables is large or when only a subset of singular variables is needed.
This work addresses these challenges by introducing two adjoint-based algorithms and a reverse-mode automatic differentiation (RAD) formula for efficiently computing SVD derivatives. The proposed methods are applicable to general complex matrices (square or rectangular), achieve machine-precision accuracy, and are designed to avoid the computational and storage overheads associated with requiring all singular variables.
The SVD of a general complex matrix A ∈ C^(m×n) is A = UΣV∗, where U and V are unitary and Σ is diagonal (or rectangular diagonal). The derivatives of interest are those of the singular values σᵢ and singular vectors uᵢ, vᵢ with respect to the entries of A, as required in gradient-based optimization and sensitivity analysis.
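These definitions are easy to check numerically. The following sketch (using NumPy, with illustrative dimensions and a random complex test matrix) verifies the thin SVD factorization and the orthonormality of the singular vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
# Random complex test matrix A in C^(m x n) (illustrative dimensions)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Thin SVD: A = U @ diag(s) @ Vh, with U and Vh* having orthonormal columns
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Reconstruction and unitarity checks
assert np.allclose(A, U @ np.diag(s) @ Vh)
assert np.allclose(U.conj().T @ U, np.eye(n))
assert np.allclose(Vh @ Vh.conj().T, np.eye(n))
```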
The paper leverages two key relationships between SVD and eigenvalue problems (EVPs):
- Gram Matrix Method (GMM): SVD is related to the eigendecomposition of the Gram matrices B = AA∗ and C = A∗A. The left (respectively right) singular vectors are eigenvectors of B (respectively C), and the singular values are the square roots of the common nonzero eigenvalues.
- Symmetric Embedding Matrix Method (SEMM): SVD is related to the eigenproblem of the symmetric embedding

      S = [ 0    A ]
          [ A∗   0 ]

  whose eigenvalues are ±σᵢ and whose eigenvectors encode both uᵢ and vᵢ.
Both approaches yield governing equations for the SVD variables, which are then differentiated using adjoint-based techniques.
Adjoint-Based Derivative Algorithms
Gram Matrix Method (GMM)
The GMM approach formulates the SVD derivative problem as a sequence of EVP derivatives. The adjoint method is applied to the residual form of the EVP, enforcing normalization and phase constraints to ensure uniqueness of the eigenvectors. The total derivative of a function f(u,v,σ,A) with respect to A is computed via the chain rule, with the adjoint vector ψ obtained by solving a linear system involving the Jacobian of the residuals.
Two variants are provided:
- Left Gram Matrix Method (LGMM): computes derivatives of the left singular vectors and the singular values via the eigenproblem of B = AA∗.
- Right Gram Matrix Method (RGMM): computes derivatives of the right singular vectors and the singular values via the eigenproblem of C = A∗A.
The adjoint equations are constructed such that the computational cost does not scale with the number of design variables, making the approach suitable for large-scale optimization.
Symmetric Embedding Matrix Method (SEMM)
The SEMM approach directly differentiates the SVD governing equations derived from the symmetric embedding. The adjoint system is larger (dimension m+n), but avoids explicit formation of Gram matrices, which is advantageous for dense matrices. The method simultaneously yields derivatives of both left and right singular vectors and the singular value.
The adjoint system is solved for the adjoint vector, and the total derivative is assembled using the chain rule, with explicit expressions for the contributions from the state variables and direct dependencies on A.
Implementation Considerations
- The adjoint systems are linear and can be solved efficiently using standard solvers (e.g., numpy.linalg.solve).
- Automatic differentiation tools (e.g., JAX) can be used to compute Jacobians and direct derivatives, facilitating implementation for arbitrary objective functions.
- The methods are applicable to both real and complex matrices, provided the singular values are distinct (degeneracy is not addressed).
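As a minimal illustration of the adjoint pattern these methods rely on (a toy linear problem, not the paper's SVD system), consider a state x solving the residual equation R(x, p) = Mx − pb = 0 with objective f = cᵀx. A single adjoint solve yields df/dp at a cost independent of the number of parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to be well-conditioned
b = rng.standard_normal(n)
c = rng.standard_normal(n)
p = 1.3

x = np.linalg.solve(M, p * b)        # forward solve: R(x, p) = M x - p b = 0
psi = np.linalg.solve(M.T, c)        # adjoint solve: (dR/dx)^T psi = df/dx
df_dp = -psi @ (-b)                  # df/dp = -psi^T (dR/dp), with dR/dp = -b

# Finite-difference check (f is linear in p, so FD is essentially exact)
eps = 1e-6
x_eps = np.linalg.solve(M, (p + eps) * b)
fd = (c @ x_eps - c @ x) / eps
assert np.isclose(df_dp, fd, rtol=1e-5)
```

The key point mirrors the paper's claim about cost: with many parameters p, the adjoint approach still needs only one extra linear solve, whereas FD would need one forward solve per parameter.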
For the specific case where only the derivative of a singular value with respect to A is required, the paper introduces a compact RAD formula:
∂σ/∂Aᵣ = uᵣvᵣᵀ + uᵢvᵢᵀ,  ∂σ/∂Aᵢ = −uᵣvᵢᵀ + uᵢvᵣᵀ

where the subscripts r and i denote the real and imaginary parts of A, u, and v. For a real matrix this reduces to

∂σ/∂A = uvᵀ
This formula is memory-efficient, does not require all singular variables, and is suitable for large-scale problems.
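The RAD formula can be verified directly against finite differences. The sketch below (random complex test matrix with, almost surely, distinct singular values) checks the real- and imaginary-part derivatives of the leading singular value at a single entry:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 0                                  # leading singular value
u, v = U[:, k], Vh[k, :].conj()        # rows of Vh are conjugated right vectors
ur, ui = u.real, u.imag
vr, vi = v.real, v.imag

# RAD formula: derivatives of sigma_k w.r.t. real and imaginary parts of A
dsig_dAr = np.outer(ur, vr) + np.outer(ui, vi)
dsig_dAi = -np.outer(ur, vi) + np.outer(ui, vr)

# Finite-difference check of one entry (1, 2) in each direction
eps = 1e-7
E = np.zeros((m, n)); E[1, 2] = 1.0
fd_r = (np.linalg.svd(A + eps * E, compute_uv=False)[k] - s[k]) / eps
fd_i = (np.linalg.svd(A + 1j * eps * E, compute_uv=False)[k] - s[k]) / eps
assert np.isclose(dsig_dAr[1, 2], fd_r, atol=1e-5)
assert np.isclose(dsig_dAi[1, 2], fd_i, atol=1e-5)
```

Note that only the k-th pair (u, v) is needed, which is what makes the formula attractive for truncated SVDs of very large matrices.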
Numerical Results and Validation
The proposed methods are validated on both square and tall rectangular complex matrices, with derivatives compared against finite-difference results. The adjoint-based and RAD-based derivatives match FD results to 5–7 digits, an agreement limited by the truncation error of the FD reference, demonstrating both accuracy and robustness. The methods are further applied to a large-scale POD problem using data from the Johns Hopkins Turbulence Database (JHTDB), involving a snapshot matrix with 1.5×10⁸ rows and 75 columns. The RAD formula is used to compute derivatives of the leading singular values, demonstrating scalability and practical applicability to high-dimensional scientific datasets.
Practical and Theoretical Implications
The adjoint-based and RAD-based SVD derivative algorithms presented in this work offer several advantages:
- Scalability: The computational cost does not scale with the number of design variables, making the methods suitable for high-dimensional optimization and sensitivity analysis.
- Memory Efficiency: The methods avoid the need to store all singular variables, reducing memory requirements.
- Generality: The algorithms are applicable to both real and complex matrices, and to both square and rectangular cases.
- Accuracy: The derivatives are analytically exact and evaluate to machine precision, with agreement against FD results limited only by the FD truncation error.
- Implementation Flexibility: The use of automatic differentiation tools for Jacobian and direct derivative computation enables straightforward integration into existing scientific computing workflows.
These properties make the methods particularly attractive for applications in gradient-based design optimization, differentiable reduced-order modeling (e.g., differentiable POD), and sensitivity analysis in large-scale engineering and scientific problems.
Future Directions
Potential avenues for further research include:
- Extension to cases with repeated or nearly repeated singular values, where the singular subspace is degenerate and the derivative problem is ill-posed.
- Integration with modern machine learning frameworks for end-to-end differentiable pipelines involving SVD (e.g., in differentiable physics or scientific machine learning).
- Application to differentiable resolvent analysis and other modal decomposition techniques in fluid dynamics and structural mechanics.
- Exploration of higher-order derivatives and their efficient computation in the context of SVD.
Conclusion
This work provides a comprehensive framework for differentiable SVD, introducing adjoint-based and RAD-based algorithms that are accurate, scalable, and memory-efficient. The methods are validated on both small and large-scale problems, and are broadly applicable to engineering design, scientific computing, and data-driven modeling. The results enable efficient gradient computation for SVD-based analyses, facilitating the integration of SVD into large-scale, gradient-based optimization and learning systems.