- The paper introduces adjoint-based and reverse-mode algorithms that compute derivatives of singular values and singular vectors to machine precision.
- It leverages both the Gram Matrix and Symmetric Embedding methods to efficiently handle high-dimensional, complex-valued matrices.
- Numerical experiments validate the methods' scalability and memory efficiency, making the approach effective for gradient-based optimization and reduced-order modeling.
Differentiable Singular Value Decomposition: Adjoint and Reverse-Mode Derivative Algorithms
Introduction and Motivation
Singular Value Decomposition (SVD) is a foundational tool in numerical linear algebra, underpinning a wide range of applications in engineering, data analysis, and scientific computing. Its role in modal analysis, reduced-order modeling (e.g., Proper Orthogonal Decomposition, POD), and optimization is particularly prominent. However, the efficient and accurate computation of derivatives of SVD outputs (singular values and vectors) with respect to input matrices is a persistent challenge, especially for large-scale, high-dimensional, or complex-valued problems. Existing approaches, such as finite differences (FD), complex step (CS), and algorithmic differentiation (AD), suffer from limitations in scalability, accuracy, or memory requirements, particularly when the number of design variables is large or when only a subset of singular variables is needed.
This work addresses these challenges by introducing two adjoint-based algorithms and a reverse-mode automatic differentiation (RAD) formula for efficiently computing SVD derivatives. The proposed methods are applicable to general complex matrices (square or rectangular), achieve machine-precision accuracy, and are designed to avoid the computational and storage overheads associated with requiring all singular variables.
The SVD of a general complex matrix A ∈ C^(m×n) is A = UΣV∗, where U and V are unitary and Σ is diagonal (or rectangular diagonal). The derivatives of interest are those of the singular values σᵢ and singular vectors uᵢ, vᵢ with respect to the entries of A, as required in gradient-based optimization and sensitivity analysis.
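These definitions are easy to check numerically. The following sketch (using NumPy, with illustrative dimensions and a random complex test matrix) verifies the thin SVD factorization and the orthonormality of the singular vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
# Random complex test matrix A in C^(m x n) (illustrative dimensions)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Thin SVD: A = U @ diag(s) @ Vh, with U and Vh* having orthonormal columns
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Reconstruction and unitarity checks
assert np.allclose(A, U @ np.diag(s) @ Vh)
assert np.allclose(U.conj().T @ U, np.eye(n))
assert np.allclose(Vh @ Vh.conj().T, np.eye(n))
```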
The paper leverages two key relationships between SVD and eigenvalue problems (EVPs):
- Gram Matrix Method (GMM): SVD is related to the eigendecomposition of the Gram matrices B = AA∗ and C = A∗A. The left (respectively right) singular vectors are eigenvectors of B (respectively C), and the singular values are the square roots of the common nonzero eigenvalues.
- Symmetric Embedding Matrix Method (SEMM): SVD is related to the eigenproblem of the symmetric embedding

      S = [ 0    A ]
          [ A∗   0 ]

  whose eigenvalues are ±σᵢ and whose eigenvectors encode both uᵢ and vᵢ.
Both approaches yield governing equations for the SVD variables, which are then differentiated using adjoint-based techniques.
Adjoint-Based Derivative Algorithms
Gram Matrix Method (GMM)
The GMM approach formulates the SVD derivative problem as a sequence of EVP derivatives. The adjoint method is applied to the residual form of the EVP, enforcing normalization and phase constraints to ensure uniqueness of the eigenvectors. The total derivative of a function f(u,v,σ,A) with respect to A is computed via the chain rule, with the adjoint vector ψ obtained by solving a linear system involving the Jacobian of the residuals.
Two variants are provided:
- Left Gram Matrix Method (LGMM): computes derivatives of the left singular vectors and the singular values via the eigenproblem of B = AA∗.
- Right Gram Matrix Method (RGMM): computes derivatives of the right singular vectors and the singular values via the eigenproblem of C = A∗A.
The adjoint equations are constructed such that the computational cost does not scale with the number of design variables, making the approach suitable for large-scale optimization.
Symmetric Embedding Matrix Method (SEMM)
The SEMM approach directly differentiates the SVD governing equations derived from the symmetric embedding. The adjoint system is larger (dimension m+n), but avoids explicit formation of Gram matrices, which is advantageous for dense matrices. The method simultaneously yields derivatives of both left and right singular vectors and the singular value.
The adjoint system is solved for the adjoint vector, and the total derivative is assembled using the chain rule, with explicit expressions for the contributions from the state variables and direct dependencies on A.
Implementation Considerations
- The adjoint systems are linear and can be solved efficiently using standard solvers (e.g., numpy.linalg.solve).
- Automatic differentiation tools (e.g., JAX) can be used to compute Jacobians and direct derivatives, facilitating implementation for arbitrary objective functions.
- The methods are applicable to both real and complex matrices, provided the singular values are distinct (degeneracy is not addressed).
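As a minimal illustration of the adjoint pattern these methods rely on (a toy linear problem, not the paper's SVD system), consider a state x solving the residual equation R(x, p) = Mx − pb = 0 with objective f = cᵀx. A single adjoint solve yields df/dp at a cost independent of the number of parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to be well-conditioned
b = rng.standard_normal(n)
c = rng.standard_normal(n)
p = 1.3

x = np.linalg.solve(M, p * b)        # forward solve: R(x, p) = M x - p b = 0
psi = np.linalg.solve(M.T, c)        # adjoint solve: (dR/dx)^T psi = df/dx
df_dp = -psi @ (-b)                  # df/dp = -psi^T (dR/dp), with dR/dp = -b

# Finite-difference check (f is linear in p, so FD is essentially exact)
eps = 1e-6
x_eps = np.linalg.solve(M, (p + eps) * b)
fd = (c @ x_eps - c @ x) / eps
assert np.isclose(df_dp, fd, rtol=1e-5)
```

The key point mirrors the paper's claim about cost: with many parameters p, the adjoint approach still needs only one extra linear solve, whereas FD would need one forward solve per parameter.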
For the specific case where only the derivative of a singular value with respect to A is required, the paper introduces a compact RAD formula:
∂σ/∂Aᵣ = uᵣvᵣᵀ + uᵢvᵢᵀ,  ∂σ/∂Aᵢ = −uᵣvᵢᵀ + uᵢvᵣᵀ

where the subscripts r and i denote the real and imaginary parts of A, u, and v. For a real matrix this reduces to

∂σ/∂A = uvᵀ
This formula is memory-efficient, does not require all singular variables, and is suitable for large-scale problems.
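The RAD formula can be verified directly against finite differences. The sketch below (random complex test matrix with, almost surely, distinct singular values) checks the real- and imaginary-part derivatives of the leading singular value at a single entry:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 0                                  # leading singular value
u, v = U[:, k], Vh[k, :].conj()        # rows of Vh are conjugated right vectors
ur, ui = u.real, u.imag
vr, vi = v.real, v.imag

# RAD formula: derivatives of sigma_k w.r.t. real and imaginary parts of A
dsig_dAr = np.outer(ur, vr) + np.outer(ui, vi)
dsig_dAi = -np.outer(ur, vi) + np.outer(ui, vr)

# Finite-difference check of one entry (1, 2) in each direction
eps = 1e-7
E = np.zeros((m, n)); E[1, 2] = 1.0
fd_r = (np.linalg.svd(A + eps * E, compute_uv=False)[k] - s[k]) / eps
fd_i = (np.linalg.svd(A + 1j * eps * E, compute_uv=False)[k] - s[k]) / eps
assert np.isclose(dsig_dAr[1, 2], fd_r, atol=1e-5)
assert np.isclose(dsig_dAi[1, 2], fd_i, atol=1e-5)
```

Note that only the k-th pair (u, v) is needed, which is what makes the formula attractive for truncated SVDs of very large matrices.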
Numerical Results and Validation
The proposed methods are validated on both square and tall rectangular complex matrices, with derivatives compared against finite-difference results. The adjoint-based and RAD-based derivatives match FD results to 5–7 digits, an agreement limited by the truncation error of the FD reference, demonstrating both accuracy and robustness. The methods are further applied to a large-scale POD problem using data from the Johns Hopkins Turbulence Database (JHTDB), involving a snapshot matrix with 1.5×10⁸ rows and 75 columns. The RAD formula is used to compute derivatives of the leading singular values, demonstrating scalability and practical applicability to high-dimensional scientific datasets.
Practical and Theoretical Implications
The adjoint-based and RAD-based SVD derivative algorithms presented in this work offer several advantages:
- Scalability: The computational cost does not scale with the number of design variables, making the methods suitable for high-dimensional optimization and sensitivity analysis.
- Memory Efficiency: The methods avoid the need to store all singular variables, reducing memory requirements.
- Generality: The algorithms are applicable to both real and complex matrices, and to both square and rectangular cases.
- Accuracy: The derivatives are analytically exact and evaluate to machine precision, with agreement against FD results limited only by the FD truncation error.
- Implementation Flexibility: The use of automatic differentiation tools for Jacobian and direct derivative computation enables straightforward integration into existing scientific computing workflows.
These properties make the methods particularly attractive for applications in gradient-based design optimization, differentiable reduced-order modeling (e.g., differentiable POD), and sensitivity analysis in large-scale engineering and scientific problems.
Future Directions
Potential avenues for further research include:
- Extension to cases with repeated or nearly repeated singular values, where the singular subspace is degenerate and the derivative problem is ill-posed.
- Integration with modern machine learning frameworks for end-to-end differentiable pipelines involving SVD (e.g., in differentiable physics or scientific machine learning).
- Application to differentiable resolvent analysis and other modal decomposition techniques in fluid dynamics and structural mechanics.
- Exploration of higher-order derivatives and their efficient computation in the context of SVD.
Conclusion
This work provides a comprehensive framework for differentiable SVD, introducing adjoint-based and RAD-based algorithms that are accurate, scalable, and memory-efficient. The methods are validated on both small and large-scale problems, and are broadly applicable to engineering design, scientific computing, and data-driven modeling. The results enable efficient gradient computation for SVD-based analyses, facilitating the integration of SVD into large-scale, gradient-based optimization and learning systems.