
Fast Differentiable Matrix Square Root (2201.08663v1)

Published 21 Jan 2022 in cs.CV, cs.LG, cs.MS, cs.NA, and math.NA

Abstract: Computing the matrix square root or its inverse in a differentiable manner is important in a variety of computer vision tasks. Previous methods either adopt the Singular Value Decomposition (SVD) to explicitly factorize the matrix or use the Newton-Schulz iteration (NS iteration) to derive the approximate solution. However, both methods are not computationally efficient enough in either the forward pass or in the backward pass. In this paper, we propose two more efficient variants to compute the differentiable matrix square root. For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP), and the other method is to use Matrix Padé Approximants (MPA). The backward gradient is computed by iteratively solving the continuous-time Lyapunov equation using the matrix sign function. Both methods yield considerable speed-up compared with the SVD or the Newton-Schulz iteration. Experimental results on the de-correlated batch normalization and second-order vision transformer demonstrate that our methods can also achieve competitive and even slightly better performances. The code is available at https://github.com/KingJamesSong/FastDifferentiableMatSqrt.

Authors (3)
  1. Yue Song (56 papers)
  2. Nicu Sebe (270 papers)
  3. Wei Wang (1793 papers)
Citations (12)

Summary

  • The paper introduces Matrix Padé Approximants (MPA-Lya) as a faster alternative to Newton-Schulz iterations for computing matrix square roots.
  • It demonstrates that MPA-Lya is approximately 1.5 times faster in CNN global covariance pooling applications while maintaining accuracy.
  • The study proves that the Padé approximants are stable (free of spurious poles) and that back-propagation via the Lyapunov solver does not rely on intermediate variables from the forward computation.

Fast Differentiable Matrix Square Root for Real Hermitian Matrices

This paper presents a novel approach to computing fast differentiable matrix square roots, focusing on real Hermitian (symmetric) matrices. The authors introduce Matrix Padé Approximants paired with a Lyapunov-based backward pass (MPA-Lya) as a more efficient alternative to conventional Newton-Schulz (NS) iterations, especially for the large matrix dimensions prevalent in deep learning applications.

Key Contributions

  1. Matrix Padé Approximants (MPA-Lya): The MPA-Lya method is proposed as a faster technique than NS iteration for computing matrix square roots. The results demonstrate MPA-Lya's advantage especially for batched matrices larger than 1×256×256, and the speed-up is maintained as matrix dimensions scale to 1×1024×1024. This efficiency persists across various applications, indicating MPA-Lya's robustness for deep learning contexts (a minimal sketch of the forward approximation appears after this list).
  2. Global Covariance Pooling in CNNs: The paper assesses MPA-Lya's impact on global covariance pooling (GCP) in CNNs, showing that it outperforms NS iteration in terms of speed while maintaining comparable accuracy. Specifically, MPA-Lya is approximately 1.5 times faster, making it advantageous for tasks requiring real-time computation without sacrificing performance.
  3. Compatibility and Performance Analysis: The work notes that NS-based back-propagation is incompatible with the MPA/MTP forward solvers because it depends on intermediate variables produced by the NS iteration. The Lyapunov solver used in conjunction with MPA/MTP shows that fast and accurate back-propagation is feasible without those intermediate values (see the Lyapunov-solver sketch following this list).
  4. Stability of Padé Approximants: Stability issues associated with spurious poles in Padé approximants are addressed. The authors provide theoretical guarantees that, in their context, the approximants remain stable as there are no poles, avoiding potential defects that can arise with other approximations.
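
To make the forward approximation concrete, the sketch below computes a truncated Matrix Taylor Polynomial (MTP) of the square root of a single symmetric positive definite matrix, using the Frobenius-norm pre-normalization described in the paper. The function name `mtp_sqrt` and the default degree `K` are illustrative choices, not the authors' reference implementation.

```python
import torch

def mtp_sqrt(A: torch.Tensor, K: int = 8) -> torch.Tensor:
    """Degree-K Matrix Taylor Polynomial approximation of the square root
    of a single symmetric positive definite matrix A.

    Uses A^{1/2} = sqrt(||A||_F) * (I - Z)^{1/2} with Z = I - A / ||A||_F
    and the series (I - Z)^{1/2} = I - sum_{k>=1} |binom(1/2, k)| Z^k.
    Illustrative sketch, not the authors' reference implementation.
    """
    n = A.shape[-1]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    norm = torch.linalg.matrix_norm(A, ord="fro")
    Z = I - A / norm            # eigenvalues of Z lie in [0, 1), so the series converges

    approx = I.clone()
    Zk = Z.clone()
    coeff = 0.5                 # |binom(1/2, 1)|
    for k in range(1, K + 1):
        approx = approx - coeff * Zk
        Zk = Zk @ Z
        coeff = coeff * (k - 0.5) / (k + 1)   # next coefficient |binom(1/2, k+1)|

    return torch.sqrt(norm) * approx
```

The paper's MPA variant replaces this truncated polynomial with a Matrix Padé Approximant fitted to the same series (a rational form P_M(Z) Q_N(Z)^{-1}), which costs one extra linear solve but is reported to be more accurate at a comparable degree.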
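
For the backward pass, the gradient with respect to A solves the continuous-time Lyapunov equation A^{1/2} X + X A^{1/2} = dL/d(A^{1/2}). The sketch below solves it through the matrix sign function of the block matrix [[B, C], [0, -B]], whose sign equals [[I, 2X], [0, -I]]; it uses the classical Newton sign iteration with an explicit inverse for clarity, whereas the paper derives an inverse-free iteration of the same flavor. The name `lyapunov_grad` and the iteration count are illustrative assumptions.

```python
import torch

def lyapunov_grad(sqrt_A: torch.Tensor, grad_sqrt: torch.Tensor,
                  iters: int = 15) -> torch.Tensor:
    """Solve B X + X B = C for X, with B = sqrt_A (symmetric positive
    definite) and C = grad_sqrt = dL/d(A^{1/2}); the solution X is dL/dA.

    Uses sign([[B, C], [0, -B]]) = [[I, 2X], [0, -I]] together with the
    classical Newton sign iteration S <- (S + S^{-1}) / 2.
    Illustrative sketch; the paper's solver is an inverse-free variant.
    """
    n = sqrt_A.shape[-1]
    zero = torch.zeros_like(sqrt_A)
    top = torch.cat([sqrt_A, grad_sqrt], dim=-1)
    bottom = torch.cat([zero, -sqrt_A], dim=-1)
    S = torch.cat([top, bottom], dim=-2)     # block matrix [[B, C], [0, -B]]

    # Newton sign iteration; converges because the block matrix has no
    # purely imaginary eigenvalues when B is positive definite.
    for _ in range(iters):
        S = 0.5 * (S + torch.linalg.inv(S))

    return 0.5 * S[..., :n, n:]              # top-right block of sign(.) is 2X
```

Because this equation involves only A^{1/2} and the incoming gradient, the backward pass is independent of how the forward square root was obtained, which is what allows pairing it with either the MTP or the MPA forward solver.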

Practical and Theoretical Implications

The implications of deploying MPA-Lya in neural network architectures are twofold: practical efficiency and theoretical stability. The method ensures that the matrix computations required by complex models such as CNNs can be executed rapidly, reducing the computational overhead of large-scale models. Theoretically, this research enriches the understanding of approximant-based solutions and their application in high-performance environments. The insights on stability further solidify MPA-Lya's applicability across diverse matrices without encountering common computational pitfalls.

Future Directions

The findings open several avenues for future exploration. Integrating MPA-Lya into a wider range of neural network architectures, beyond the CNN and second-order vision transformer settings evaluated here, could reveal its broader applicability. Further study of the theoretical properties of Padé approximants, particularly for other matrix classes or non-Hermitian settings, may extend their utility. Another promising direction is to parallelize the computation further, leveraging modern parallel computing hardware beyond conventional GPUs, which could redefine efficiency benchmarks for these matrix operations.

In conclusion, this work provides a compelling alternative to state-of-the-art methods for matrix computations in neural networks, offering both speed and reliability. The MPA-Lya method's efficiency and broad applicability suggest significant potential for improving computational performance across numerous AI systems.
