Fast randomized numerical rank estimation for numerically low-rank matrices (2105.07388v2)
Abstract: Matrices with low-rank structure are ubiquitous in scientific computing. Choosing an appropriate rank is a key step in many computational algorithms that exploit low-rank structure. However, in large-scale settings the rank has largely been estimated in an ad hoc fashion. In this work we develop a randomized algorithm for estimating the numerical rank of a (numerically low-rank) matrix. The algorithm is based on sketching the matrix with random matrices from both left and right; the key fact is that, with high probability, the sketches preserve the orders of magnitude of the leading singular values. We prove a result on the accuracy of the sketched singular values and show that gaps in the spectrum are detected. For an $m\times n$ $(m\geq n)$ matrix of numerical rank $r$, the algorithm runs with complexity $O(mn\log n+r^3)$, or less for structured matrices. The steps in the algorithm are required as part of many low-rank algorithms, so the additional work required to estimate the rank can be even smaller in practice. Numerical experiments illustrate the speed and robustness of our rank estimator.
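The two-sided sketching idea in the abstract can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the function name `estimate_rank` is hypothetical, Gaussian test matrices are used instead of the structured transforms that yield the $O(mn\log n)$ cost, and the rank is read off the sketched singular values with a simple relative tolerance rather than the paper's analysis-backed criterion.

```python
import numpy as np

def estimate_rank(A, k, tol=1e-10, rng=None):
    """Estimate the numerical rank of A from a small two-sided sketch.

    A   : (m, n) array, assumed numerically low rank
    k   : sketch size, should exceed the anticipated rank
    tol : relative threshold defining the numerical rank

    Gaussian test matrices are used here for simplicity, giving an
    O(mnk) sketching cost; structured (e.g. subsampled trigonometric)
    transforms would reduce this to O(mn log n).
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Scale so the sketches are isometries in expectation.
    Omega = rng.standard_normal((n, k)) / np.sqrt(k)   # right sketch
    Psi = rng.standard_normal((k, m)) / np.sqrt(k)     # left sketch
    X = Psi @ (A @ Omega)                              # small k x k sketch
    # The singular values of X track the leading singular values of A
    # up to modest factors, so counting those above a relative
    # tolerance gives a rank estimate.
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.count_nonzero(s > tol * s[0]))

# Example: a 2000 x 1000 matrix with rapidly decaying singular values.
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((2000, 60)))[0]
V = np.linalg.qr(rng.standard_normal((1000, 60)))[0]
A = (U * np.logspace(0, -14, 60)) @ V.T
print(estimate_rank(A, k=100))  # typically close to the count of sigma_i > tol * sigma_1
```

In this sketch the gap/threshold test on the sketched singular values stands in for the paper's detection criterion; the essential point it demonstrates is that only a $k\times k$ matrix, rather than $A$ itself, needs to be factorized once the two-sided sketch has been formed.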