Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning (2405.05865v1)
Abstract: We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nyström approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems:
1. We show how to solve any $n\times n$ linear system that is well-conditioned except for $k$ outlying large singular values in $\tilde{O}(n^{2.065} + k^\omega)$ time, improving on a recent result of [Dereziński, Yang, STOC 2024] for all $k \gtrsim n^{0.78}$.
2. We give the first $\tilde{O}(n^2 + d_\lambda^{\omega})$ time algorithm for solving a regularized linear system $(A + \lambda I)x = b$, where $A$ is positive semidefinite with effective dimension $d_\lambda$. This problem arises in applications like Gaussian process regression.
3. We give faster algorithms for approximating Schatten $p$-norms and other matrix norms. For example, for the Schatten 1 (nuclear) norm, we give an algorithm that runs in $\tilde{O}(n^{2.11})$ time, improving on an $\tilde{O}(n^{2.18})$ method of [Musco et al., ITCS 2018].
Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.
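To make the core idea concrete, here is a minimal single-level sketch in Python/NumPy: a randomized Nyström approximation of a PSD matrix (following the stabilized procedure of the "Randomized Nyström preconditioning" reference below) used as a preconditioner inside conjugate gradient for $(A + \mu I)x = b$ with $\mu > 0$. This is an illustration under simplifying assumptions, not the paper's multi-level algorithm: a dense Gaussian test matrix stands in for sparse sketching, the preconditioner is applied exactly rather than inverted via further levels of sketching, and the names `nystrom_eig`, `nystrom_pcg`, and the shift `nu` are ours.

```python
import numpy as np

def nystrom_eig(A, k, rng):
    """Rank-k Nystrom approximation of a PSD matrix A, returned as an
    eigendecomposition (U, lam) with A_hat = U @ diag(lam) @ U.T."""
    n = A.shape[0]
    Omega = rng.standard_normal((n, k))   # Gaussian test matrix (the paper
                                          # would use a sparse sketch here)
    Y = A @ Omega                         # n x k sketch of A
    nu = 1e-8 * np.linalg.norm(Y)         # small shift for numerical stability
    Y_nu = Y + nu * Omega
    M = Omega.T @ Y_nu
    C = np.linalg.cholesky((M + M.T) / 2) # core factor, symmetrized
    B = np.linalg.solve(C, Y_nu.T).T      # B = Y_nu C^{-T}, so A_hat = B B^T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s**2 - nu, 0.0)      # undo the stability shift
    return U, lam

def nystrom_pcg(A, b, mu, k, rng=None, tol=1e-8, max_iter=500):
    """CG on (A + mu*I) x = b with the Nystrom preconditioner, applied as
    P^{-1} v = (lam_k + mu) U (Lam + mu I)^{-1} U^T v + (v - U U^T v)."""
    rng = rng or np.random.default_rng(0)
    U, lam = nystrom_eig(A, k, rng)
    lam_k = lam[-1]                       # smallest retained eigenvalue

    def apply_Pinv(v):
        Utv = U.T @ v
        return U @ ((lam_k + mu) / (lam + mu) * Utv) + (v - U @ Utv)

    x = np.zeros_like(b)
    r = b - (A @ x + mu * x)
    z = apply_Pinv(r)
    p = z.copy()
    for _ in range(max_iter):
        Ap = A @ p + mu * p
        alpha = (r @ z) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            break
        z_new = apply_Pinv(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x
```

On a PSD matrix whose spectrum is flat apart from $k$ large outlying eigenvalues, `nystrom_pcg(A, b, mu, k)` should converge in far fewer iterations than unpreconditioned CG; this is the effect that the paper's average-condition-number bounds quantify, with the additional twist that the preconditioner solve is itself accelerated by recursive sketching.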
- The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM Journal on Computing, 39(1):302–322, 2009.
- Near-optimal approximation of matrix functions by the Lanczos method. arXiv preprint arXiv:2303.03358, 2023.
- Faster kernel ridge regression using sketching and preconditioning. SIAM Journal on Matrix Analysis and Applications, 38(4):1116–1138, 2017.
- On the rate of convergence of the preconditioned conjugate gradient method. Numerische Mathematik, 48:499–524, 1986.
- Fast randomized kernel ridge regression with statistical guarantees. In Advances in Neural Information Processing Systems, volume 28, 2015.
- Convex optimization. Cambridge University Press, 2004.
- Near-optimal algorithms for linear algebra in the current matrix multiplication time. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3043–3068. SIAM, 2022.
- Query complexity of least absolute deviation regression via robust uniform convergence. In Conference on Learning Theory, pages 1144–1179. PMLR, 2021.
- Optimal embedding dimension for sparse subspace embeddings. In 56th Annual ACM Symposium on Theory of Computing, 2024.
- Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 163–172, 2015.
- Solving directed Laplacian systems in nearly-linear time through sparse LU factorizations. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 898–909. IEEE, 2018.
- Solving linear programs in the current matrix multiplication time. Journal of the ACM (JACM), 68(1):1–39, 2021.
- Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1758–1777. SIAM, 2017.
- Optimal Approximate Matrix Product in Terms of Stable Rank. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55, pages 11:1–11:14, 2016.
- Michael B Cohen. Nearly tight oblivious subspace embeddings by trace inequalities. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 278–287. SIAM, 2016.
- Lp row sampling by Lewis weights. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 183–192, 2015.
- Matrix multiplication via arithmetic progressions. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, pages 1–6, 1987.
- Low rank approximation and regression in input sparsity time. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pages 81–90, 2013.
- Robust, randomized preconditioning for kernel ridge regression. arXiv preprint arXiv:2304.12465, 2023.
- High-dimensional asymptotics of prediction: Ridge regression and classification. The Annals of Statistics, 46(1):247–279, 2018.
- Solving dense linear systems faster than via preconditioning. In 56th Annual ACM Symposium on Theory of Computing, 2024.
- Fast randomized kernel methods with statistical guarantees. stat, 1050:2, 2014.
- Ethan N. Epperly. Fast and forward stable randomized algorithms for linear least-squares problems. arXiv preprint arXiv:2311.04362, 2024.
- Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, pages 2540–2548, 2015.
- Principal component projection without principal component analysis. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, pages 2349–2357, 2016.
- Randomized Nyström preconditioning. SIAM Journal on Matrix Analysis and Applications, 44(2):718–752, 2023.
- Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res., 17(1):3977–4041, 2016.
- Solving ridge regression using sketched preconditioned SVRG. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1397–1405, 2016.
- Anne Greenbaum. Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences. Linear Algebra and its Applications, 113:7–63, 1989.
- Inexact preconditioned conjugate gradient method with inner-outer iteration. SIAM Journal on Scientific Computing, 21(4):1305–1320, 1999.
- Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
- Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6):409–436, 1952.
- Michael F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 19(2):433–450, 1990.
- Online learning guided curvature approximation: A quasi-Newton method with global non-asymptotic superlinear convergence. arXiv preprint arXiv:2302.08580, 2023.
- A faster interior point method for semidefinite programming. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 910–918. IEEE, 2020.
- Principal component projection and regression in nearly linear time through asymmetric SVRG. In Advances in Neural Information Processing Systems, volume 32, 2019.
- Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, volume 26, 2013.
- Displacement ranks of matrices and linear equations. Journal of Mathematical Analysis and Applications, 68(2):395–407, 1979.
- A fast solver for a class of linear systems. Communications of the ACM, 55(10):99–107, 2012.
- Big-step-little-step: Efficient gradient methods for objectives with multiple scales. In Proceedings of Thirty Fifth Conference on Learning Theory, volume 178, pages 2431–2540, 2022.
- Sparser Johnson–Lindenstrauss transforms. Journal of the ACM (JACM), 61(1):1–23, 2014.
- Approximate Gaussian elimination for Laplacians: fast, sparse, and simple. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 573–582. IEEE Computer Society, 2016.
- Kenneth Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164–168, 1944.
- François Le Gall. Faster algorithms for rectangular matrix multiplication. In IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 514–523, 2012.
- Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing, 461:370–403, 2021.
- Randomized methods for linear constraints: convergence rates and conditioning. Mathematics of Operations Research, 35(3):641–654, 2010.
- Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 147–156. IEEE, 2013.
- High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. In Advances in Neural Information Processing Systems, volume 24, 2011.
- Donald W Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963.
- Diving into the shallows: a computational perspective on large-scale shallow learning. In Advances in Neural Information Processing Systems, volume 30, 2017.
- Kernel methods through the roof: Handling billions of points efficiently. In Advances in Neural Information Processing Systems, volume 33, pages 14410–14422, 2020.
- Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pages 91–100, 2013.
- Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Advances in Neural Information Processing Systems, volume 28, 2015.
- Recursive sampling for the Nyström method. In Advances in Neural Information Processing Systems, volume 30, 2017.
- Hutch++: Optimal stochastic trace estimation. In Symposium on Simplicity in Algorithms (SOSA), pages 142–155, 2021.
- Stability of the Lanczos method for matrix function approximation. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1605–1624, 2018.
- Spectrum approximation beyond fast matrix multiplication: Algorithms and hardness. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2018.
- Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM Journal on Optimization, 22(3):914–935, 2012.
- Sublinear time low-rank approximation of positive semidefinite matrices. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 672–683, 2017.
- OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 117–126. IEEE, 2013.
- E. J. Nyström. Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben. Acta Math., 54:185–204, 1930.
- Christopher C. Paige. The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London, 1971.
- Christopher C. Paige. Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix. IMA Journal of Applied Mathematics, 18(3):341–349, 1976.
- Victor Pan. How to multiply matrices faster. Springer-Verlag, 1984.
- Solving sparse linear systems faster than matrix multiplication. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 504–521, 2021.
- Falkon: An optimal large scale kernel method. In Advances in Neural Information Processing Systems, volume 30, 2017.
- A stochastic gradient method with an exponential convergence rate for finite training sets. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2, pages 2663–2671, 2012.
- A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences, 105(36):13212–13217, 2008.
- Tamás Sarlós. Improved approximation algorithms for large matrices via random projections. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pages 143–152. IEEE, 2006.
- Jonathan R Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical report, Carnegie Mellon University, USA, 1994.
- Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Proceedings of the 31st International Conference on Machine Learning, volume 32, pages 64–72, 2014.
- Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835–885, 2014.
- Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.
- A randomized Kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications, 15(2):262–278, 2009.
- A note on preconditioning by low-stretch spanning trees. arXiv preprint arXiv:0903.2816, 2009.
- Joel A Tropp. Improved analysis of the subsampled randomized hadamard transform. Advances in Adaptive Data Analysis, 3(01n02):115–126, 2011.
- Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith–Winograd. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pages 887–898, 2012.
- Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13, pages 682–688. 2001.
- Subcubic equivalences between path, matrix and triangle problems. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 645–654, 2010.
- New bounds for matrix multiplication: from alpha to omega. arXiv preprint arXiv:2307.07970, 2023.
- A superfast structured solver for Toeplitz linear systems via randomized sampling. SIAM Journal on Matrix Analysis and Applications, 33(3):837–858, 2012.
- Divide and conquer kernel ridge regression. In Conference on Learning Theory, pages 592–617. PMLR, 2013.