
Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning (2405.05865v1)

Published 9 May 2024 in cs.DS, cs.LG, cs.NA, math.NA, and math.OC

Abstract: We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nyström approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems: 1. We show how to solve any $n\times n$ linear system that is well-conditioned except for $k$ outlying large singular values in $\tilde{O}(n^{2.065} + k^\omega)$ time, improving on a recent result of [Dereziński, Yang, STOC 2024] for all $k \gtrsim n^{0.78}$. 2. We give the first $\tilde{O}(n^2 + d_\lambda^{\omega})$ time algorithm for solving a regularized linear system $(A + \lambda I)x = b$, where $A$ is positive semidefinite with effective dimension $d_\lambda$. This problem arises in applications like Gaussian process regression. 3. We give faster algorithms for approximating Schatten $p$-norms and other matrix norms. For example, for the Schatten 1 (nuclear) norm, we give an algorithm that runs in $\tilde{O}(n^{2.11})$ time, improving on an $\tilde{O}(n^{2.18})$ method of [Musco et al., ITCS 2018]. Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.
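
For intuition, here is a minimal, self-contained Python sketch of the single-level version of this idea: build a randomized Nyström approximation of a PSD matrix and use it to precondition conjugate gradient for a regularized system $(A + \lambda I)x = b$. It follows the standard randomized Nyström preconditioner (in the style of reference 26 below), not the paper's multi-level sparse-sketching construction; the dense Gaussian test matrix and the function names (`nystrom_approx`, `nystrom_pcg`) are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch only: single-level Nystrom-preconditioned CG for (A + lam_reg*I) x = b.
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.sparse.linalg import LinearOperator, cg

def nystrom_approx(A, k, rng):
    """Rank-k randomized Nystrom approximation A ~= U @ diag(lam) @ U.T (A must be PSD)."""
    n = A.shape[0]
    Omega = rng.standard_normal((n, k))
    Omega, _ = np.linalg.qr(Omega)                 # orthonormal test matrix
    Y = A @ Omega
    nu = np.finfo(Y.dtype).eps * np.linalg.norm(Y, "fro")  # small shift for numerical stability
    Y_nu = Y + nu * Omega
    C = cholesky(Omega.T @ Y_nu, lower=False)      # upper-triangular Cholesky factor
    B = solve_triangular(C, Y_nu.T, lower=False, trans="T").T  # B = Y_nu @ inv(C)
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s ** 2 - nu, 0.0)             # eigenvalue estimates with the shift removed
    return U, lam

def nystrom_pcg(A, b, lam_reg, k, maxiter=200, seed=0):
    """Solve (A + lam_reg*I) x = b by CG, preconditioned with a rank-k Nystrom approximation."""
    n = A.shape[0]
    U, lam = nystrom_approx(A, k, np.random.default_rng(seed))
    lam_k = lam[-1]                                # smallest retained eigenvalue

    def apply_precond(r):
        # P^{-1} r = (lam_k + lam_reg) * U (Lam + lam_reg*I)^{-1} U^T r + (I - U U^T) r
        Ur = U.T @ r
        return (lam_k + lam_reg) * (U @ (Ur / (lam + lam_reg))) + (r - U @ Ur)

    M = LinearOperator((n, n), matvec=apply_precond, dtype=np.float64)
    A_reg = LinearOperator((n, n), matvec=lambda v: A @ v + lam_reg * v, dtype=np.float64)
    x, info = cg(A_reg, b, M=M, maxiter=maxiter)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    G = rng.standard_normal((n, n))
    A = (G @ G.T) / n                              # synthetic PSD test matrix
    b = rng.standard_normal(n)
    x = nystrom_pcg(A, b, lam_reg=1e-2, k=50)
    print(np.linalg.norm(A @ x + 1e-2 * x - b))    # residual of the regularized system
```

In the paper's multi-level scheme, applying the preconditioner is itself accelerated by further rounds of sketching and preconditioning; in this simplified sketch it is applied directly through the low-rank factors, which costs O(nk) per CG iteration.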

References (82)
  1. The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM Journal on Computing, 39(1):302–322, 2009.
  2. Near-optimal approximation of matrix functions by the Lanczos method. arXiv preprint arXiv:2303.03358, 2023.
  3. Faster kernel ridge regression using sketching and preconditioning. SIAM Journal on Matrix Analysis and Applications, 38(4):1116–1138, 2017.
  4. On the rate of convergence of the preconditioned conjugate gradient method. Numerische Mathematik, 48:499–524, 1986.
  5. Fast randomized kernel ridge regression with statistical guarantees. In Advances in Neural Information Processing Systems, volume 28, 2015.
  6. Convex optimization. Cambridge University Press, 2004.
  7. Near-optimal algorithms for linear algebra in the current matrix multiplication time. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3043–3068. SIAM, 2022.
  8. Query complexity of least absolute deviation regression via robust uniform convergence. In Conference on Learning Theory, pages 1144–1179. PMLR, 2021.
  9. Optimal embedding dimension for sparse subspace embeddings. In 56th Annual ACM Symposium on Theory of Computing, 2024.
  10. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 163–172, 2015.
  11. Solving directed Laplacian systems in nearly-linear time through sparse LU factorizations. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 898–909. IEEE, 2018.
  12. Solving linear programs in the current matrix multiplication time. Journal of the ACM (JACM), 68(1):1–39, 2021.
  13. Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1758–1777. SIAM, 2017.
  14. Optimal Approximate Matrix Product in Terms of Stable Rank. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55, pages 11:1–11:14, 2016.
  15. Michael B Cohen. Nearly tight oblivious subspace embeddings by trace inequalities. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 278–287. SIAM, 2016.
  16. Lp row sampling by Lewis weights. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 183–192, 2015.
  17. Matrix multiplication via arithmetic progressions. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 1–6, 1987.
  18. Low rank approximation and regression in input sparsity time. In Proceedings of the forty-fifth annual ACM symposium on Theory of Computing, pages 81–90, 2013.
  19. Robust, randomized preconditioning for kernel ridge regression. arXiv preprint arXiv:2304.12465, 2023.
  20. High-dimensional asymptotics of prediction: Ridge regression and classification. The Annals of Statistics, 46(1):247–279, 2018.
  21. Solving dense linear systems faster than via preconditioning. In 56th Annual ACM Symposium on Theory of Computing, 2024.
  22. Fast randomized kernel methods with statistical guarantees. stat, 1050:2, 2014.
  23. Ethan N. Epperly. Fast and forward stable randomized algorithms for linear least-squares problems. arXiv preprint arXiv:2311.04362, 2024.
  24. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, pages 2540–2548, 2015.
  25. Principal component projection without principal component analysis. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, pages 2349–2357, 2016.
  26. Randomized Nyström preconditioning. SIAM Journal on Matrix Analysis and Applications, 44(2):718–752, 2023.
  27. Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res., 17(1):3977–4041, 2016.
  28. Solving ridge regression using sketched preconditioned SVRG. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1397–1405, 2016.
  29. Anne Greenbaum. Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences. Linear Algebra and its Applications, 113:7 – 63, 1989.
  30. Inexact preconditioned conjugate gradient method with inner-outer iteration. SIAM Journal on Scientific Computing, 21(4):1305–1320, 1999.
  31. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217–288, 2011.
  32. Methods of conjugate gradients for solving linear systems. Journal of research of the National Bureau of Standards, 49(6):409–436, 1952.
  33. Michael F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 19(2):433–450, 1990.
  34. Online learning guided curvature approximation: A quasi-Newton method with global non-asymptotic superlinear convergence. arXiv preprint arXiv:2302.08580, 2023.
  35. A faster interior point method for semidefinite programming. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 910–918. IEEE, 2020.
  36. Principal component projection and regression in nearly linear time through asymmetric SVRG. In Advances in Neural Information Processing Systems, volume 32, 2019.
  37. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, volume 26, 2013.
  38. Displacement ranks of matrices and linear equations. Journal of Mathematical Analysis and Applications, 68(2):395–407, 1979.
  39. A fast solver for a class of linear systems. Communications of the ACM, 55(10):99–107, 2012.
  40. Big-step-little-step: Efficient gradient methods for objectives with multiple scales. In Proceedings of Thirty Fifth Conference on Learning Theory, volume 178, pages 2431–2540, 2022.
  41. Sparser Johnson–Lindenstrauss transforms. Journal of the ACM (JACM), 61(1):1–23, 2014.
  42. Approximate Gaussian elimination for Laplacians: fast, sparse, and simple. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 573–582. IEEE Computer Society, 2016.
  43. Kenneth Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of applied mathematics, 2(2):164–168, 1944.
  44. Francois Le Gall. Faster algorithms for rectangular matrix multiplication. In IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 514–523, 2012.
  45. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing, 461:370–403, 2021.
  46. Randomized methods for linear constraints: convergence rates and conditioning. Mathematics of Operations Research, 35(3):641–654, 2010.
  47. Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 147–156. IEEE, 2013.
  48. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Advances in neural information processing systems, 24, 2011.
  49. Donald W Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the society for Industrial and Applied Mathematics, 11(2):431–441, 1963.
  50. Diving into the shallows: a computational perspective on large-scale shallow learning. In Advances in Neural Information Processing Systems, volume 30, 2017.
  51. Kernel methods through the roof: Handling billions of points efficiently. In Advances in Neural Information Processing Systems, volume 33, pages 14410–14422, 2020.
  52. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 91–100, 2013.
  53. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. Advances in Neural Information Processing Systems, 28, 2015.
  54. Recursive sampling for the Nyström method. Advances in Neural Information Processing Systems, 30, 2017.
  55. Hutch++: Optimal stochastic trace estimation. In Symposium on Simplicity in Algorithms (SOSA), pages 142–155, 2021.
  56. Stability of the Lanczos method for matrix function approximation. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1605–1624, 2018.
  57. Spectrum approximation beyond fast matrix multiplication: Algorithms and hardness. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2018.
  58. Iteration-complexity of a newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM Journal on Optimization, 22(3):914–935, 2012.
  59. Sublinear time low-rank approximation of positive semidefinite matrices. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 672–683, 2017.
  60. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 117–126. IEEE, 2013.
  61. E. J. Nyström. Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben. Acta Math., 54:185–204, 1930.
  62. Christopher C. Paige. The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London, 1971.
  63. Christopher C. Paige. Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix. IMA Journal of Applied Mathematics, 18(3):341–349, 1976.
  64. Victor Pan. How to multiply matrices faster. Springer-Verlag, 1984.
  65. Solving sparse linear systems faster than matrix multiplication. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 504–521, 2021.
  66. Falkon: An optimal large scale kernel method. Advances in neural information processing systems, 30, 2017.
  67. A stochastic gradient method with an exponential convergence rate for finite training sets. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2, pages 2663–2671, 2012.
  68. A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Sciences, 105(36):13212–13217, 2008.
  69. Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In 2006 47th annual IEEE symposium on foundations of computer science (FOCS’06), pages 143–152. IEEE, 2006.
  70. Jonathan R Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical report, Carnegie Mellon University, USA, 1994.
  71. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Proceedings of the 31st International Conference on Machine Learning, volume 32, pages 64–72, 2014.
  72. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835–885, 2014.
  73. Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.
  74. A randomized Kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications, 15(2):262–278, 2009.
  75. A note on preconditioning by low-stretch spanning trees. arXiv preprint arXiv:0903.2816, 2009.
  76. Joel A Tropp. Improved analysis of the subsampled randomized Hadamard transform. Advances in Adaptive Data Analysis, 3(1–2):115–126, 2011.
  77. Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith–Winograd. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 887–898, 2012.
  78. Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13, pages 682–688. 2001.
  79. Subcubic equivalences between path, matrix and triangle problems. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 645–654, 2010.
  80. New bounds for matrix multiplication: from alpha to omega. arXiv preprint arXiv:2307.07970, 2023.
  81. A superfast structured solver for Toeplitz linear systems via randomized sampling. SIAM Journal on Matrix Analysis and Applications, 33(3):837–858, 2012.
  82. Divide and conquer kernel ridge regression. In Conference on learning theory, pages 592–617. PMLR, 2013.