Sharp Analysis of Sketch-and-Project Methods via a Connection to Randomized Singular Value Decomposition (2208.09585v2)

Published 20 Aug 2022 in math.OC, cs.NA, math.NA, and stat.ML

Abstract: Sketch-and-project is a framework that unifies many known iterative methods for solving linear systems and their variants, as well as further extensions to non-linear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we develop a theoretical framework for obtaining sharp guarantees on the convergence rate of sketch-and-project methods. Our approach is the first to: (1) show that the convergence rate improves at least linearly with the sketch size, and even faster when the data matrix exhibits certain spectral decays; and (2) allow for sparse sketching matrices, which are more efficient than dense sketches and more robust than sub-sampling methods. In particular, our results explain an observed phenomenon that a radical sparsification of the sketching matrix does not affect the per-iteration convergence rate of sketch-and-project. To obtain our results, we develop new non-asymptotic spectral bounds for the expected sketched projection matrix, which are of independent interest, and we establish a connection between the convergence rates of iterative sketch-and-project solvers and the approximation error of randomized singular value decomposition, a widely used one-shot sketching algorithm for low-rank approximation. Our experiments support the theory and demonstrate that even extremely sparse sketches exhibit the convergence properties predicted by our framework.
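For concreteness, the sketch-and-project update the abstract refers to has a compact form: given the current iterate x and a fresh random sketching matrix S, the next iterate is the projection of x onto the solution set of the sketched system (SA)x = Sb. Below is a minimal NumPy sketch of this iteration under illustrative assumptions (Euclidean projection metric, a consistent system, a sparse random-sign sketch, and hand-picked sketch size and sparsity); it is not the paper's code. Taking S to be a single random row indicator would recover randomized Kaczmarz.

```python
# Minimal sketch-and-project iteration for a consistent system A x = b.
# Assumptions (illustrative, not from the paper): Euclidean projection
# metric, sparse random-sign sketch with 2 nonzeros per row, sketch size 10.
import numpy as np

def sparse_sketch(k, m, nnz, rng):
    """k x m sketching matrix; each row has `nnz` random +/-1 entries."""
    S = np.zeros((k, m))
    for i in range(k):
        cols = rng.choice(m, size=nnz, replace=False)
        S[i, cols] = rng.choice([-1.0, 1.0], size=nnz) / np.sqrt(nnz)
    return S

def sketch_and_project(A, b, k=10, nnz=2, iters=500, seed=0):
    m, n = A.shape
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for _ in range(iters):
        S = sparse_sketch(k, m, nnz, rng)   # fresh sketch every iteration
        SA = S @ A                          # k x n sketched system
        r = SA @ x - S @ b                  # sketched residual
        # Project x onto {z : (SA) z = S b} in the Euclidean norm:
        x -= SA.T @ np.linalg.lstsq(SA @ SA.T, r, rcond=None)[0]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_star = rng.standard_normal(50)
x_hat = sketch_and_project(A, A @ x_star)
print(np.linalg.norm(x_hat - x_star))  # shrinks as iters grows
```

Rerunning the same experiment with a dense Gaussian S in place of sparse_sketch yields an essentially identical per-iteration rate, which is the sparsification phenomenon the abstract describes.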

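The randomized singular value decomposition that the analysis connects to is the standard one-shot sketching algorithm in the style of Halko, Martinsson, and Tropp: sketch the range of A, then decompose the small projected matrix. A minimal version, with illustrative rank and oversampling choices, might look as follows; the abstract's stated connection ties the convergence rate of the iterative solver above to the approximation error of this one-shot procedure.

```python
# One-shot randomized SVD: sketch the range of A, then decompose the
# small projected matrix. Rank and oversampling values are illustrative.
import numpy as np

def randomized_svd(A, rank, oversample=5, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, rank + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis for sketched range
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((300, 30)) @ rng.standard_normal((30, 100))  # rank 30
U, s, Vt = randomized_svd(A, rank=10)
err = np.linalg.norm(A - (U * s) @ Vt)            # randomized rank-10 error
tail = np.linalg.svd(A, compute_uv=False)[10:]
best = np.sqrt(np.sum(tail ** 2))                 # optimal rank-10 error
print(err / best)  # modestly above 1; closer to 1 under spectral decay
```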