Sharp Analysis of Sketch-and-Project Methods via a Connection to Randomized Singular Value Decomposition (2208.09585v2)
Abstract: Sketch-and-project is a framework that unifies many known iterative methods for solving linear systems and their variants, as well as extensions to non-linear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we develop a theoretical framework for obtaining sharp guarantees on the convergence rate of sketch-and-project methods. Our approach is the first to: (1) show that the convergence rate improves at least linearly with the sketch size, and even faster when the data matrix exhibits certain spectral decays; and (2) allow for sparse sketching matrices, which are more efficient than dense sketches and more robust than sub-sampling methods. In particular, our results explain an observed phenomenon: radically sparsifying the sketching matrix does not affect the per-iteration convergence rate of sketch-and-project. To obtain our results, we develop new non-asymptotic spectral bounds for the expected sketched projection matrix, which are of independent interest, and we establish a connection between the convergence rates of iterative sketch-and-project solvers and the approximation error of randomized singular value decomposition, a widely used one-shot sketching algorithm for low-rank approximation. Our experiments support the theory and demonstrate that even extremely sparse sketches exhibit the convergence properties predicted by our framework.
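To make the framework concrete, below is a minimal NumPy sketch of the generic sketch-and-project iteration for a consistent linear system Ax = b, paired with a sparse sketching matrix. The helper names (`sparse_sketch`, `sketch_and_project`), the ±1 sparse construction, and the parameter choices (sketch size k = 10, s = 2 nonzeros per column) are illustrative assumptions, not the paper's exact algorithm or sketch distribution.

```python
import numpy as np

def sparse_sketch(m, k, s, rng):
    """An m x k sketching matrix with s nonzeros per column, each +/-1
    with equal probability. One common sparse construction, used here
    for illustration; the paper's exact distribution may differ."""
    S = np.zeros((m, k))
    for j in range(k):
        rows = rng.choice(m, size=s, replace=False)
        S[rows, j] = rng.choice([-1.0, 1.0], size=s)
    return S

def sketch_and_project(A, b, k, s=2, iters=500, seed=0):
    """Generic sketch-and-project iteration for a consistent system
    Ax = b: at each step, project the iterate onto the solution set of
    the sketched system S^T A x = S^T b, i.e.,
        x <- x - (S^T A)^T ((S^T A)(S^T A)^T)^+ (S^T A x - S^T b)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        S = sparse_sketch(m, k, s, rng)
        SA = S.T @ A                    # k x n sketched matrix
        r = SA @ x - S.T @ b            # sketched residual
        # lstsq applies the pseudoinverse of the small k x k Gram matrix
        x -= SA.T @ np.linalg.lstsq(SA @ SA.T, r, rcond=None)[0]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
b = A @ x_true                          # consistent system by construction

x_hat = sketch_and_project(A, b, k=10, s=2)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))

# The same sketch also drives a randomized SVD rangefinder for A^T;
# the paper connects the convergence rate above to this kind of
# low-rank approximation error (shown loosely here, not the paper's
# precise quantity).
S = sparse_sketch(200, 10, 2, rng)
Q, _ = np.linalg.qr(A.T @ S)            # orthonormal basis of range(A^T S)
print("RSVD residual:", np.linalg.norm(A.T - Q @ (Q.T @ A.T)))
```

With sketch size k = 1 and S drawn as a random standard basis vector, the same update reduces to randomized Kaczmarz, one of the special cases listed in the abstract; the abstract's claim is that the convergence rate improves at least linearly as k grows, even when each column of S carries only a couple of nonzeros.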