Learning the Positions in CountSketch (2306.06611v2)
Abstract: We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem, e.g., low-rank approximation and regression. In the learning-based sketching paradigm proposed by~\cite{indyk2019learning}, the sketch matrix is obtained by choosing a random sparse matrix, e.g., CountSketch, and then updating the values of its non-zero entries by running gradient descent on a training data set. Despite the growing body of work on this paradigm, a noticeable omission is that in previous algorithms the locations of the non-zero entries were fixed and only their values were learned. In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries. Our first algorithm is based on a greedy search; however, its drawback is a slow training time. We address this issue and propose approaches for learning a sketching matrix for both low-rank approximation and Hessian approximation for second-order optimization. The latter is helpful for a range of constrained optimization problems, such as LASSO and matrix estimation with a nuclear norm constraint. Both approaches achieve good accuracy with a fast running time. Moreover, our experiments suggest that our algorithms can significantly reduce the error even when only a very limited number of training matrices is available.
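To make the paradigm concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of the prior setup that this work extends: a CountSketch matrix with one non-zero per column, whose positions are sampled once and kept fixed while the values are trained by gradient descent. The projection loss below is a simplified stand-in for the task-specific objective, e.g., the rank-k approximation error.

```python
# A minimal, hypothetical sketch (not the paper's code) of the prior
# learned-CountSketch setup: the positions of the non-zero entries are
# drawn at random and kept fixed, and only the values are trained.
import torch


def init_countsketch(n, m, seed=0):
    """CountSketch S in R^{m x n}: one non-zero per column, at a random
    fixed row, initialized to a random +/-1 value that becomes learnable."""
    g = torch.Generator().manual_seed(seed)
    rows = torch.randint(0, m, (n,), generator=g)            # fixed positions
    signs = torch.randint(0, 2, (n,), generator=g) * 2 - 1   # random +/-1 signs
    values = torch.nn.Parameter(signs.float())               # learnable values
    return rows, values


def apply_sketch(rows, values, A, m):
    """Compute S @ A without materializing S: row i of A is scaled by
    values[i] and accumulated into row rows[i] of the output."""
    src = values.unsqueeze(1) * A                            # (n, d)
    return torch.zeros(m, A.shape[1], dtype=A.dtype).index_add(0, rows, src)


if __name__ == "__main__":
    n, d, m = 500, 50, 20
    A = torch.randn(n, d)                                    # one training matrix
    rows, values = init_countsketch(n, m)
    opt = torch.optim.SGD([values], lr=1e-3)
    for step in range(100):
        SA = apply_sketch(rows, values, A, m)
        # Surrogate objective: error of projecting A onto the row space of SA,
        # a simplified stand-in for the rank-k approximation loss used in the
        # learned-sketch literature.
        P = torch.linalg.pinv(SA) @ SA                       # (d, d) projector
        loss = (A - A @ P).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this setup the index vector `rows` never changes; the contribution described in the abstract is to additionally optimize these positions rather than only the values.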
- Differentiable convex optimization layers. In Advances in Neural Information Processing Systems, pages 9558–9570, 2019.
- Sharper bounds for regularized data fitting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM), pages 27:1–27:22, 2017.
- (Learned) frequency estimation algorithms under Zipfian distribution. arXiv preprint arXiv:1908.05198, 2019.
- Toward a unified theory of sparse dimensionality reduction in Euclidean space. Geometric and Functional Analysis, pages 1009–1088, 2015.
- Matthew Brand. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1), 2006.
- Fast singular value decomposition on GPU. NVIDIA presentation at GPU Technology Conference, 2019.
- Iterative Hessian sketch in input sparsity time. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2019.
- Composable sketches for functions of frequencies: Beyond the worst case. In International Conference on Machine Learning, pages 2057–2067. PMLR, 2020.
- Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1758–1777, 2017.
- Numerical linear algebra in the streaming model. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing (STOC), pages 205–214, 2009.
- Low-rank approximation and regression in input sparsity time. Journal of the ACM (JACM), 63(6):54, 2017.
- Learning sublinear-time indexing for nearest neighbor search. In International Conference on Learning Representations, 2020.
- Learning-based support estimation in sublinear time. In International Conference on Learning Representations, 2021.
- NuMax: A convex approach for learning near-isometric linear embeddings. IEEE Transactions on Signal Processing, pages 6109–6121, 2015.
- Learning-based low-rank approximations. In Advances in Neural Information Processing Systems, pages 7400–7410, 2019.
- Few-shot data-driven algorithms for low rank approximation. Advances in Neural Information Processing Systems, 34, 2021.
- Learning-augmented data stream algorithms. In International Conference on Learning Representations, 2020.
- Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing (STOC), pages 91–100, 2013.
- OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), pages 117–126, 2013.
- Lower bounds for oblivious subspace embeddings. In Automata, Languages, and Programming - 41st International Colloquium (ICALP), pages 883–894, 2014.
- PyTorch: An imperative style, high-performance deep learning library. 2019.
- Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares. Journal of Machine Learning Research, 17:53:1–53:38, 2016.
- Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 143–152, 2006.
- Training (overparametrized) neural networks in near-linear time. In James R. Lee, editor, 12th Innovations in Theoretical Computer Science Conference, ITCS, volume 185, pages 63:1–63:15, 2021.
- Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Yonina C. Eldar and Gitta Kutyniok, editors, Compressed Sensing: Theory and Applications, pages 210–268. Cambridge University Press, 2012.
- A survey on learning to hash. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 769–790, 2017.