Learning the Positions in CountSketch (2306.06611v2)

Published 11 Jun 2023 in cs.LG and cs.DS

Abstract: We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem, e.g., low-rank approximation and regression. In the learning-based sketching paradigm proposed by Indyk et al. [15], the sketch matrix is found by choosing a random sparse matrix, e.g., CountSketch, and then the values of its non-zero entries are updated by running gradient descent on a training data set. Despite the growing body of work on this paradigm, a noticeable omission is that previous algorithms kept the locations of the non-zero entries fixed and learned only their values. In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries. Our first proposal is a greedy algorithm, whose main drawback is its slow training time. We fix this issue and propose approaches for learning a sketching matrix for both low-rank approximation and Hessian approximation for second-order optimization. The latter is helpful for a range of constrained optimization problems, such as LASSO and matrix estimation with a nuclear norm constraint. Both approaches achieve good accuracy with a fast running time. Moreover, our experiments suggest that our algorithm can still reduce the error significantly even if we only have a very limited number of training matrices.
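
To make the sketch-and-solve setup in the abstract concrete, here is a minimal NumPy sketch of a classical, unlearned CountSketch applied to least-squares regression. The row positions and signs are drawn at random; in the learned variants discussed above, the non-zero values (and, in this paper, their positions) would instead be optimized on training data. The helper names, matrix dimensions, and synthetic data below are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only (not the authors' code): a classical, unlearned CountSketch
# used in the sketch-and-solve paradigm for least-squares regression. Dimensions and
# synthetic data are arbitrary choices for the example.
import numpy as np

rng = np.random.default_rng(0)

def countsketch(n, m, rng):
    """Positions (one row index per column) and signs (+/-1 per column) that
    define an m x n CountSketch matrix S with a single non-zero per column."""
    positions = rng.integers(0, m, size=n)
    values = rng.choice([-1.0, 1.0], size=n)
    return positions, values

def apply_countsketch(positions, values, m, A):
    """Compute S @ A in time proportional to the size of A, without forming S."""
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, positions, values[:, None] * A)  # scatter-add signed rows of A
    return SA

# Sketch-and-solve: solve min_x ||S A x - S b|| as a cheap proxy for min_x ||A x - b||.
n, d, m = 10_000, 50, 500                      # tall data matrix, sketch size m << n
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

pos, sgn = countsketch(n, m, rng)
SA = apply_countsketch(pos, sgn, m, A)
Sb = apply_countsketch(pos, sgn, m, b[:, None]).ravel()

x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error of sketched solution:",
      np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```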

References (26)
  1. Differentiable convex optimization layers. In Advances in Neural Information Processing Systems, pages 9558–9570, 2019.
  2. Sharper bounds for regularized data fitting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, (APPROX/RANDOM), pages 27:1–27:22, 2017.
  3. (Learned) frequency estimation algorithms under Zipfian distribution. arXiv preprint arXiv:1908.05198, 2019.
  4. Toward a unified theory of sparse dimensionality reduction in Euclidean space. Geometric and Functional Analysis, pages 1009–1088, 2015.
  5. Matthew Brand. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1), 2006.
  6. Fast singular value decomposition on GPU. NVIDIA presentation at GPU Technology Conference, 2019.
  7. Iterative Hessian sketch in input sparsity time. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2019.
  8. Composable sketches for functions of frequencies: Beyond the worst case. In International Conference on Machine Learning, pages 2057–2067. PMLR, 2020.
  9. Input sparsity time low-rank approximation via ridge leverage score sampling. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA), pages 1758–1777, 2017.
  10. Numerical linear algebra in the streaming model. In Proceedings of the forty-first annual symposium on Theory of computing (STOC), pages 205–214, 2009.
  11. Low-rank approximation and regression in input sparsity time. Journal of the ACM (JACM), 63(6):54, 2017.
  12. Learning sublinear-time indexing for nearest neighbor search. In International Conference on Learning Representations, 2020.
  13. Learning-based support estimation in sublinear time. In International Conference on Learning Representations, 2021.
  14. Numax: A convex approach for learning near-isometric linear embeddings. In IEEE Transactions on Signal Processing, pages 6109–6121, 2015.
  15. Learning-based low-rank approximations. In Advances in Neural Information Processing Systems, pages 7400–7410, 2019.
  16. Few-shot data-driven algorithms for low rank approximation. Advances in Neural Information Processing Systems, 34, 2021.
  17. Learning-augmented data stream algorithms. In International Conference on Learning Representations, 2020.
  18. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 91–100, 2013.
  19. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), pages 117–126, 2013.
  20. Lower bounds for oblivious subspace embeddings. In Automata, Languages, and Programming - 41st International Colloquium (ICALP), pages 883–894, 2014.
  21. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, 2019.
  22. Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares. Journal of Machine Learning Research, 17:53:1–53:38, 2016.
  23. Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 143–152, 2006.
  24. Training (overparametrized) neural networks in near-linear time. In James R. Lee, editor, 12th Innovations in Theoretical Computer Science Conference, ITCS, volume 185, pages 63:1–63:15, 2021.
  25. Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Yonina C. Eldar and Gitta Kutyniok, editors, Compressed Sensing: Theory and Applications, page 210–268. Cambridge University Press, 2012.
  26. A survey on learning to hash. In IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 769–790, 2017.
