
When big data actually are low-rank, or entrywise approximation of certain function-generated matrices

Published 3 Jul 2024 in math.NA, cs.LG, and cs.NA (arXiv:2407.03250v4)

Abstract: The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We identify several misconceptions surrounding a claim that, for a specific class of analytic functions, such $n \times n$ matrices admit accurate entrywise approximation of rank that is independent of $m$ and grows as $\log(n)$, colloquially known as "big-data matrices are approximately low-rank". We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n)\, \varepsilon^{-2} \log(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to tensor-train approximation of tensors generated with functions of the "higher-order inner product" of their multiple variables. We discuss our results in the context of low-rank approximation of (a) growing datasets and (b) attention in transformer neural networks.
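
To make the statement concrete, here is a minimal numerical sketch of class (i), functions of the inner product. It is an illustration under stated assumptions, not the paper's construction: the function f(t) = exp(t), the uniform unit-sphere sampling of the variables, and the tolerance eps = 1e-3 are all choices made for the demo. It reports the smallest truncated-SVD rank achieving entrywise error below eps, for growing n and several dimensions m.

    # Minimal numerical sketch. Assumptions not taken from the paper:
    # the test function f(t) = exp(t) and uniform unit-sphere sampling.
    import numpy as np

    rng = np.random.default_rng(0)

    def entrywise_rank(n, m, eps=1e-3):
        # Sample n points x_i and n points y_j uniformly on the unit sphere in R^m.
        X = rng.standard_normal((n, m))
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        Y = rng.standard_normal((n, m))
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)
        A = np.exp(X @ Y.T)  # class (i): entries f(<x_i, y_j>) with f = exp
        U, s, Vt = np.linalg.svd(A)
        # Smallest truncated-SVD rank with entrywise (max-norm) error below eps.
        for k in range(1, n + 1):
            Ak = (U[:, :k] * s[:k]) @ Vt[:k]
            if np.abs(A - Ak).max() < eps:
                return k
        return n

    for m in (5, 50, 500):
        print("m =", m, "->", [entrywise_rank(n, m) for n in (100, 200, 400)])

On typical runs the reported rank grows slowly with n and stays essentially flat in m, in line with the m-independent rank bound quoted in the abstract. Note that the truncated SVD here is only a convenient certificate that a low entrywise rank exists; it is not the approximation scheme analyzed in the paper.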
