Sparse Coresets for SVD on Infinite Streams (2002.06296v3)

Published 15 Feb 2020 in cs.DS

Abstract: In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $\mathbb{R}^d$. An $\epsilon$-coreset is a (much smaller) matrix whose sum of squared distances of the rows to any hyperplane approximates that of the original matrix to a $1 \pm \epsilon$ factor. Our main result is that we can maintain an $\epsilon$-coreset while storing only $O(d \log^2 d / \epsilon^2)$ rows. Known lower bounds of $\Omega(d / \epsilon^2)$ rows show that this is nearly optimal. Moreover, each row of our coreset is a weighted subset of the input rows. This is highly desirable since it: (1) preserves sparsity; (2) is easily interpretable; (3) avoids precision errors; (4) applies to problems with constraints on the input. Previous streaming results for SVD that return a subset of the input required storing $\Omega(d \log^3 n / \epsilon^2)$ rows where $n$ is the number of rows seen so far. Our algorithm, with storage independent of $n$, is the first result that uses finite memory on infinite streams. We support our findings with experiments on the Wikipedia dataset benchmarked against state-of-the-art algorithms.
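
To make the $\epsilon$-coreset definition concrete, the following is a minimal sketch of how the guarantee could be checked empirically. It is not the paper's algorithm. It assumes hyperplanes through the origin, described by a normal vector, and it assumes each coreset row carries a weight that multiplies its squared distance. The function names `sq_dist_to_hyperplane` and `max_relative_error` are hypothetical, chosen for illustration. Random probing only gives a lower bound on the worst-case error, so a small value here is evidence, not a proof.

```python
import math
import random


def sq_dist_to_hyperplane(rows, normal, weights=None):
    """Weighted sum of squared distances from each row to the hyperplane
    through the origin with the given normal (assumed convention)."""
    norm = math.sqrt(sum(v * v for v in normal))
    if weights is None:
        weights = [1.0] * len(rows)
    total = 0.0
    for w, row in zip(weights, rows):
        # Distance from a point to the hyperplane is |<row, normal>| / ||normal||.
        dist = sum(r * n for r, n in zip(row, normal)) / norm
        total += w * dist * dist
    return total


def max_relative_error(rows, core_rows, core_weights, trials=1000, seed=0):
    """Largest relative error observed over random hyperplanes.
    This is a lower bound on the true worst-case error."""
    rng = random.Random(seed)
    d = len(rows[0])
    worst = 0.0
    for _ in range(trials):
        normal = [rng.gauss(0.0, 1.0) for _ in range(d)]
        full = sq_dist_to_hyperplane(rows, normal)
        approx = sq_dist_to_hyperplane(core_rows, normal, core_weights)
        if full > 0.0:
            worst = max(worst, abs(approx - full) / full)
    return worst


# Toy input: duplicated rows, so a weighted subset reproduces the cost exactly.
rows = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]]
core_rows = [[1.0, 0.0], [0.0, 2.0]]
core_weights = [2.0, 3.0]
print(max_relative_error(rows, core_rows, core_weights))
```

In this toy case the candidate coreset is a weighted subset of the input rows, which is the property the abstract highlights: sparse input rows stay sparse, and no new row values are computed.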
