Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Better size estimation for sparse matrix products (1006.4173v2)

Published 21 Jun 2010 in cs.DS and cs.DB

Abstract: We consider the problem of doing fast and reliable estimation of the number of non-zero entries in a sparse boolean matrix product. This problem has applications in databases and computer algebra. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1 +- epsilon approximation (with small probability of error) in expected time O(n) for any epsilon > 4/\sqrt[4]{z}. The previously best estimation algorithm, due to Cohen (JCSS 1997), uses time O(n/epsilon2). We also present a variant using O(sort(n)) I/Os in expectation in the cache-oblivious model. In contrast to these results, the currently best algorithms for computing a sparse boolean matrix product use time omega(n{4/3}) (resp. omega(n{4/3}/B) I/Os), even if the result matrix has only z=O(n) nonzero entries. Our algorithm combines the size estimation technique of Bar-Yossef et al. (RANDOM 2002) with a particular class of pairwise independent hash functions that allows the sketch of a set of the form A x C to be computed in expected time O(|A|+|C|) and O(sort(|A|+|C|)) I/Os. We then describe how sampling can be used to maintain (independent) sketches of matrices that allow estimation to be performed in time o(n) if z is sufficiently large. This gives a simpler alternative to the sketching technique of Ganguly et al. (PODS 2005), and matches a space lower bound shown in that paper. Finally, we present experiments on real-world data sets that show the accuracy of both our methods to be significantly better than the worst-case analysis predicts.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Rasmus Resen Amossen (3 papers)
  2. Andrea Campagna (7 papers)
  3. Rasmus Pagh (88 papers)
Citations (27)

Summary

We haven't generated a summary for this paper yet.