On Set Size Distribution Estimation and the Characterization of Large Networks via Sampling (1209.0736v2)

Published 4 Sep 2012 in math.ST, cs.IT, cs.SI, math.IT, and stat.TH

Abstract: In this work we study the set size distribution estimation problem, where elements are randomly sampled from a collection of non-overlapping sets and we seek to recover the original set size distribution from the samples. This problem has applications to capacity planning and network theory, among other areas. Examples of real-world applications include characterizing in-degree distributions in large graphs and uncovering TCP/IP flow size distributions on the Internet. We demonstrate that it is hard to estimate the original set size distribution. The recoverability of original set size distributions presents a sharp threshold with respect to the fraction of elements that remain in the sets. If this fraction remains below a threshold, typically half of the elements in power-law and heavier-than-exponential-tailed distributions, then the original set size distribution is unrecoverable. We also discuss practical implications of our findings.

Citations (27)
