Space-Efficient Sampling from Social Activity Streams (1206.4952v1)

Published 20 Jun 2012 in cs.SI, cs.DB, physics.soc-ph, and stat.AP

Abstract: In order to efficiently study the characteristics of network domains and support development of network systems (e.g. algorithms, protocols that operate on networks), it is often necessary to sample a representative subgraph from a large complex network. Although recent subgraph sampling methods have been shown to work well, they focus on sampling from memory-resident graphs and assume that the sampling algorithm can access the entire graph in order to decide which nodes/edges to select. Many large-scale network datasets, however, are too large and/or dynamic to be processed using main memory (e.g., email, tweets, wall posts). In this work, we formulate the problem of sampling from large graph streams. We propose a streaming graph sampling algorithm that dynamically maintains a representative sample in a reservoir based setting. We evaluate the efficacy of our proposed methods empirically using several real-world data sets. Across all datasets, we found that our method produce samples that preserve better the original graph distributions.

Citations (27)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Space-Efficient Sampling from Social Activity Streams (1206.4952v1)

Summary

Related Papers