A Survey and Taxonomy of Graph Sampling (1308.5865v1)

Published 23 Aug 2013 in cs.SI, math.PR, and stat.ME

Abstract: Graph sampling is a technique to pick a subset of vertices and/or edges from the original graph. It has a wide spectrum of applications, e.g. surveying hidden populations in sociology [54], visualizing social graphs [29], scaling down the Internet AS graph [27], graph sparsification [8], etc. In some scenarios, the whole graph is known and the purpose of sampling is to obtain a smaller graph. In other scenarios, the graph is unknown and sampling is regarded as a way to explore the graph. Commonly used techniques are Vertex Sampling, Edge Sampling and Traversal Based Sampling. We provide a taxonomy of different graph sampling objectives and graph sampling approaches. The relations between these approaches are formally argued and a general framework to bridge theoretical analysis and practical implementation is provided. Although smaller in size, sampled graphs may be similar to the original graphs in some way. We are particularly interested in what graph properties are preserved given a sampling procedure. If some properties are preserved, we can estimate them on the sampled graphs, which gives a way to construct efficient estimators. If an algorithm relies on the preserved properties, we can expect that it gives similar output on the original and sampled graphs. This leads to a systematic way to accelerate a class of graph algorithms. In this survey, we discuss both classical textbook-type properties and some advanced properties. The landscape is tabulated and we see a lot of missing work in this field. Some theoretical studies are collected in this survey and simple extensions are made. Most previous numerical evaluations are ad hoc, i.e. they evaluate different types of graphs, different sets of properties, and different sampling algorithms. A systematic and neutral evaluation is needed to shed light on further graph sampling studies.

Summary

The paper provides a comprehensive survey and taxonomy of graph sampling techniques, analyzing their objectives, methodologies, and property preservation capabilities.
It categorizes methods based on sampling objectives, network types, and approaches, including classical sampling, enhanced methods, and Traversal Based Sampling (TBS) like Random Walks.
The study highlights the critical role of property preservation and estimation in enabling efficient graph algorithms on sampled data, calling for more systematic evaluation and theoretical advancement.

Overview and Insights on "A Survey and Taxonomy of Graph Sampling"

The paper "A Survey and Taxonomy of Graph Sampling" by Pili Hu and Wing Cheong Lau delivers a comprehensive analysis of graph sampling techniques, evaluating their objectives, methodologies, and potential for preserving graph properties. It serves as both a summary of existing methods and a taxonomy for understanding the relationships among various graph sampling techniques and objectives.

Graph Sampling Context and Objectives

Graph sampling is an indispensable tool when dealing with large-scale graphs, providing a method to derive smaller, manageable representations without significant loss of information. The goals of graph sampling vary, including but not limited to acquiring a representative subset of vertices in social studies, preserving specific graph properties for estimation, and supporting certain graph algorithms through reduced computational complexity. Additionally, graph sampling methods are instrumental for visualizing massive networks and scaling down complex Internet topology graphs.

Taxonomy of Graph Sampling Methods

The paper categorizes graph sampling techniques based on objectives, network types, and the methodology employed:

Sampling Objectives: Graph sampling aims to obtain a representative subset of vertices, to preserve certain graph properties, or to generate random graphs. The intersection of property preservation and estimation provides a framework for developing efficient graph algorithms and estimators. For instance, Metropolis-Hastings Random Walk (MHRW) can be configured so that it visits vertices uniformly in the long run, which preserves vertex label distributions and enables property estimation from sampled graphs.
Network Types: The discussion addresses various network models, such as Erdős-Rényi Networks, Power-Law Networks, and Small-World Networks, noting that their differing structural properties call for tailored sampling approaches.
Sampling Approaches: The surveyed methods encompass classical vertex and edge sampling, enhanced methods such as Vertex Sampling with Neighborhood, and Traversal Based Sampling (TBS). TBS techniques like Random Walks, Snowball Sampling, and Forest Fire Sampling are notably highlighted for their application in dynamic graph exploration scenarios and decentralized data environments.
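
The three families of approaches listed above can be sketched in a few lines. The following is a minimal illustration, not the paper's own formulation: the adjacency-dict representation, the function names, and the toy ring graph are all assumptions made for this example.

```python
import random


def ring_graph(n):
    """Toy undirected graph (hypothetical test input): a cycle on n vertices."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def vertex_sample(adj, k, rng):
    """Vertex sampling: pick k vertices uniformly, keep the induced edges."""
    kept = set(rng.sample(sorted(adj), k))
    return {v: adj[v] & kept for v in kept}


def edge_sample(adj, k, rng):
    """Edge sampling: pick k edges uniformly, keep their endpoints."""
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    sub = {}
    for u, v in rng.sample(edges, k):
        sub.setdefault(u, set()).add(v)
        sub.setdefault(v, set()).add(u)
    return sub


def random_walk_sample(adj, start, steps, rng):
    """Traversal based sampling: a simple random walk that only ever
    asks for the neighbors of the current vertex."""
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk


if __name__ == "__main__":
    rng = random.Random(0)
    g = ring_graph(10)
    print(vertex_sample(g, 4, rng))
    print(edge_sample(g, 3, rng))
    print(random_walk_sample(g, 0, 8, rng))
```

Vertex and edge sampling need the full vertex or edge set up front, whereas the random walk only needs neighbor lookups. That difference is why traversal based methods suit settings where the graph is unknown and must be explored.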

Property Preservation and Estimation

A critical section of the paper discusses the conditions under which certain graph properties are preserved after sampling. When a property is preserved, it can be estimated efficiently from the sample, and algorithms that depend on it can be expected to give similar results on the sampled and original graphs. Properties such as degree distribution, clustering coefficients, and modularity receive significant attention due to their wide usage in network analysis and community detection.
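
To illustrate how preservation leads to an estimator, here is a small sketch of the Metropolis-Hastings Random Walk mentioned in the taxonomy above. It assumes the standard degree-ratio acceptance rule, which makes the walk's stationary distribution uniform over the vertices of a connected undirected graph. The toy graph (a 5-clique with a 5-vertex tail) and all function names are assumptions for this example, not taken from the paper.

```python
import random


def lollipop_graph():
    """Toy graph (hypothetical): clique on vertices 0-4 plus path 4-5-...-9."""
    adj = {v: [] for v in range(10)}
    edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
    edges += [(i, i + 1) for i in range(4, 9)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def simple_random_walk(adj, start, steps, rng):
    """Plain random walk: visits vertices in proportion to their degree."""
    current, samples = start, []
    for _ in range(steps):
        current = rng.choice(adj[current])
        samples.append(current)
    return samples


def mhrw(adj, start, steps, rng):
    """Metropolis-Hastings random walk targeting the uniform distribution."""
    current, samples = start, []
    for _ in range(steps):
        proposal = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(proposal));
        # on rejection the walk stays put and the vertex is sampled again.
        if rng.random() < len(adj[current]) / len(adj[proposal]):
            current = proposal
        samples.append(current)
    return samples


def mean_degree(adj, samples):
    """Average degree over the sampled vertices (with repetition)."""
    return sum(len(adj[v]) for v in samples) / len(samples)


if __name__ == "__main__":
    adj = lollipop_graph()
    rng = random.Random(0)
    print("true mean degree:", sum(len(n) for n in adj.values()) / len(adj))
    print("simple walk estimate:", mean_degree(adj, simple_random_walk(adj, 0, 50000, rng)))
    print("MHRW estimate:", mean_degree(adj, mhrw(adj, 0, 50000, rng)))
```

On this toy graph the plain walk over-represents the high-degree clique vertices, so its naive average overestimates the mean degree. The MHRW average should land close to the true value without any reweighting.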

The paper provides theoretical frameworks and practical methodologies for analyzing these properties post-sampling. For example, edge sampling (optionally combined with contraction) can produce a much sparser graph whose cuts still approximate those of the original, which is crucial for tasks such as network optimization and partitioning.
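
A rough sketch of why edge sampling can keep cuts intact: if every edge is kept independently with probability p and given weight 1/p, the expected weight of any cut in the sample equals its size in the original graph. This uniform scheme is a simplification chosen for illustration and is not necessarily the procedure analyzed in the paper. The names `sparsify` and `cut_weight` and the complete-graph input are assumptions.

```python
import random


def sparsify(edges, p, rng):
    """Keep each edge independently with probability p, reweighted by 1/p."""
    return [(u, v, 1.0 / p) for u, v in edges if rng.random() < p]


def cut_weight(weighted_edges, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for u, v, w in weighted_edges if (u in side) != (v in side))


if __name__ == "__main__":
    n = 40
    # Hypothetical input: the complete graph on n vertices.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    side = set(range(n // 2))

    original = cut_weight([(u, v, 1.0) for u, v in edges], side)
    sparse = sparsify(edges, 0.5, random.Random(0))

    print("edges kept:", len(sparse), "of", len(edges))
    print("original cut:", original)
    print("estimated cut:", cut_weight(sparse, side))
```

The estimate is unbiased for any fixed cut, but how tightly all cuts concentrate at once depends on the graph and on p. Stronger guarantees generally require non-uniform sampling probabilities.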

Theoretical and Practical Implications

While the paper collates various studies and provides theoretical insights, it notes that most previous numerical evaluations are ad hoc: each study uses its own graph types, its own set of properties, and its own sampling algorithms, so results are hard to compare. The authors argue that a systematic and neutral evaluation of existing sampling techniques across graph types and objectives is needed to support more robust theoretical advances and practical applications.

Future Research Directions

The paper outlines several potential areas for future research in graph sampling:

Developing a systematic numerical benchmark to evaluate the effectiveness of various sampling techniques across common graph properties and structures.
Conducting more extensive evaluations on synthetic datasets to elucidate the interplay between sampling methods and intrinsic graph properties.
Advancing theoretical studies of properties that are less explored than degree distribution and clustering coefficient, to enrich analytical models and real-world applications.

Conclusion

In conclusion, the paper by Hu and Lau provides a well-structured synthesis of graph sampling research, anchored in a taxonomy that clarifies the diverse methods and objectives within the field. Its discussion of property preservation and estimation shows how sampling can support efficient estimators and faster graph-processing algorithms. This survey sets the stage for further methodological refinement and application in networks where preserving structural characteristics during sampling is critical.
