Representative network sampling: strategy, sample size, and evaluation metrics

Determine a sampling strategy that preserves the structural properties of complex networks, ascertain the minimum sample size required to accurately reconstruct those properties, and develop metrics for evaluating the representativeness of sampled subgraphs relative to the original network.
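One concrete candidate for the third sub-problem, evaluating representativeness, is to compare the degree distribution of the sample against that of the full network. The sketch below uses the Kolmogorov-Smirnov distance between empirical degree CDFs; this specific metric is an illustrative assumption, not one prescribed by the source.

```python
def degree_cdf(degrees):
    """Empirical CDF of a degree sequence, as a dict degree -> P(D <= d)."""
    n = len(degrees)
    cdf, total = {}, 0
    for d in sorted(set(degrees)):
        total += degrees.count(d)
        cdf[d] = total / n
    return cdf

def ks_distance(deg_full, deg_sample):
    """Kolmogorov-Smirnov distance between two degree distributions:
    the maximum absolute gap between their empirical CDFs."""
    cdf_f, cdf_s = degree_cdf(deg_full), degree_cdf(deg_sample)
    support = sorted(set(deg_full) | set(deg_sample))

    def eval_cdf(cdf, d):
        # Step-function value of the empirical CDF at degree d.
        keys = [k for k in cdf if k <= d]
        return cdf[max(keys)] if keys else 0.0

    return max(abs(eval_cdf(cdf_f, d) - eval_cdf(cdf_s, d)) for d in support)

# Identical distributions give distance 0; disjoint ones give distance 1.
print(ks_distance([1, 2, 2, 3], [1, 2, 2, 3]))  # 0.0
print(ks_distance([1, 1, 1], [5, 5, 5]))        # 1.0
```

A sample whose KS distance to the original stays below a chosen threshold across several properties (degree, clustering, correlations) could then be deemed representative; choosing such thresholds is itself part of the open problem.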

Background

Network sampling aims to extract subgraphs that retain key structural properties of large, often inaccessible systems. Classical statistical assumptions (e.g., i.i.d. sampling) typically fail in networks because edges introduce dependencies between observations, and existing sampling methods (random node sampling, random walks, snowball sampling, edge-based sampling) each introduce specific biases that distort properties such as clustering, degree correlations, and degree distributions. As a result, three questions remain open: how to sample a network so that its structure is preserved, how much data are needed to reliably reconstruct a given property, and how to assess the quality of a sample.
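The method-specific biases mentioned above can be made concrete with a minimal sketch: on a preferential-attachment graph, uniform node sampling roughly recovers the true mean degree, while random-walk sampling visits nodes in proportion to their degree and therefore overestimates it. The graph generator and samplers below are simplified illustrations, not implementations from the source.

```python
import random

random.seed(42)

def barabasi_albert(n, m):
    """Grow a simple preferential-attachment graph as an adjacency dict."""
    adj = {i: set() for i in range(n)}
    targets = list(range(m))  # nodes the next newcomer attaches to
    repeated = []             # node list weighted by current degree
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Pick m distinct targets with probability proportional to degree.
        targets = set()
        while len(targets) < m:
            targets.add(random.choice(repeated))
        targets = list(targets)
    return adj

def random_node_sample(adj, k):
    """Uniform random node sampling."""
    return set(random.sample(list(adj), k))

def random_walk_sample(adj, k, start=0):
    """Random-walk sampling: first k distinct nodes visited by the walk."""
    node, seen = start, {start}
    while len(seen) < k:
        node = random.choice(list(adj[node]))
        seen.add(node)
    return seen

def mean_degree(adj, nodes):
    """Mean degree (measured in the full graph) over the sampled nodes."""
    return sum(len(adj[v]) for v in nodes) / len(nodes)

G = barabasi_albert(2000, 3)
uniform = random_node_sample(G, 200)
walk = random_walk_sample(G, 200)
print(mean_degree(G, uniform))  # near the true mean degree (about 2m)
print(mean_degree(G, walk))     # inflated: the walk favors hubs
```

The same experiment, repeated while varying the sample size `k`, is one way to probe the second sub-problem: how large a sample must be before an estimated property stabilizes near its true value.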

References

Determining the most appropriate sampling strategy, the minimum sample size required for accurate reconstruction, and proper metrics for evaluating samples remains an open problem.

Prediction and inference in complex networks: a brief review and perspectives (2512.07439 - Rodrigues, 8 Dec 2025) in Section “Network sampling”