Distinguishing subsampled power laws from other heavy-tailed distributions (2404.09614v1)
Abstract: Distinguishing power-law distributions from other heavy-tailed distributions is challenging, and this task is often further complicated by subsampling effects. In this work, we evaluate the performance of two commonly used methods for detecting power-law distributions (the maximum likelihood method of Clauset et al. and the extreme value method of Voitalov et al.) in distinguishing subsampled power laws from two other heavy-tailed distributions, the lognormal and the stretched exponential distributions. We focus on a random subsampling method commonly applied in network science and the biological sciences. In this subsampling scheme, we are ultimately interested in the frequency distribution of elements with a certain number of constituent parts, and each part is selected into the subsample with equal probability. We investigate how well the results obtained from subsamples generalize to the original distribution. Our results show that the power-law exponent of the original distribution can be estimated fairly accurately from subsamples, but classifying the distribution correctly is more challenging. The maximum likelihood method falsely rejects the power-law hypothesis for a large fraction of subsamples from power-law distributions. The extreme value method correctly recognizes subsampled power-law distributions at all tested subsampling depths, but its capacity to distinguish power laws from the heavy-tailed alternatives is limited. However, these false positives tend to result not from the subsampling itself but from the estimators' inability to classify the original sample correctly. In fact, we show that the extreme value method can sometimes be expected to perform better on subsamples than on the original samples from the lognormal and the stretched exponential distributions, while the opposite holds for the main tests included in the maximum likelihood method.
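The subsampling scheme described in the abstract, in which every constituent part of an element is kept independently with the same probability, can be read as binomial thinning of the element sizes. The following is a minimal sketch under that reading, not the authors' pipeline. The function names, the parameter values, and the choice to generate sizes by rounding down a continuous Pareto variable are all assumptions made for illustration. The Hill estimator stands in here for the maximum likelihood and extreme value methods evaluated in the paper, purely as a simple way to compare the tail exponent before and after subsampling.

```python
import numpy as np


def sample_power_law_sizes(n, alpha, rng=None):
    """Draw n integer sizes >= 1 with an approximate power-law tail.

    Assumption: sizes are obtained by flooring a continuous Pareto
    variable with density proportional to x^(-alpha) on [1, inf).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(n)  # uniform on [0, 1), so 1 - u lies in (0, 1]
    x = (1.0 - u) ** (-1.0 / (alpha - 1.0))  # inverse-transform sampling
    return np.floor(x).astype(np.int64)


def subsample_sizes(sizes, p, rng=None):
    """Keep each constituent part independently with probability p.

    An element of size k ends up with Binomial(k, p) parts; elements
    left with zero parts are not observed in the subsample.
    """
    rng = np.random.default_rng() if rng is None else rng
    thinned = rng.binomial(sizes, p)
    return thinned[thinned > 0]


def hill_exponent(sizes, k):
    """Hill estimate of the exponent alpha from the k largest values."""
    x = np.sort(np.asarray(sizes, dtype=float))[::-1]  # descending order
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < number of observations")
    xi = np.mean(np.log(x[:k])) - np.log(x[k])  # tail index estimate
    return 1.0 + 1.0 / xi  # convert the tail index to a density exponent


# Hypothetical demo values: 200,000 elements, alpha = 2.5, depth p = 0.5.
rng = np.random.default_rng(0)
original = sample_power_law_sizes(200_000, alpha=2.5, rng=rng)
subsample = subsample_sizes(original, p=0.5, rng=rng)
print("original  exponent estimate:", hill_exponent(original, k=500))
print("subsample exponent estimate:", hill_exponent(subsample, k=500))
```

Comparing the two printed estimates gives a rough feel for the abstract's claim that the exponent can be recovered from subsamples. It says nothing about the harder classification question, which needs the full hypothesis tests discussed in the paper.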
- A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286, 509 (1999).
- P. Jelenković and J. Tan, Can retransmissions of superexponential documents cause subexponential delays?, in Proc. IEEE Infocom 2007 (Anchorage, 2007) pp. 892–900.
- R. van der Hofstad, G. Hooghiemstra, and D. Znamenski, Distances in random graphs with finite mean and infinite variance degrees, Electronic Journal of Probability 12, 703 (2007).
- A. Broido and A. Clauset, Scale-free networks are rare, Nature Communications 10, 1017 (2019).
- A. Clauset, C. Shalizi, and M. Newman, Power-law distributions in empirical data, SIAM Review 51, 661 (2009).
- A. Levina and V. Priesemann, Subsampling scaling, Nature Communications 8, 15140 (2017).
- H. Shimadzu and R. Darnell, Attenuation of species abundance distributions by sampling, Royal Society Open Science 2, 140219 (2015).
- M. Stumpf and C. Wiuf, Sampling properties of random graphs: The degree distribution, Physical Review E 72, 036118 (2005).
- Y. Malevergne, V. Pisarenko, and D. Sornette, Empirical distributions of stock returns: Between the stretched exponential and the power law?, Quantitative Finance 5, 379 (2005).
- Y. Malevergne, V. Pisarenko, and D. Sornette, Testing the Pareto against the lognormal distributions with the uniformly most powerful unbiased test applied to the distribution of cities, Physical Review E 83, 036111 (2011).
- S. Foss, D. Korshunov, and S. Zachary, An Introduction to Heavy-Tailed and Subexponential Distributions (Springer, New York, 2013).
- M. Stumpf, C. Wiuf, and R. May, Subnets of scale-free networks are not scale-free: Sampling properties of networks, Proceedings of the National Academy of Sciences of the United States of America 102, 4221 (2005).
- Q. Vuong, Likelihood ratio tests for model selection and non-nested hypotheses, Econometrica 57, 307 (1989).
- M. Charras-Garrido and P. Lezaud, Extreme value analysis: an introduction, Journal de la Société Française de Statistique 154, 66 (2013).
- T. Shimura, Discretization of distributions in the maximum domain of attraction, Extremes 15, 1 (2012).
- B. M. Hill, A simple general approach to inference about the tail of a distribution, Annals of Statistics 3, 1163 (1975).
- A. L. M. Dekkers, J. H. J. Einmahl, and L. de Haan, A moment estimator for the index of an extreme-value distribution, Annals of Statistics 17, 1833 (1989).
- P. Groeneboom, H. P. Lopuhaä, and P. P. de Wolf, Kernel-type estimators for the extreme value index, Annals of Statistics 31, 1956 (2003).
- Supplemental material, includes Refs. 28-30.
- R. Bartle and J. Joichi, The preservation of convergence of measurable functions under composition, Proceedings of the American Mathematical Society 12, 122 (1961).
- J. Alstott, E. Bullmore, and D. Plenz, powerlaw: a Python package for analysis of heavy-tailed distributions, PLoS ONE 9(4): e95816 (2014).
- A.-L. Barabási and M. Pósfai, Network Science (Cambridge University Press, Cambridge, 2016).
- M. Stumpf and M. Porter, Critical truths about power laws, Science 335, 665 (2012).
- Á. Corral, F. Font, and J. Camacho, Noncharacteristic half-lives in radioactive decay, Physical Review E 83, 066103 (2011).
- E. K. H. Salje, A. Planes, and E. Vives, Analysis of crackling noise using the maximum-likelihood method: Power-law mixing and exponential damping, Physical Review E 96, 042122 (2017).
- M. Bee, M. Riccaboni, and S. Schiavo, Pareto versus lognormal: a maximum entropy test, Physical Review E 84, 026104 (2011).
- Á. Corral and Á. González, Power law size distributions in geoscience revisited, Earth and Space Science 6, 673 (2019).