Origin of EV method’s improved accuracy on subsampled lognormal distributions

Determine whether the higher accuracy of Voitalov et al.’s extreme value–based classification on subsampled lognormal distributions arises from properties of the distributions produced by incident subgraph sampling, or simply from the smaller sample sizes of heavily downsampled data.

Background

In their simulations, Sormunen et al. note an exception: for some lognormal distributions, the extreme value (EV) method performs better on subsamples than on the original samples. Because the simulation results depend heavily on the sample size n, it is unclear whether this improvement reflects genuine changes to the distribution induced by subsampling or is merely an artifact of the smaller n.

Resolving this uncertainty would clarify the conditions under which EV-based tail index estimators reliably distinguish lognormal tails from power laws after incident subgraph sampling.
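One way to separate the two explanations is a size-matched control: compare the tail index estimate on the subsampled data with the estimate on an untransformed sample of the same size. The sketch below illustrates that design only and is not the procedure used in the paper. It assumes that incident subgraph sampling can be modeled as binomial thinning of each degree with retention probability p (zero-degree nodes dropped), and it substitutes a plain Hill estimator with a fixed tail fraction for the EV estimators of Voitalov et al. The function names and all parameter values are hypothetical.

```python
import numpy as np


def hill_estimate(x, tail_fraction=0.1):
    """Hill estimate of the tail index from the k largest values of x."""
    x = np.sort(np.asarray(x, dtype=float))
    k = max(2, int(tail_fraction * x.size))
    k = min(k, x.size - 1)
    threshold = x[-k - 1]  # the (k+1)-th largest observation
    return float(np.mean(np.log(x[-k:] / threshold)))


def compare_conditions(n=20000, p=0.1, mu=2.0, sigma=1.5, seed=0):
    """Tail index estimates for original, thinned, and size-matched data."""
    rng = np.random.default_rng(seed)

    # Integer-valued lognormal "degrees", forced to be at least 1.
    original = np.maximum(1, np.round(rng.lognormal(mu, sigma, n)))
    original = original.astype(np.int64)

    # Assumption: subsampling acts as binomial thinning of each degree,
    # and nodes left with degree 0 are not observed.
    thinned = rng.binomial(original, p)
    thinned = thinned[thinned > 0]

    # Control: same distribution as the original, but only as many
    # observations as survived the thinning.
    control = rng.choice(original, size=thinned.size, replace=False)

    samples = {"original": original, "thinned": thinned, "control": control}
    return {
        name: {"n": int(values.size), "xi": hill_estimate(values)}
        for name, values in samples.items()
    }


if __name__ == "__main__":
    for name, stats in compare_conditions().items():
        print(f"{name:>8}: n={stats['n']:6d}  xi_hat={stats['xi']:.3f}")
```

In this design, a gap between "original" and "control" points to a pure sample-size effect, while a gap between "control" and "thinned" points to a change caused by the subsampling itself. A real study would repeat the comparison over many seeds and parameter settings and use the classification rule of the EV method rather than a single point estimate.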

References

“However, as the results of the simulations depend rather heavily on the sample size n, it is not clear whether this exception originates from the properties of the subsampled distributions or simply from the smaller amount of data points in the substantially downsampled data.”

Distinguishing subsampled power laws from other heavy-tailed distributions (2404.09614 - Sormunen et al., 15 Apr 2024) in Section “Classifying subsamples”