Source of EV method’s accuracy increase under subsampling: distributional properties versus sample-size effects

Determine whether the increase in the extreme value (EV) method’s classification accuracy observed for some subsampled lognormal distributions is driven by intrinsic properties of the subsampled distributions or by the smaller sample sizes of heavily downsampled data. Resolving this would clarify the mechanism behind the improved performance under subsampling.

Background

In simulations, the authors observed that the extreme value (EV) method became better at correctly classifying certain lognormal distributions as non-power-law once those distributions were subsampled. Because subsampling both changes the distribution’s shape and reduces the number of observations, it was unclear which of the two factors underlies the improved accuracy.

The authors go on to analyze theoretical subsampled distributions, but the paper explicitly flags that it remains unclear whether the performance gains stem from distributional changes or from the reduction in sample size, which motivates further clarification.
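One way to picture the confound is to build two control samples next to an ordinary subsample: one that changes only the sample size and one that changes only the shape. The Python sketch below is illustrative and is not the authors' procedure. It assumes that subsampling acts as binomial thinning (each constituent part of an element is kept independently with probability p, and elements left with zero parts drop out) and that the lognormal is discretized by rounding up. The function names and parameter values are hypothetical, and the EV classifier itself is not implemented here.

```python
import numpy as np


def lognormal_counts(n, mu, sigma, rng):
    """Draw n lognormal values and round up, so every count is >= 1.
    (The ceil discretisation is an assumption of this sketch.)"""
    return np.ceil(rng.lognormal(mu, sigma, size=n)).astype(np.int64)


def thin(counts, p, rng):
    """Assumed subsampling model: binomial thinning of each count,
    followed by removal of elements that end up with zero parts."""
    kept = rng.binomial(counts, p)
    return kept[kept > 0]


def confounded_and_controls(n, p, mu, sigma, seed=0):
    """Return (subsample, size-only control, shape-only control)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    rng = np.random.default_rng(seed)

    # (a) Plain subsampling: shape and sample size change together.
    sub = thin(lognormal_counts(n, mu, sigma, rng), p, rng)

    # (b) Size-only control: original distribution, cut to the same size.
    size_only = lognormal_counts(sub.size, mu, sigma, rng)

    # (c) Shape-only control: thinned distribution, topped up to n points.
    batches, total = [], 0
    while total < n:
        batch = thin(lognormal_counts(n, mu, sigma, rng), p, rng)
        batches.append(batch)
        total += batch.size
    shape_only = np.concatenate(batches)[:n]

    return sub, size_only, shape_only


if __name__ == "__main__":
    sub, size_only, shape_only = confounded_and_controls(2000, 0.3, 2.0, 1.0)
    print(sub.size, size_only.size, shape_only.size)
```

Feeding each of the three samples to the same classifier would, under these assumptions, indicate whether an accuracy change follows the size-only control, the shape-only control, or both.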

References

However, as the results of the simulations depend rather heavily on the sample size n, it is not clear whether this exception originates from the properties of the subsampled distributions or simply from the smaller amount of data points in the substantially downsampled data.

Distinguishing subsampled power laws from other heavy-tailed distributions (2404.09614 - Sormunen et al., 15 Apr 2024) in Subsection Classifying subsamples