
Scaling threshold needed to recover asymptotic coverage with finite MC templates

Determine the minimal scaling of data and/or Monte Carlo sample sizes in binned Poisson-likelihood analyses with nuisance parameters and finite Monte Carlo–derived templates (treated via the full Barlow–Beeston likelihood) that is sufficient to restore the asymptotic validity of Wilks’ theorem and yield correct coverage for confidence intervals constructed from the profile-likelihood ratio or Hessian methods. Specify quantitative conditions on the total number of events N, the number of bins n, and the data-to-MC statistical power ratio k under which these asymptotic confidence intervals attain nominal coverage in the presence of Monte Carlo statistical fluctuations.
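For reference, the objects named in this statement can be written out in standard textbook notation. The symbols below (d_i for observed counts, a_{ji} and A_{ji} for generated and true MC counts of source j in bin i, p_j for source strengths, \mu for the parameter of interest, \theta for the nuisance parameters) are chosen here for illustration and are not necessarily those of the paper.

```latex
% Barlow-Beeston log-likelihood: Poisson terms for data and for each MC template bin
\ln L = \sum_{i=1}^{n} \bigl( d_i \ln f_i - f_i \bigr)
      + \sum_{i=1}^{n} \sum_{j} \bigl( a_{ji} \ln A_{ji} - A_{ji} \bigr),
\qquad f_i = \sum_{j} p_j A_{ji}

% Profile-likelihood ratio for the parameter of interest \mu
q(\mu) = -2 \ln \frac{L\bigl(\mu, \hat{\hat{\theta}}(\mu)\bigr)}{L\bigl(\hat{\mu}, \hat{\theta}\bigr)}

% Wilks' theorem (one parameter of interest) and the resulting 68.3% interval
q(\mu_{\mathrm{true}}) \xrightarrow{\;d\;} \chi^2_1,
\qquad \mathrm{CI}_{68.3\%} = \{ \mu : q(\mu) \le 1 \}

% Nominal coverage requirement
P\bigl( \mu_{\mathrm{true}} \in \mathrm{CI}_{1-\alpha} \bigr) = 1 - \alpha
```

The open question asks for conditions on N, n, and k under which the last line holds to good approximation when the A_{ji} are themselves profiled.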


Background

The paper demonstrates with a high-statistics toy model that confidence intervals based on asymptotic properties (Hessian or profile-likelihood ratio) can systematically under-cover when model templates are derived from finite Monte Carlo samples and nuisance parameters are profiled. Even with seemingly large event counts and many bins, Wilks’ theorem can be invalidated due to Monte Carlo–induced fluctuations in both the gradient with respect to the parameter of interest and the Jacobian with respect to nuisance parameters.

The authors show that increasing the size of the data and/or simulation eventually recovers correct coverage for asymptotic methods, but they do not identify a principled quantitative threshold for how large the scaling must be. This leaves open the task of characterizing, in terms of N, n, and k, the conditions under which asymptotic confidence intervals regain their nominal coverage in the presence of finite Monte Carlo uncertainties.
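How coverage is measured with pseudo-experiments can be sketched with a deliberately simplified toy. This is not the authors' toy model: it fits a single normalisation, has no nuisance parameters, and treats the MC template as exact instead of using the Barlow–Beeston likelihood. The function name `estimate_coverage`, the flat template shape, and the convention that the MC sample is k times the size of the data sample are all assumptions made for this illustration.

```python
import numpy as np


def estimate_coverage(n_events=10_000, n_bins=100, k=1.0, n_toys=2000, seed=0):
    """Fraction of toys whose 68.3% Hessian interval contains the true mu.

    Simplified illustration only: one normalisation parameter, flat true
    shape, MC template treated as exact in the fit (no Barlow-Beeston terms).
    Here k is assumed to be the MC-to-data sample size ratio.
    """
    rng = np.random.default_rng(seed)
    mu_true = 1.0
    expected = np.full(n_bins, n_events / n_bins)  # true expected counts per bin

    covered = 0
    for _ in range(n_toys):
        # Pseudo-data drawn from the true model
        data = rng.poisson(mu_true * expected)
        # Finite MC template: k times the data statistics, rescaled to data size
        template = rng.poisson(k * expected) / k

        # Poisson maximum-likelihood estimate for a single normalisation
        mu_hat = data.sum() / template.sum()
        # Hessian (second-derivative) uncertainty, ignoring template fluctuations
        sigma = mu_hat / np.sqrt(data.sum())

        covered += abs(mu_hat - mu_true) <= sigma

    return covered / n_toys


if __name__ == "__main__":
    for k in (1.0, 10.0, 100.0):
        print(f"k = {k:6.1f}  coverage = {estimate_coverage(k=k):.3f}")
```

In this toy the under-coverage has a trivial origin (the MC variance is simply omitted), and under a Gaussian approximation one would expect roughly erf(0.5) ≈ 0.52 at k = 1, approaching 0.683 as k grows. Because only one normalisation is fitted, the bin count drops out here, whereas the question above treats n as one of the relevant quantities. The same counting loop applies unchanged when the fit is replaced by a full profile-likelihood or Barlow–Beeston fit.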

References

We found that the asymptotic properties can eventually be recovered by applying a sufficiently large scaling of the data and/or simulation sizes, but we could not find a clear indication of which scale should be considered sufficiently 'large'.

Alexe et al., "Under-coverage in high-statistics counting experiments with finite MC samples", arXiv:2401.10542 (19 Jan 2024), in Section 2, Discussion