Epps–Pulley Statistic in Normality Testing
- The Epps–Pulley statistic defines an L²-type goodness-of-fit test based on the empirical characteristic function, offering affine invariance and explicit asymptotic theory for normality assessment.
- The test uses a tuning parameter β to balance sensitivity between heavy-tailed and smooth departures from normality, with optimal power at intermediate β values.
- Its extensions to multivariate and functional data, along with favorable Bahadur efficiency, position it as a strong benchmark compared to traditional EDF-based tests.
The Epps–Pulley statistic, based on the empirical characteristic function (ECF), defines a weighted $L^2$-type goodness-of-fit test for normality (and other parametric families) that enjoys affine invariance and explicit asymptotic theory. Originally conceived for univariate data, it has been extended to multivariate and even functional data settings, and its theoretical properties, most notably Bahadur efficiency, have been analyzed in detail for a range of alternatives. This class of tests is widely regarded as a benchmark for goodness-of-fit to the normal distribution, especially in moderate to high dimensions, due to its amenability to extension and its favorable power properties under central and mixture-type alternatives.
1. Definition and Formulae
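Let $X_1, \dots, X_n$ be i.i.d. observations in $\mathbb{R}^d$, and denote by
$$Y_{n,j} = S_n^{-1/2}\,(X_j - \bar{X}_n), \qquad j = 1, \dots, n,$$
the affine-invariant standardized residuals, with sample mean $\bar{X}_n$ and sample covariance $S_n$. The empirical characteristic function is
$$\psi_n(t) = \frac{1}{n} \sum_{j=1}^{n} \exp\!\left(i\, t^\top Y_{n,j}\right), \qquad t \in \mathbb{R}^d.$$
Under normality, the theoretical characteristic function is $\exp(-\|t\|^2/2)$. The Epps–Pulley statistic with tuning parameter $\beta > 0$ (sometimes denoted $\alpha$ in the univariate case) is
$$T_{n,\beta} = n \int_{\mathbb{R}^d} \left| \psi_n(t) - \exp\!\left(-\|t\|^2/2\right) \right|^2 \varphi_\beta(t)\, dt,$$
with the Gaussian weight
$$\varphi_\beta(t) = (2\pi\beta^2)^{-d/2} \exp\!\left(-\frac{\|t\|^2}{2\beta^2}\right).$$
In the univariate case $d = 1$, a closed form is available:
$$T_{n,\beta} = \frac{1}{n} \sum_{j,k=1}^{n} \exp\!\left(-\frac{\beta^2}{2}\,(Y_{n,j} - Y_{n,k})^2\right) - \frac{2}{\sqrt{1+\beta^2}} \sum_{j=1}^{n} \exp\!\left(-\frac{\beta^2\, Y_{n,j}^2}{2(1+\beta^2)}\right) + \frac{n}{\sqrt{1+2\beta^2}}.$$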
This statistic and its multivariate extension, sometimes termed the BHEP test, are affine invariant and universally consistent (that is, consistent against every fixed non-normal alternative) in any dimension.
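The univariate closed form can be evaluated directly as a pairwise sum. The following is a minimal sketch, assuming the standard Gaussian-weight parametrization $\varphi_\beta(t) \propto \exp(-t^2/(2\beta^2))$ and standardization by the $1/n$ sample variance; the function name `epps_pulley` is illustrative and does not come from any library.

```python
import numpy as np

def epps_pulley(x, beta=1.0):
    """Univariate Epps-Pulley statistic T_{n,beta} (pairwise closed form).

    Assumes the Gaussian weight with variance beta^2 and the 1/n variance
    estimator; both conventions are assumptions of this sketch.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Standardized residuals Y_j = (X_j - mean) / sd, with sd using 1/n.
    y = (x - x.mean()) / x.std(ddof=0)
    b2 = beta ** 2
    # Double sum over all pairs (j, k), including j == k.
    diff = y[:, None] - y[None, :]
    pair = np.exp(-0.5 * b2 * diff ** 2).sum() / n
    # Cross term between the ECF and the normal characteristic function.
    single = 2.0 / np.sqrt(1.0 + b2) * np.exp(-0.5 * b2 * y ** 2 / (1.0 + b2)).sum()
    # Constant term from integrating exp(-t^2) against the weight.
    const = n / np.sqrt(1.0 + 2.0 * b2)
    return pair - single + const
```

Because the residuals are standardized, `epps_pulley(x, beta)` and `epps_pulley(a * x + b, beta)` agree for any `a != 0`. The pairwise matrix needs memory of order $n^2$, so very large samples would require chunking.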
2. Tuning Parameter and Sensitivity
The tuning parameter $\beta$ in $T_{n,\beta}$ regulates the test's sensitivity to various departures from normality:
- Small $\beta$: Raises sensitivity to heavy tails and fine, high-frequency deviations (e.g., Cauchy-like alternatives).
- Large $\beta$: Enhances power against broad, smooth deviations (e.g., small skewness/kurtosis).
Empirical and Bahadur-efficiency analyses consistently recommend intermediate values of $\beta$ (a range extending up to 1) for robust general-purpose power (Ebner et al., 2021, Meintanis et al., 2022). For heavy-tailed alternatives, a smaller $\beta$ (e.g., $0.25$) is suitable; for mild deviations, a moderate to large $\beta$ (up to 3) may be justified.
3. Asymptotic Null Distribution and Eigenvalue Problem
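Under the null hypothesis, $T_{n,\beta}$ converges in distribution to a weighted sum of independent $\chi^2_1$ variables:
$$T_{n,\beta} \xrightarrow{d} \sum_{j=1}^{\infty} \lambda_j N_j^2, \qquad N_j \ \text{i.i.d.}\ N(0,1),$$
where the $\lambda_j$ are eigenvalues of the integral operator on $L^2(\mathbb{R}^d, \varphi_\beta)$ with covariance kernel determined by the Gaussian null and the parameter estimation regime (Ebner et al., 2021). In the univariate case $d = 1$, explicit computation of eigenvalues has been achieved (Ebner et al., 2021), and truncating the series after finitely many leading terms $J$ is sufficient for practical approximation of quantiles:
$$T_{n,\beta} \approx \sum_{j=1}^{J} \lambda_j N_j^2.$$
For practical use, one can estimate critical values either via this series or by Monte Carlo resampling under the normal model.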
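The Monte Carlo route can be sketched as follows. Because the statistic is invariant under location and scale changes, simulating from the standard normal is enough to approximate the finite-sample null distribution. The helper names `epps_pulley` and `mc_critical_value`, the default of 2000 replications, and the Gaussian-weight parametrization with variance $\beta^2$ are assumptions of this sketch, not prescriptions from the cited papers.

```python
import numpy as np

def epps_pulley(x, beta=1.0):
    """Univariate Epps-Pulley statistic via the pairwise closed form."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = (x - x.mean()) / x.std(ddof=0)  # standardized residuals (1/n variance)
    b2 = beta ** 2
    diff = y[:, None] - y[None, :]
    pair = np.exp(-0.5 * b2 * diff ** 2).sum() / n
    single = 2.0 / np.sqrt(1.0 + b2) * np.exp(-0.5 * b2 * y ** 2 / (1.0 + b2)).sum()
    return pair - single + n / np.sqrt(1.0 + 2.0 * b2)

def mc_critical_value(n, beta=1.0, level=0.05, reps=2000, seed=0):
    """Approximate the upper `level` quantile of the null distribution.

    Draws `reps` standard normal samples of size n and returns the
    empirical (1 - level) quantile of the simulated statistics.
    """
    rng = np.random.default_rng(seed)
    stats = np.array([epps_pulley(rng.standard_normal(n), beta) for _ in range(reps)])
    return np.quantile(stats, 1.0 - level)

def rejects_normality(x, beta=1.0, level=0.05, reps=2000, seed=0):
    """Reject when the observed statistic exceeds the simulated critical value."""
    crit = mc_critical_value(len(x), beta=beta, level=level, reps=reps, seed=seed)
    return epps_pulley(x, beta) > crit
```

The test rejects for large values of the statistic, so only the upper quantile is needed. Simulating at the actual sample size avoids relying on the asymptotic approximation for small $n$.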
4. Bahadur Slope and Local Efficiencies
For alternatives close to normality, large deviation principles and Bahadur theory provide the asymptotic (local) efficiency of the test, relative to the optimal (likelihood ratio) test. The (approximate) Bahadur slope is
$$c_\beta(\theta) = \frac{b_\beta(\theta)}{\lambda_1},$$
where $\lambda_1$ is the largest eigenvalue of the integral operator and $b_\beta(\theta)$ is the limit of $T_{n,\beta}/n$ under the alternative with parameter $\theta$ (Meintanis et al., 2022). The corresponding local approximate Bahadur efficiency (LABE) is
$$\mathrm{eff}_\beta = \lim_{\theta \to 0} \frac{c_\beta(\theta)}{2K(\theta)},$$
with $K(\theta)$ the local Kullback–Leibler information.
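The role of the largest eigenvalue can be made explicit with a short derivation. This is a sketch of the standard argument for statistics whose null limit is a weighted chi-square series; the symbols $T_\infty$, $b_\beta(\theta)$ and $c_\beta(\theta)$ are notation assumed here for illustration.

```latex
\begin{align*}
% Null limit: weighted sum of independent chi-square(1) variables
T_{n,\beta} &\xrightarrow{d} T_\infty = \sum_{j \ge 1} \lambda_j N_j^2,
  \qquad \lambda_1 \ge \lambda_2 \ge \dots > 0, \\
% The largest weight governs the upper tail of the limit law
\log P(T_\infty > x) &= -\frac{x}{2\lambda_1}\,\bigl(1 + o(1)\bigr),
  \qquad x \to \infty, \\
% Under a fixed alternative the statistic grows linearly in n
\frac{T_{n,\beta}}{n} &\xrightarrow{P} b_\beta(\theta), \\
% Approximate slope: -2/n times the log tail evaluated at x = n b
c_\beta(\theta) &= \lim_{n \to \infty} -\frac{2}{n}
  \Bigl(-\frac{n\, b_\beta(\theta)}{2\lambda_1}\Bigr)
  = \frac{b_\beta(\theta)}{\lambda_1}.
\end{align*}
```

Since the likelihood ratio test has slope $2K(\theta)$, dividing $c_\beta(\theta)$ by $2K(\theta)$ and letting $\theta \to 0$ gives a local efficiency between 0 and 1.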
Table: LABE of the BHEP test (univariate, selected $\beta$)
| Alternative | |||
|---|---|---|---|
| Lehmann | 0.75 | 0.86 | 0.94 |
| 1st Ley-Paindaveine | 0.95 | 0.99 | 0.99 |
| Contamination | 0.50 | 0.60 | 0.68 |
| Energy (=1) | 0.57 | – | – |
A summary for six classical alternatives and a broad range of $\beta$ values, as in Ebner et al. (2021), demonstrates that, for suitably chosen $\beta$, the Epps–Pulley test either matches or outperforms EDF-based tests (KS, Cramér–von Mises, Anderson–Darling) over the full alternative battery.
5. Multivariate and Functional Extensions
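- Multivariate BHEP: For $d \ge 2$ and Gaussian weight $\varphi_\beta(t) = (2\pi\beta^2)^{-d/2} \exp(-\|t\|^2/(2\beta^2))$, the multivariate BHEP statistic remains
$$T_{n,\beta} = n \int_{\mathbb{R}^d} \left| \psi_n(t) - \exp\!\left(-\|t\|^2/2\right) \right|^2 \varphi_\beta(t)\, dt,$$
where $\psi_n$ is the empirical characteristic function of the standardized residuals, with similar affine invariance and limiting null law as above (Meintanis et al., 2022).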
- Functional data: The approach generalizes in (Henze et al., 2019) to Hilbert-space (functional) data, replacing the characteristic function with the empirical characteristic functional and introducing a suitable probability measure as kernel/weight. The resulting test remains consistent, with null distribution approximated via a parametric bootstrap.
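For the multivariate case, the integral again reduces to sums over pairwise and individual squared Mahalanobis-type distances. The sketch below uses the well-known closed form for the Gaussian weight with covariance $\beta^2 I_d$, which is assumed here rather than quoted from the text above; the function name `bhep` is illustrative. It needs only $S_n^{-1}$, not a matrix square root, and requires $n > d$ so that $S_n$ is invertible.

```python
import numpy as np

def bhep(X, beta=1.0):
    """Multivariate BHEP statistic for an (n, d) data matrix X.

    Assumes the Gaussian weight with covariance beta^2 * I_d and the
    1/n sample covariance; requires n > d so the covariance is invertible.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    # G[j, k] = (X_j - mean)^T S^{-1} (X_k - mean) = Y_j^T Y_k.
    G = Xc @ np.linalg.solve(S, Xc.T)
    sq = np.diag(G)                             # ||Y_j||^2
    D2 = sq[:, None] + sq[None, :] - 2.0 * G    # ||Y_j - Y_k||^2
    b2 = beta ** 2
    pair = np.exp(-0.5 * b2 * D2).sum() / n
    single = 2.0 * (1.0 + b2) ** (-d / 2) * np.exp(-0.5 * b2 * sq / (1.0 + b2)).sum()
    const = n * (1.0 + 2.0 * b2) ** (-d / 2)
    return pair - single + const
```

With `d = 1` this reduces to the univariate closed form, and the value is unchanged under any invertible affine map of the rows of `X`, which is a quick sanity check in practice.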
6. Practical Implementation and Recommendations
- Computational Complexity: The univariate closed-form version (pairwise sum) is $O(n^2)$. Sub-sampling or fast Gauss transform acceleration can be employed in large-$n$ regimes (Ebner et al., 2021).
- Critical Value Determination: Either simulate the limiting Gaussian quadratic form, given the leading eigenvalues, or use a parametric bootstrap under the normal fit (Meintanis et al., 2022, Henze et al., 2019).
- Parameter Tuning: An intermediate $\beta$ is widely recommended for robust performance; practitioners facing heavy-tailed alternatives may lower $\beta$ to $0.25$ (Ebner et al., 2021, Meintanis et al., 2022).
- Comparison with Other Tests: For suitably chosen $\beta$, the Epps–Pulley test outperforms Kolmogorov–Smirnov on all close alternatives and is superior to the Cramér–von Mises, Watson, and Anderson–Darling tests over the six-alternative set (Ebner et al., 2021, Meintanis et al., 2022).
7. Impact and Extensions
The Epps–Pulley/BHEP framework provides a principled, explicit, and flexible family of goodness-of-fit tests for normality and exponentiality, with generalization to high dimensions and to infinite-dimensional contexts. Its analytical tractability permits precise asymptotic power calculations, supporting informed choice of tuning parameters. Recent work emphasizes its optimality within the class of affine-invariant $L^2$-type tests and its consistently superior (or competitive) Bahadur efficiency over EDF-based competitors for a broad range of alternatives (Meintanis et al., 2022, Ebner et al., 2021).
A plausible implication is that, for moderate to large sample sizes and particularly for high-dimensional settings, the Epps–Pulley/BHEP test with judicious tuning provides a default methodology for normality assessment, combining interpretability, consistency, and power. Further research continues to explore its refinements, computational acceleration, and generalizations to other distributional families.