Abstract: Testing for normality is a widely used procedure in statistics and data analysis, often applied prior to employing methods that rely on the assumption of normally distributed data. While several existing tests target distributional characteristics such as higher-order moments, others focus on functional aspects such as the distribution function. In this article, we propose an alternative idea by exploiting the self-similarity property of the normal distribution and introduce the Self-Similarity Test for Normality (SSTN). This procedure leverages the structural property that the distribution of a suitably centered and scaled sum of independent and identically distributed random variables with finite variance coincides with the original distribution if and only if that distribution is normal. The SSTN evaluates normality by applying a self-similarity transformation to the standardized empirical characteristic function and examining how the transformed functions change across successive applications. For the normal distribution, repeated applications preserve the functional form of the characteristic function, whereas deviations from normality manifest in systematic changes between consecutive transforms. These changes are aggregated into a test statistic, whose null distribution is obtained by Monte Carlo calibration, using a sample-size-specific calibration for small samples and an approximation of the asymptotic null distribution for larger ones. A comprehensive simulation study shows that the SSTN is at least competitive with, and frequently superior to, several well-established tests for normality.
The paper presents the Self-Similarity Test for Normality (SSTN) that leverages the unique self-similarity property of the normal distribution.
It employs iterative transforms of the empirical characteristic function and aggregates discrepancies using weighted integration, aided by Monte Carlo simulation for calibration.
Empirical results show competitive performance against established tests for detecting asymmetric, heavy-tailed, and multimodal deviations, with potential for multivariate extensions.
A Test for Normality Based on Self-Similarity
Introduction
This paper introduces a novel approach to testing normality in univariate data. It leverages the self-similarity property of the normal distribution to construct the Self-Similarity Test for Normality (SSTN) (2604.03810). The normal distribution is the only distribution with finite variance whose law is preserved when i.i.d. copies are summed and then suitably centered and scaled. Traditional normality tests typically assess finite sample moments, compare empirical and theoretical CDFs, or examine certain functionals. In contrast, the SSTN exploits this distributional fixed-point property in the space of characteristic functions.
Theoretical Framework
Self-Similarity and the Normal Law
A distribution $P_X$ is self-similar if, for any $m \in \mathbb{N}$, there exist constants $a_m > 0$ and $b_m \in \mathbb{R}$ such that
$$a_m \sum_{j=1}^{m} X_j + b_m \overset{d}{=} X,$$
where $X_1, \dots, X_m$ are i.i.d. copies of $X$. For absolutely continuous distributions with finite variance, this property isolates the normal distribution. On the Fourier level, this is equivalently phrased as a functional equation on characteristic functions,
$$\exp(itb_m)\,[\psi_X(a_m t)]^m = \psi_X(t),$$
with $a_m = 1/\sqrt{m}$ and $b_m = (1 - \sqrt{m})\mu$ for the normal family.
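As a quick check of the constants for the normal family, one can substitute the normal characteristic function $\psi_X(t) = \exp(i\mu t - \sigma^2 t^2/2)$ into the functional equation. The computation below is a standard verification written for this summary, not a derivation quoted from the paper:

```latex
\begin{aligned}
\exp(itb_m)\,[\psi_X(a_m t)]^m
  &= \exp\!\Big( i\,(b_m + m a_m \mu)\,t - \tfrac{1}{2}\, m a_m^2 \sigma^2 t^2 \Big), \\
\psi_X(t)
  &= \exp\!\Big( i \mu t - \tfrac{1}{2}\, \sigma^2 t^2 \Big).
\end{aligned}
```

For $\sigma > 0$ the two expressions agree for all $t$ exactly when $m a_m^2 = 1$ and $b_m + m a_m \mu = \mu$, which gives $a_m = 1/\sqrt{m}$ and $b_m = (1 - \sqrt{m})\mu$. For standardized data ($\mu = 0$) the shift vanishes and only the rescaling remains.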
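The SSTN applies the corresponding self-similarity transformation to the standardized empirical characteristic function (ECF), and repeated application generates a sequence of transformed ECFs.
For normal data, consecutive transforms are nearly identical at every iteration; deviations from normality imply growing discrepancies across iterations.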
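The idea of iterating a self-similarity transform on the ECF can be illustrated with a minimal sketch. The exact transform used by the SSTN is not reproduced here; the code assumes one application maps $\psi(t)$ to $[\psi(t/\sqrt{m})]^m$ with $m = 2$, and the names `standardize`, `ecf`, and `transformed_ecf` are hypothetical helpers, not part of the `sstn` package.

```python
import cmath
import math
import random


def standardize(xs):
    """Center and scale a sample to mean 0 and (population) variance 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def ecf(zs, t):
    """Empirical characteristic function: average of exp(i * t * z)."""
    return sum(cmath.exp(1j * t * z) for z in zs) / len(zs)


def transformed_ecf(zs, t, k, m=2):
    """k-fold self-similarity transform of the ECF (assumed form).

    One application maps psi(t) to psi(t / sqrt(m)) ** m, so k applications
    give psi(t / m**(k/2)) ** (m**k). No shift term is needed because the
    sample is standardized.
    """
    return ecf(zs, t / m ** (k / 2)) ** (m ** k)


def normal_cf(t):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-0.5 * t * t)


# The exact standard normal CF is a fixed point of the transform.
t = 1.3
fixed_point_gap = abs(normal_cf(t / math.sqrt(2)) ** 2 - normal_cf(t))

# For a skewed (exponential) sample, the gap between the ECF and its first
# transform is expected to be clearly larger than floating-point noise.
rng = random.Random(0)
skewed = standardize([rng.expovariate(1.0) for _ in range(2000)])
gap_skewed = abs(transformed_ecf(skewed, t, 1) - transformed_ecf(skewed, t, 0))
```

Working with the standardized sample keeps the transform free of location and scale parameters, so any remaining gap reflects the shape of the distribution plus sampling noise.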
Discrepancy Aggregation and Test Statistic
To quantify deviation, the procedure defines the difference between consecutive transformed ECFs and aggregates it into an integrated squared difference over the argument of the characteristic function, using a weight function.
These integrated discrepancies are standardized under the null, and the final test statistic is the maximum absolute standardized discrepancy across iterations.
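The aggregation described above can be sketched numerically. This is an illustration under stated assumptions rather than the paper's definition: the Gaussian weight `exp(-beta * t**2)`, the integration range `[-t_max, t_max]`, the trapezoidal rule, the transform with `m = 2`, and all function names and default values are hypothetical choices made for this example.

```python
import cmath
import math
import random


def ecf(zs, t):
    """Empirical characteristic function of the standardized sample zs."""
    return sum(cmath.exp(1j * t * z) for z in zs) / len(zs)


def transform(zs, t, k, m=2):
    """k-fold transform psi(t) -> psi(t / sqrt(m)) ** m (assumed form)."""
    return ecf(zs, t / m ** (k / 2)) ** (m ** k)


def integrated_discrepancies(zs, levels=3, beta=0.5, t_max=4.0, grid=41):
    """Weighted integrated squared differences between consecutive transforms.

    Returns [D_1, ..., D_levels], where D_k approximates the integral of
    |T^k psi(t) - T^(k-1) psi(t)|^2 * exp(-beta * t^2) over [-t_max, t_max]
    with the trapezoidal rule.
    """
    step = 2 * t_max / (grid - 1)
    ts = [-t_max + i * step for i in range(grid)]
    out = []
    for k in range(1, levels + 1):
        vals = [abs(transform(zs, t, k) - transform(zs, t, k - 1)) ** 2
                * math.exp(-beta * t * t) for t in ts]
        out.append(step * (sum(vals) - 0.5 * (vals[0] + vals[-1])))
    return out


def max_abs_standardized(discrepancies, null_means, null_sds):
    """Largest absolute standardized discrepancy across levels."""
    return max(abs((d - mu) / sd)
               for d, mu, sd in zip(discrepancies, null_means, null_sds))


# Usage on a standardized exponential sample.
rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(200)]
mean = sum(sample) / len(sample)
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / len(sample))
zs = [(x - mean) / sd for x in sample]
d = integrated_discrepancies(zs)
# Placeholder null moments; in practice they would be estimated by
# simulating the discrepancies under normality.
stat = max_abs_standardized(d, null_means=[0.0, 0.0, 0.0],
                            null_sds=[1.0, 1.0, 1.0])
```

Taking the maximum over levels means a departure that shows up at only one iteration can still drive the statistic, which is the intuition behind the multi-level design.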
Asymptotic Theory and Implementation
Distributional Properties
A detailed Gaussian process limit framework is developed for the ECF under the null. As the sample size tends to infinity, it is shown that the law of the integrated discrepancy measures converges to a quadratic functional of a limiting Gaussian process. The dependence on the number of transforms is handled carefully to ensure power across a wide spectrum of non-Gaussian alternatives.
Because the null distribution of the test statistic is analytically intractable, it is estimated by Monte Carlo simulation: small samples use a sample-size-specific calibration, while larger samples switch to an approximation of the asymptotic null distribution for computational efficiency. Linearization of the test statistic via Delta method arguments and explicit covariance computation in Fourier space facilitate implementation.
Practical Computation
For a fixed number of transforms and a dense grid of evaluation points on a bounded interval (each with a default setting), the procedure simulates the ECF under normality, computes the standardized discrepancy functional, and uses Monte Carlo to establish quantiles of the null distribution. The weighting parameter (also set to a default value) controls the relative contribution of the center versus the tails of the Fourier transform, which is crucial for managing the variance inflation that arises in the tails of nonparametric ECF estimation.
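The Monte Carlo calibration step can be sketched generically: simulate the statistic on standard normal samples of the same size, then read off a critical value or p-value. The code below is a simplified stand-in, not the `sstn` implementation. `one_level_discrepancy` is a hypothetical toy statistic with a single transform level, and the weight, grid, and replication count are assumed values chosen to keep the example fast.

```python
import cmath
import math
import random


def one_level_discrepancy(xs, beta=0.5, t_max=3.0, grid=13):
    """Toy statistic: weighted integrated squared difference between the
    standardized ECF and its first transform psi(t / sqrt(2)) ** 2."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / sd for x in xs]

    def ecf(t):
        return sum(cmath.exp(1j * t * z) for z in zs) / n

    step = 2 * t_max / (grid - 1)
    total = 0.0
    for i in range(grid):
        t = -t_max + i * step
        w = 0.5 if i in (0, grid - 1) else 1.0  # trapezoidal end weights
        diff = ecf(t / math.sqrt(2)) ** 2 - ecf(t)
        total += w * abs(diff) ** 2 * math.exp(-beta * t * t)
    return step * total


def simulate_null(statistic, n, reps=200, seed=0):
    """Sorted statistic values for `reps` standard normal samples of size n."""
    rng = random.Random(seed)
    return sorted(statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(reps))


def critical_value(null_values, alpha=0.05):
    """Empirical (1 - alpha) quantile of the simulated null distribution."""
    k = math.ceil((1 - alpha) * len(null_values)) - 1
    return null_values[min(len(null_values) - 1, k)]


def p_value(observed, null_values):
    """Monte Carlo p-value with the usual +1 correction."""
    exceed = sum(v >= observed for v in null_values)
    return (exceed + 1) / (len(null_values) + 1)


null_values = simulate_null(one_level_discrepancy, n=30)
crit = critical_value(null_values)
rng = random.Random(42)
observed = one_level_discrepancy([rng.expovariate(1.0) for _ in range(30)])
p = p_value(observed, null_values)
```

Because the statistic is computed on standardized data, its null distribution does not depend on the unknown mean and variance, so simulating from the standard normal is sufficient. A real calibration would use far more replications than the 200 assumed here.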
The SSTN is provided as an R package ("sstn").
Empirical Results
A comprehensive simulation study is conducted, comparing SSTN against Shapiro–Wilk, Anderson–Darling, Jarque–Bera, Lilliefors, and D’Agostino–Pearson tests. The alternatives span a range of distributions (gamma, chi-square, lognormal, Weibull, Student's $t$, mixtures, convolutions, and normal distributions), parameters, and sample sizes.
Findings include:
Type I error rate of SSTN is well controlled, typically inside 95% acceptance bands.
Against asymmetric alternatives, SSTN often matches or exceeds the power of competing methods, in some scenarios outperforming even the Shapiro–Wilk test.
For heavy-tailed symmetric alternatives (e.g., Student's $t$ with few degrees of freedom), SSTN is competitive with moment-based tests (Jarque–Bera, D’Agostino–Pearson).
SSTN underperforms for pure uniform distributions or settings closely resembling the uniform, relative to D’Agostino–Pearson and Shapiro–Wilk.
For mixed normals and distributions with complex multimodality, performance remains robust and at least competitive.
SSTN is adaptive because of its multi-level approach: taking the maximum over transform levels helps it retain power even when the departure from normality appears mainly at specific scales.
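The 95% acceptance bands mentioned in the first finding can be made concrete with a short calculation. The sketch below uses the normal approximation to the binomial for the empirical rejection rate of an exactly level-$\alpha$ test. The replication count of 10,000 is an assumed value for illustration, and the paper's own band construction may differ.

```python
import math


def acceptance_band(alpha, replications, z=1.959964):
    """Normal-approximation 95% band for the empirical rejection rate of an
    exactly level-alpha test over `replications` independent simulations."""
    half_width = z * math.sqrt(alpha * (1 - alpha) / replications)
    return alpha - half_width, alpha + half_width


# Band for a nominal 5% level and an assumed 10,000 simulation runs.
low, high = acceptance_band(alpha=0.05, replications=10_000)
```

An empirical type I error rate that falls between `low` and `high` is consistent with the nominal level at this number of replications; fewer replications widen the band.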
Implications and Future Directions
The SSTN provides a theoretically novel and empirically competitive addition to the normality testing toolkit, addressing a gap in the exploitation of self-similarity—a uniquely defining property of the normal law under finite variance. The focus on the empirical characteristic function, combined with iterative self-similarity transforms, allows for a test sensitive to both local and global departures from normality not captured by moment-based or EDF-based tests.
The authors outline several potential extensions: alternative or data-driven weight functions (possibly adaptive or covariance-based), different $L^p$-norms, and especially an extension to the multivariate setting. There, self-similarity characterizes the entire multivariate normal family, which could offer a unified and powerful test for high-dimensional inference tasks.
Conclusion
The Self-Similarity Test for Normality introduces a sound theoretical and computational framework for normality assessment, based on the self-similarity property unique to the normal distribution. Analytical advances in asymptotic linearization and distributional convergence, combined with practical simulation-based calibration, demonstrate that SSTN achieves robust type I error rates and often superior power relative to well-established alternatives. The method is especially compelling for practitioners seeking a test grounded in first principles and possessing strong adaptability across a variety of alternatives. Future work on adaptive weighting and high-dimensional/multivariate extensions could further enhance the reach of this approach.