A test for normality based on self-similarity

Published 4 Apr 2026 in stat.ME | (2604.03810v1)

Abstract: Testing for normality is a widely used procedure in statistics and data analysis, often applied prior to employing methods that rely on the assumption of normally distributed data. While several existing tests target distributional characteristics such as higher-order moments, others focus on functional aspects such as the distribution function. In this article, we propose an alternative idea by exploiting the self-similarity property of the normal distribution and introduce the Self-Similarity Test for Normality (SSTN). This procedure leverages the structural property that the distribution of a suitably centered and scaled sum of independent and identically distributed random variables with finite variance coincides with the original distribution if and only if that distribution is normal. The SSTN evaluates normality by applying a self-similarity transformation to the standardized empirical characteristic function and examining how the transformed functions change across successive applications. For the normal distribution, repeated applications preserve the functional form of the characteristic function, whereas deviations from normality manifest in systematic changes between consecutive transforms. These changes are aggregated into a test statistic, whose null distribution is obtained by Monte Carlo calibration, using a sample-size-specific calibration for small samples and an approximation of the asymptotic null distribution for larger ones. A comprehensive simulation study shows that the SSTN is at least competitive with, and frequently superior to, several well-established tests for normality.


Authors (2)

Summary

The paper presents the Self-Similarity Test for Normality (SSTN) that leverages the unique self-similarity property of the normal distribution.
It employs iterative transforms of the empirical characteristic function and aggregates discrepancies using weighted integration, aided by Monte Carlo simulation for calibration.
Empirical results show competitive performance against established tests for detecting asymmetric, heavy-tailed, and multimodal deviations, with potential for multivariate extensions.


Introduction

This paper introduces a novel approach to testing normality in univariate data: it exploits the self-similarity property of the normal distribution to construct the Self-Similarity Test for Normality (SSTN) (2604.03810). Among non-degenerate distributions with finite variance, the normal distribution is the only one whose law is preserved, up to centering and scaling, when independent copies are summed. Traditional normality tests typically assess sample moments such as skewness and kurtosis, compare the empirical CDF with the hypothesized one, or examine other functionals of the data. In contrast, the SSTN exploits this distributional fixed-point property in the space of characteristic functions.

Theoretical Framework

Self-Similarity and the Normal Law

A distribution $P_X$ is self-similar if, for any $m \in \mathbb{N}$ , there exist constants $a_m > 0$ and $b_m \in \mathbb{R}$ such that

$a_m \sum_{j=1}^m X_j + b_m \overset{\mathrm{d}}{=} X$

where $X_1, \dots, X_m$ are i.i.d. copies of $X$. Among absolutely continuous distributions with finite variance, this property characterizes the normal distribution. In terms of the characteristic function $\psi_X$ of $X$, it is equivalent to the functional equation

$\exp\left(it b_m\right) \left[\psi_X(a_m t)\right]^m = \psi_X(t)$

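For $X \sim \mathcal{N}(\mu, \sigma^2)$ the equation holds with $a_m = 1/\sqrt{m}$ and $b_m = (1-\sqrt{m})\mu$, because $a_m \sum_{j=1}^m X_j$ is normal with mean $\sqrt{m}\,\mu$ and variance $\sigma^2$.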
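The functional equation can be checked numerically. The Python sketch below (all function names are hypothetical) evaluates both sides on a grid of $t$ values for a normal characteristic function and, for contrast, for the standard Laplace distribution, a non-normal law with mean 0 and finite variance, so that $b_m = 0$ in that case.

```python
import numpy as np

def normal_cf(t, mu=0.0, sigma=1.0):
    """Characteristic function of N(mu, sigma^2)."""
    return np.exp(1j * mu * t - 0.5 * sigma**2 * t**2)

def laplace_cf(t):
    """Characteristic function of the standard Laplace law (mean 0, variance 2)."""
    return 1.0 / (1.0 + t**2)

def self_similar_side(cf, t, m, mu=0.0):
    """exp(i t b_m) * cf(a_m t)**m with a_m = 1/sqrt(m) and b_m = (1 - sqrt(m)) * mu."""
    a_m = 1.0 / np.sqrt(m)
    b_m = (1.0 - np.sqrt(m)) * mu
    return np.exp(1j * t * b_m) * cf(a_m * t) ** m

def max_gaps(m, mu=1.5, sigma=2.0):
    """Largest |LHS - psi_X(t)| on a grid, for N(mu, sigma^2) and for the Laplace law."""
    t = np.linspace(-4.0, 4.0, 81)
    normal = lambda s: normal_cf(s, mu, sigma)
    gap_normal = np.max(np.abs(self_similar_side(normal, t, m, mu) - normal(t)))
    gap_laplace = np.max(np.abs(self_similar_side(laplace_cf, t, m) - laplace_cf(t)))
    return gap_normal, gap_laplace

for m in (2, 5, 10):
    print(m, max_gaps(m))
```

The normal gap stays at floating-point rounding level for every $m$, whereas the Laplace gap remains clearly positive, as the characterization predicts.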

Empirical Self-Similarity Operator

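Define the usual empirical characteristic function (ECF) of the sample $X_1, \dots, X_n$,

$\psi_n(t) = \frac{1}{n}\sum_{j=1}^n \exp(itX_j), \quad t \in \mathbb{R},$

and a standardized and centered version

$\hat\psi_n(t) = \frac{1}{n}\sum_{j=1}^n \exp(itY_{n,j}), \quad Y_{n,j} = \frac{X_j - \bar X_n}{S_n},$

where $\bar X_n$ and $S_n$ are the sample mean and sample standard deviation. Since the standardized data are centered, the self-similarity operator corresponds to the functional equation above with $a_m = 1/\sqrt{m}$ and $b_m = 0$:

$(\mathcal{T}_m \psi)(t) = \left[\psi\left(t/\sqrt{m}\right)\right]^m,$

and repeated application generates transformed ECFs $\hat\psi_n^{(k)} = \mathcal{T}_m \hat\psi_n^{(k-1)}$, $k = 1, \dots, K$, starting from $\hat\psi_n^{(0)} = \hat\psi_n$.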

For normal data, consecutive transforms are nearly identical at every iteration, since the normal characteristic function is a fixed point of the operator; deviations from normality manifest in systematic changes between consecutive transforms.
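A minimal Python sketch of the standardized ECF and its iterated transforms follows. It assumes the operator $(\mathcal{T}_m\psi)(t) = [\psi(t/\sqrt{m})]^m$, i.e. the functional equation with $a_m = 1/\sqrt{m}$ and $b_m = 0$ for centered data, standardization by the sample standard deviation with divisor $n - 1$, and illustrative values $m = 2$ and $K = 4$. These choices and all function names are hypothetical, not the API of the `sstn` package.

```python
import numpy as np

def standardized_ecf(x):
    """Return a callable t -> psi_hat_n(t): the ECF of Y_j = (X_j - mean) / sd (ddof=1)."""
    x = np.asarray(x, dtype=float)
    y = (x - x.mean()) / x.std(ddof=1)
    return lambda t: np.exp(1j * np.outer(np.atleast_1d(t), y)).mean(axis=1)

def self_similarity_transform(psi, m=2):
    """Apply T_m once: (T_m psi)(t) = psi(t / sqrt(m)) ** m (b_m = 0 for centered data)."""
    return lambda t: psi(np.asarray(t, dtype=float) / np.sqrt(m)) ** m

def transformed_ecfs(x, t, m=2, K=4):
    """Rows k = 0..K hold psi^(k)(t), with psi^(0) = psi_hat_n and psi^(k) = T_m psi^(k-1)."""
    psi = standardized_ecf(x)
    rows = [psi(t)]
    for _ in range(K):
        psi = self_similarity_transform(psi, m)
        rows.append(psi(t))
    return np.array(rows)

# Largest gap between consecutive transforms on a grid, for a normal and a skewed sample.
rng = np.random.default_rng(1)
t = np.linspace(-3.0, 3.0, 61)
samples = {"normal": rng.normal(5.0, 2.0, 5000), "exponential": rng.exponential(1.0, 5000)}
gaps = {}
for label, x in samples.items():
    gaps[label] = np.abs(np.diff(transformed_ecfs(x, t), axis=0)).max(axis=1)
    print(label, np.round(gaps[label], 3))
```

For the normal sample the gaps should stay at the level of sampling noise, while for the exponential sample the first gaps are noticeably larger.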

Discrepancy Aggregation and Test Statistic

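To quantify these changes, the procedure defines the differences between consecutive transforms,

$\Delta_{n,k}(t) = \hat\psi_n^{(k)}(t) - \hat\psi_n^{(k-1)}(t), \quad k = 1, \dots, K,$

where $\hat\psi_n^{(k)}$ is the $k$-th transformed ECF, and aggregates each of them into an integrated squared difference over $t$ with a weight function $w$:

$D_{n,k} = n \int_{\mathbb{R}} \left|\Delta_{n,k}(t)\right|^2 w(t)\,\mathrm{d}t.$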
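To see what the integrated squared difference measures without sampling noise, the following sketch computes a population analogue. It replaces the ECF by a known centered, unit-variance characteristic function $\psi$, uses the closed form $\psi^{(k)}(t) = \psi(t/m^{k/2})^{m^k}$ of the $k$-fold transform, and approximates $\int |\psi^{(k)}(t) - \psi^{(k-1)}(t)|^2 w(t)\,\mathrm{d}t$ by the trapezoidal rule. The Gaussian weight $w(t) = e^{-t^2/2}$, the integration range, $m = 2$, and the helper names are assumptions for illustration; the sample statistic would use the standardized ECF in place of $\psi$.

```python
import numpy as np

def k_fold_transform(cf, k, m=2):
    """k-fold self-similarity transform of a centered CF: t -> cf(t / m**(k/2)) ** (m**k)."""
    return lambda t: cf(t / m ** (k / 2)) ** (m ** k)

def integrated_discrepancies(cf, K=3, m=2, t_max=6.0, n_grid=601):
    """int |psi^(k) - psi^(k-1)|^2 w(t) dt for k = 1..K, trapezoidal rule on a uniform grid."""
    t = np.linspace(-t_max, t_max, n_grid)
    w = np.exp(-0.5 * t ** 2)  # hypothetical weight function
    h = t[1] - t[0]
    values = []
    for k in range(1, K + 1):
        diff = k_fold_transform(cf, k, m)(t) - k_fold_transform(cf, k - 1, m)(t)
        f = np.abs(diff) ** 2 * w
        values.append(h * (f.sum() - 0.5 * (f[0] + f[-1])))
    return np.array(values)

normal_cf = lambda t: np.exp(-0.5 * t ** 2)             # standard normal
exp_cf = lambda t: np.exp(-1j * t) / (1.0 - 1j * t)     # standardized Exp(1), i.e. X - 1

d_norm = integrated_discrepancies(normal_cf)
d_exp = integrated_discrepancies(exp_cf)
print(d_norm)
print(d_exp)
```

The normal values vanish up to rounding because the normal characteristic function is a fixed point of the transform. The exponential values are positive and should shrink as $k$ grows, since the iterates approach the normal characteristic function.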

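Standardized discrepancies

$Z_{n,k} = \frac{D_{n,k} - \mathbb{E}_0\left[D_{n,k}\right]}{\sqrt{\operatorname{Var}_0\left(D_{n,k}\right)}}, \quad k = 1, \dots, K,$

are computed with the mean $\mathbb{E}_0$ and variance $\operatorname{Var}_0$ of the integrated discrepancies $D_{n,k}$ taken under the null hypothesis of normality, and the final test statistic is the maximum absolute standardized discrepancy:

$T_n = \max_{1 \le k \le K} \left|Z_{n,k}\right|.$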
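The standardization and the maximum can be sketched as follows. The sketch assumes that the null mean and standard deviation of each integrated discrepancy $D_{n,k}$ ($k = 1, \dots, K$) are estimated from simulated null replicates, one replicate per row of `d_null`; the function name and the toy numbers are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def max_abs_standardized(d_obs, d_null):
    """T_n = max_k |(D_{n,k} - mean_0,k) / sd_0,k|, with null moments estimated from d_null.

    d_obs:  shape (K,), observed discrepancies D_{n,1}, ..., D_{n,K}
    d_null: shape (B, K), discrepancies computed from B simulated normal samples
    """
    d_obs = np.asarray(d_obs, dtype=float)
    d_null = np.asarray(d_null, dtype=float)
    mean0 = d_null.mean(axis=0)
    sd0 = d_null.std(axis=0, ddof=1)
    z = (d_obs - mean0) / sd0
    return np.max(np.abs(z))

# Toy numbers for illustration only: K = 2 levels, B = 4 null replicates.
d_null = np.array([[1.0, 0.5], [2.0, 0.7], [3.0, 0.6], [2.0, 0.6]])
print(max_abs_standardized([4.0, 0.6], d_null))  # approximately 2.449, i.e. sqrt(6)
```

Here the first level drives the statistic: its observed value lies $\sqrt{6}$ null standard deviations above the null mean, while the second level sits exactly at its null mean.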

Asymptotic Theory and Implementation

Distributional Properties

The paper develops a Gaussian process limit theory for the ECF under the null. As $n \to \infty$, the integrated discrepancy measures are shown to converge in distribution to quadratic functionals of a limiting Gaussian process. The dependence on the number of transforms $K$ is treated explicitly, since the choice of $K$ affects power across a wide spectrum of non-Gaussian alternatives.

Because the null distribution of the test statistic is analytically intractable, it is estimated by Monte Carlo simulation: for small samples the calibration is specific to the actual sample size, while for larger $n$ the procedure switches to a Monte Carlo approximation of the asymptotic null distribution for computational efficiency. Linearization of the test statistic via delta-method arguments and explicit covariance computations in Fourier space facilitate the implementation.

Practical Computation

For a fixed number of transforms $K$ and a dense grid of evaluation points $t$ (the paper specifies defaults for both), the procedure simulates standardized ECFs under normality, computes the standardized discrepancy functional, and uses Monte Carlo replicates to estimate quantiles of the null distribution. A weighting parameter in the weight function (also with a default value) controls the relative contribution of the center versus the tails of the characteristic function. This matters because at large $|t|$ the characteristic function is small while the sampling noise of the ECF is not, so heavily weighted tails would inflate the variance of the statistic.
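The calibration loop can be sketched end to end in Python. This is a hypothetical implementation, not the `sstn` package: the Gaussian weight $w(t) = e^{-\gamma t^2}$, the grid, $m = 2$, $K = 4$, the number of replicates $B$, and all names are assumptions. The discrepancies are $D_{n,k} = n\int |\hat\psi_n^{(k)}(t) - \hat\psi_n^{(k-1)}(t)|^2 w(t)\,\mathrm{d}t$ with $\hat\psi_n^{(k)}(t) = \hat\psi_n(t/m^{k/2})^{m^k}$, and $T_n$ is the maximum absolute standardized discrepancy. The sketch relies on the statistic being invariant to affine changes of the data, so standard normal samples of the same size represent the whole null, and, as a simplification, it uses one set of replicates both for the null moments and for the null distribution of $T_n$.

```python
import numpy as np

def discrepancies(x, t, w, m=2, K=4):
    """D_{n,k} = n * int |psi^(k)(t) - psi^(k-1)(t)|^2 w(t) dt for k = 1..K (trapezoidal rule)."""
    x = np.asarray(x, dtype=float)
    y = (x - x.mean()) / x.std(ddof=1)
    # psi^(k)(t) = psi_hat(t / m**(k/2)) ** (m**k) is the k-fold self-similarity transform.
    psi = np.array([np.exp(1j * np.outer(t / m ** (k / 2), y)).mean(axis=1) ** (m ** k)
                    for k in range(K + 1)])
    f = np.abs(np.diff(psi, axis=0)) ** 2 * w
    h = t[1] - t[0]
    return len(y) * h * (f.sum(axis=1) - 0.5 * (f[:, 0] + f[:, -1]))

def sstn_pvalue(x, B=300, m=2, K=4, gamma=0.5, t_max=4.0, n_grid=161, seed=0):
    """Monte Carlo p-value for T_n = max_k |standardized D_{n,k}| (all defaults hypothetical)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(-t_max, t_max, n_grid)
    w = np.exp(-gamma * t ** 2)  # assumed Gaussian weight; larger gamma downweights the tails
    n = len(x)
    # Affine invariance: standard normal samples of size n represent the whole null hypothesis.
    d_null = np.array([discrepancies(rng.standard_normal(n), t, w, m, K) for _ in range(B)])
    mean0, sd0 = d_null.mean(axis=0), d_null.std(axis=0, ddof=1)
    t_null = np.max(np.abs((d_null - mean0) / sd0), axis=1)
    t_obs = np.max(np.abs((discrepancies(x, t, w, m, K) - mean0) / sd0))
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (1 + np.sum(t_null >= t_obs)) / (B + 1)

data_rng = np.random.default_rng(123)
p_exp = sstn_pvalue(data_rng.exponential(1.0, 200))   # skewed alternative
p_norm = sstn_pvalue(data_rng.standard_normal(200))   # sample from the null
print(p_exp, p_norm)
```

Simulating null replicates at the observed $n$ mirrors the sample-size-specific calibration for small samples; for large $n$ the same loop becomes expensive, which is where an approximation of the asymptotic null distribution pays off.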

The SSTN is provided as an R package ("sstn").

Empirical Results

A comprehensive simulation study compares SSTN with the Shapiro–Wilk, Anderson–Darling, Jarque–Bera, Lilliefors, and D’Agostino–Pearson tests. The scenarios span a range of distributions (gamma, chi-square, lognormal, Weibull, Student's $t$, mixtures, convolutions, and normal distributions), parameter settings, and sample sizes.

Findings include:

The type I error rate of SSTN is well controlled, typically lying inside 95% acceptance bands.
Against asymmetric alternatives, SSTN often matches or exceeds the power of competing methods, in some scenarios outperforming even the Shapiro–Wilk test.
For heavy-tailed symmetric alternatives (e.g., Student's $t$ with few degrees of freedom), SSTN is competitive with the moment-based tests (Jarque–Bera, D’Agostino–Pearson).
SSTN underperforms D’Agostino–Pearson and Shapiro–Wilk for uniform distributions and for settings closely resembling the uniform.
For normal mixtures and distributions with complex multimodality, performance remains robust and at least competitive.
Because the statistic takes the maximum over several transform levels, SSTN adapts to the type of departure and retains power when the effect appears mainly at a particular level.

Implications and Future Directions

The SSTN provides a theoretically novel and empirically competitive addition to the normality-testing toolkit, filling a gap in the use of self-similarity, a property that uniquely characterizes the normal law among distributions with finite variance. Combining the empirical characteristic function with iterative self-similarity transforms yields a test that is sensitive to both local and global departures from normality, including departures that moment-based or EDF-based tests may miss.

The authors outline several potential extensions: alternative or data-driven weight functions (possibly adaptive or covariance-based), different $L^p$-norms, and especially an extension to the multivariate setting. Because self-similarity also characterizes the multivariate normal family, such an extension could offer a unified and potentially powerful test for high-dimensional inference tasks.

Conclusion

The Self-Similarity Test for Normality introduces a sound theoretical and computational framework for normality assessment, based on the self-similarity property unique to the normal distribution. Analytical advances in asymptotic linearization and distributional convergence, combined with practical simulation-based calibration, show that SSTN achieves well-controlled type I error rates and often superior power relative to well-established alternatives. The method is especially compelling for practitioners seeking a test grounded in first principles and possessing strong adaptability across a variety of alternatives. Future work on adaptive weighting and high-dimensional or multivariate extensions could further extend the reach of this approach.
