Partitioned Sample Spacings (PSS)
- Partitioned Sample Spacings (PSS) are statistical methods that partition ordered data into contiguous segments to analyze the behavior of gaps between order statistics.
- They enable rigorous derivation of limit theorems by normalizing spacings. This often reveals exponential limits in the central and intermediate regimes and more intricate dependence in the extreme regime.
- PSS underpin practical applications such as nonparametric entropy estimation, robust test statistics, and optimized spatial sampling designs.
Partitioned Sample Spacings (PSS) refer to families of statistics, limit theorems, and algorithmic constructions that arise from summing or otherwise analyzing successive, disjoint, or ordered spacings in a finite ordered sample. The PSS framework formalizes the intuition of partitioning the sample space (or the range of the data) into segments, either deterministically (fixed width or fixed probability mass) or data-adaptively. It then studies the joint or marginal behavior of the lengths, sums, or functional transforms of these spacings. The concept underlies goodness-of-fit statistics, efficient nonparametric entropy estimators, modern spatial sampling strategies, and classical limit theorems for order statistics and their increments.
1. Foundational Definitions and Regimes
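Let $X_1,\dots,X_n$ denote a random sample from a continuous distribution $F$, with associated order statistics $X_{1:n}\le X_{2:n}\le\cdots\le X_{n:n}$. The $m$-step disjoint sample spacings, or partitioned sample spacings, are most concisely described as follows. For a chosen block size $m$ ($1\le m\le n-1$), define the spacings
$$D_j^{(m)} = X_{1+jm:n} - X_{1+(j-1)m:n}, \qquad j = 1,\dots,\lfloor (n-1)/m\rfloor.$$
These segment the data into contiguous, non-overlapping blocks, each spanning $m$ consecutive elementary gaps $X_{i+1:n}-X_{i:n}$.

Limit behavior depends on the rank $k$ of the reference order statistic $X_{k:n}$ around which the spacings are taken. There are three regimes:
- central: $k/n\to p\in(0,1)$;
- intermediate: $k\to\infty$ with $k/n\to 0$ or $1$;
- extreme: $k$ or $n-k$ fixed.

In each regime, PSS are subject to distinct normalization and convergence behaviors. For example: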
- In the central and intermediate regimes, suitably normalized spacings in blocks adjacent to $X_{k:n}$ become asymptotically i.i.d. $\mathrm{Exp}(1)$ random variables. The sequence of cumulative normalized spacings converges in distribution to a homogeneous Poisson process on $(0,\infty)$.
- In the extreme regime, i.e., near the largest (or smallest) order statistics, spacing increments are generally dependent. Independence survives only in special distributional cases (Weibull with shape parameter $1$, or certain Gumbel cases). The dependence reflects the heavy tails or bounded support of the underlying $F$ (Nagaraja et al., 2017).
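As a minimal sketch, assuming NumPy and one particular indexing convention, the disjoint $m$-step spacings $D_j^{(m)} = X_{1+jm:n}-X_{1+(j-1)m:n}$ can be computed in two steps: take every $m$-th order statistic starting from $X_{1:n}$, then difference. The function name `disjoint_m_spacings` is hypothetical.

```python
import numpy as np


def disjoint_m_spacings(x, m):
    """Disjoint m-step spacings D_j = X_{1+jm:n} - X_{1+(j-1)m:n}.

    Indexing convention assumed for illustration: j = 1, ..., floor((n - 1) / m),
    so consecutive blocks share endpoints and never overlap.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if not 1 <= m <= n - 1:
        raise ValueError("block size m must satisfy 1 <= m <= n - 1")
    # Every m-th order statistic, starting at X_{1:n} (0-based positions 0, m, 2m, ...).
    anchors = xs[::m]
    return np.diff(anchors)


# Example: five points, blocks of m = 2 elementary gaps.
print(disjoint_m_spacings([0.1, 0.4, 0.2, 0.9, 0.7], m=2))  # spacings 0.3 and 0.5 (up to rounding)
```

For U(0,1) data with $m=1$, the rescaled spacings $(n+1)D_j^{(1)}$ have mean close to 1. For large $n$ they are approximately $\mathrm{Exp}(1)$.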
2. Analytic Distributions and Summed Ordered Spacings
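Consider the specific case of uniform samples, $X_1,\dots,X_n \sim \mathrm{U}(0,1)$. Include the augmented endpoints $X_{0:n}=0$ and $X_{n+1:n}=1$, and define the spacings $\Delta_i = X_{i:n}-X_{i-1:n}$ for $i=1,\dots,n+1$. The spacings can themselves be ordered, $\Delta_{(1)}\le\Delta_{(2)}\le\cdots\le\Delta_{(n+1)}$. The sums of the $k$ smallest or $k$ largest spacings, termed partitioned sample spacings of order $k$, are
$$S_k = \sum_{i=1}^{k}\Delta_{(i)}, \qquad L_k = \sum_{i=n+2-k}^{n+1}\Delta_{(i)}.$$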
The marginal density of $S_k$ (the sum of the $k$ smallest spacings) in the boundary-included scenario has an exact closed form. Shtembari et al. (2020) give it as a finite sum of terms involving normalization constants and an indicator function that restricts each term to its range of validity. The density and its cumulative counterpart are practical to evaluate for moderate $n$. In the boundary-excluded case, analogous formulas follow by conditioning on the sample range $X_{n:n}-X_{1:n}$. These expressions are used in physics (gap and cluster detection) and in statistical quality control. They also yield exact p-values for uniformity or goodness-of-fit tests via empirical gap/cluster analysis (Shtembari et al., 2020).
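The sums of ordered uniform spacings are easy to simulate. The sketch below assumes NumPy, and its function names are hypothetical. It computes $S_k$ and $L_k$, the sums of the $k$ smallest and $k$ largest of the $n+1$ spacings of a U(0,1) sample with endpoints 0 and 1 included. It then compares the Monte Carlo mean of $S_k$ with an exact value derived from the Rényi representation of uniform spacings as normalized i.i.d. exponentials:
$$E[S_k] = \frac{1}{N}\sum_{j=1}^{k}\frac{k-j+1}{N-j+1}, \qquad N=n+1.$$
It does not reproduce the closed-form density itself.

```python
import numpy as np


def ordered_spacing_sums(u, k):
    """(S_k, L_k): sums of the k smallest and k largest of the n + 1 spacings of a
    U(0,1) sample, with the endpoints 0 and 1 included. Requires 1 <= k <= n + 1."""
    x = np.concatenate(([0.0], np.sort(u), [1.0]))
    gaps = np.sort(np.diff(x))          # Delta_(1) <= ... <= Delta_(n+1)
    return gaps[:k].sum(), gaps[-k:].sum()


def exact_mean_smallest(n, k):
    """E[S_k] from the Renyi representation: S_k = sum_j c_j W_j with
    c_j = (k - j + 1) / (N - j + 1), W ~ Dirichlet(1, ..., 1), N = n + 1."""
    N = n + 1
    j = np.arange(1, k + 1)
    return np.sum((k - j + 1) / (N - j + 1)) / N


rng = np.random.default_rng(0)
n, k = 20, 5
sims = np.array([ordered_spacing_sums(rng.uniform(size=n), k)[0] for _ in range(20_000)])
print(sims.mean(), exact_mean_smallest(n, k))   # the two values should be close
```

As a sanity check, the formula gives $E[S_1]=1/N^2$, the expected minimum spacing. It also gives $E[S_N]=1$, because all spacings together sum to one.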
3. Limit Theorems and Process Structure
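The joint limiting behavior of the spacings adjacent to $X_{k:n}$ is regime-dependent. In both the central and intermediate cases, assume regularity conditions on $F$ (with density $f$). Then the normalized increments are asymptotically i.i.d. $\mathrm{Exp}(1)$: as $n\to\infty$, for each fixed $r$,
$$n\,f\big(F^{-1}(k/n)\big)\,\big(X_{k+j:n}-X_{k+j-1:n}\big)_{j=1}^{r} \xrightarrow{d} (E_1,\dots,E_r),$$
where $E_1,E_2,\dots$ are i.i.d. standard exponentials. The analogous statement holds for the left neighborhood $\big(X_{k-j+1:n}-X_{k-j:n}\big)_{j=1}^{r}$, and the two blocks are asymptotically independent. On each side, the cumulative normalized sums converge in distribution to independent homogeneous (unit-rate) Poisson processes on $(0,\infty)$.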
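A quick simulation can illustrate the central-regime limit. The sketch below makes these assumptions:
- the parent distribution is standard normal;
- the reference order statistic is the sample median;
- the normalization is $n\,f(F^{-1}(k/n))$, with $f$ and $F^{-1}$ taken from `statistics.NormalDist` in the Python standard library.

It then checks that the normalized spacings on both sides have mean and variance near 1 and negligible correlation, as i.i.d. $\mathrm{Exp}(1)$ limits predict. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def normalized_central_spacings(sample, k, r, dist):
    """Scale the r spacings on each side of X_{k:n} (k is 1-based) by n * f(F^{-1}(k/n)).

    Returns (right, left): right[j-1] = X_{k+j:n} - X_{k+j-1:n},
    left[j-1] = X_{k-j+1:n} - X_{k-j:n}, both multiplied by the scale.
    """
    xs = np.sort(sample)
    n = xs.size
    scale = n * dist.pdf(dist.inv_cdf(k / n))
    i = k - 1                                   # 0-based position of X_{k:n}
    right = np.diff(xs[i:i + r + 1])
    left = np.diff(xs[i - r:i + 1])[::-1]
    return scale * right, scale * left


rng = np.random.default_rng(2)
dist = NormalDist()                             # assumed parent: standard normal
n, k, r = 1001, 501, 3                          # X_{k:n} is the sample median
draws = [normalized_central_spacings(rng.standard_normal(n), k, r, dist) for _ in range(3000)]
right = np.array([d[0] for d in draws])
left = np.array([d[1] for d in draws])
print(right.mean(), right.var(), np.corrcoef(right[:, 0], left[:, 0])[0, 1])
```

Only the scale factor depends on the parent and on $k/n$. Swapping in another continuous distribution with a known density and quantile function leaves the rest unchanged.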
In the extreme regime, the structure is more intricate. Increments defined from domain-of-attraction limits (Fréchet, Weibull, Gumbel) are generally mutually dependent. Independence recurs only when the underlying distribution is Weibull with unit shape, or in one-sided Gumbel limits (Nagaraja et al., 2017). These structural results enable precise, distribution-free inference for quantiles, local density estimation, and counting of observations in neighborhoods. The breakdown of independence in tail regimes reflects the interaction between block size, sampling window, and the tail properties of $F$.
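One concrete tail case where independence holds exactly is the exponential parent, which lies in the Gumbel domain of attraction. The Rényi representation makes the scaled top spacings $j\,(X_{n-j+1:n}-X_{n-j:n})$, $j=1,\dots,r$, exactly i.i.d. $\mathrm{Exp}(1)$ for every $n$. The sketch below assumes NumPy, uses a hypothetical function name, and checks this numerically. It illustrates this independent case only, not the general (dependent) extreme-regime behavior.

```python
import numpy as np


def scaled_top_spacings(sample, r):
    """j * (X_{n-j+1:n} - X_{n-j:n}) for j = 1, ..., r (largest observations first).

    Requires r <= n - 1.
    """
    xs = np.sort(sample)[::-1]                  # X_{n:n} >= X_{n-1:n} >= ...
    gaps = xs[:r] - xs[1:r + 1]                 # entry j-1 is X_{n-j+1:n} - X_{n-j:n}
    return np.arange(1, r + 1) * gaps


rng = np.random.default_rng(3)
top = np.array([scaled_top_spacings(rng.exponential(size=50), 4) for _ in range(20_000)])
print(top.mean(axis=0))                          # each entry close to 1
print(np.corrcoef(top[:, 0], top[:, 1])[0, 1])   # close to 0
```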
4. Test Statistics and Parametric Inference via PSS
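Partitioned sample spacings provide the foundation for robust test statistics in parametric models, particularly where likelihood-based inference is non-regular or infeasible. For a parametric family $\{F_\theta : \theta\in\Theta\subseteq\mathbb{R}^p\}$, one forms $m$-step spacings on the probability scale,
$$D_j(\theta) = F_\theta\big(X_{jm:n}\big) - F_\theta\big(X_{(j-1)m:n}\big), \qquad j = 1,\dots,M,$$
with $M=(n+1)/m$ (assumed to be an integer), $X_{0:n}=-\infty$, and $X_{n+1:n}=+\infty$. Statistics built as symmetric means of a convex function $h$ of these spacings,
$$T_n(\theta) = \frac{1}{M}\sum_{j=1}^{M} h\big(M\,D_j(\theta)\big),$$
permit testing of $H_0:\theta=\theta_0$ via discrepancy statistics proportional to
$$T_n(\theta_0) - T_n(\hat\theta_n),$$
where
$$\hat\theta_n = \arg\min_{\theta\in\Theta} T_n(\theta)$$
is the generalized spacing estimator. Suitably normalized (by a factor of order $n$), and with $h$ chosen on Pitman-efficiency grounds, these test statistics are asymptotically $\chi^2_p$ under the null. Under $n^{-1/2}$-local alternatives they are asymptotically noncentral $\chi^2_p$, and they can match the likelihood-ratio test in local asymptotic efficiency. Likelihood methods can break down, e.g., in mixtures or nonregular models. Spacings tests remain well-defined in these cases and maintain nominal size and power (Singh et al., 2021).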
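The following sketch illustrates the generalized spacing estimator under stated assumptions:
- a hypothetical Exponential(rate $\theta$) model;
- $h(u)=-\log u$, the maximum-spacing choice;
- a plain grid search in place of a proper optimizer.

It evaluates $T_n(\theta)=\frac{1}{M}\sum_j h(M\,D_j(\theta))$ over disjoint $m$-step probability-scale spacings, with $F_\theta(X_{0:n})=0$, $F_\theta(X_{n+1:n})=1$, and $M=(n+1)/m$. It then computes the raw discrepancy $T_n(\theta_0)-T_n(\hat\theta_n)$. The function names are hypothetical. The sketch does not implement the normalization or $\chi^2$ calibration of Singh et al.

```python
import numpy as np


def spacing_objective(theta, xs, m=1):
    """T_n(theta) = mean_j h(M * D_j(theta)) with h(u) = -log(u).

    Assumptions for illustration: sorted data xs from a hypothetical
    Exponential(rate=theta) model, F_theta(x) = 1 - exp(-theta * x),
    and (n + 1) divisible by m so that M = (n + 1) / m.
    """
    n = xs.size
    if (n + 1) % m:
        raise ValueError("need (n + 1) divisible by m")
    M = (n + 1) // m
    # Work with the survival function S = 1 - F; F(X_{0:n}) = 0 and F(X_{n+1:n}) = 1
    # correspond to S = 1 and S = 0.
    s = np.concatenate(([1.0], np.exp(-theta * xs), [0.0]))
    d = -np.diff(s[::m])                        # D_j(theta), j = 1, ..., M
    return np.mean(-np.log(np.maximum(M * d, 1e-300)))   # guard against log(0)


def spacing_estimate(x, m=1, grid=None):
    """Generalized spacing estimate by grid search (a simple stand-in for an optimizer)."""
    xs = np.sort(np.asarray(x, dtype=float))
    if grid is None:
        grid = np.exp(np.linspace(-4.0, 4.0, 2001))      # rates from about 0.018 to 55
    values = np.array([spacing_objective(t, xs, m) for t in grid])
    best = int(np.argmin(values))
    return grid[best], values[best]


rng = np.random.default_rng(4)
x = rng.exponential(scale=0.5, size=1999)        # true rate 2.0; n + 1 = 2000
theta_hat, t_min = spacing_estimate(x, m=1)
xs = np.sort(x)
disc_null = spacing_objective(2.0, xs) - t_min   # H0: theta = 2 (true)
disc_alt = spacing_objective(3.0, xs) - t_min    # H0: theta = 3 (false)
print(theta_hat, disc_null, disc_alt)
```

The discrepancy is near zero at the true rate and clearly positive at a wrong one. Turning it into a calibrated test requires the normalization and $\chi^2$ limit described above.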
5. Nonparametric and Multivariate Applications
PSS underlies recent advances in nonparametric functional estimation, most notably for joint entropy in moderate-to-high dimensions. The key construction partitions the sample space $\mathbb{R}^d$ into axis-aligned cells $A_1,\dots,A_K$. Within each cell, local one-dimensional spacing estimators are constructed for each marginal. The product of these marginal estimates, weighted by cell counts, yields a joint density estimator of the form
$$\hat f_n(x) = \frac{n_j}{n}\prod_{l=1}^{d}\hat f_{j,l}(x_l), \qquad x\in A_j.$$
Here $n_j$ is the cell population, and $\hat f_{j,l}$ is an $m$-spacing density estimate computed from the marginal order statistics $X^{(j,l)}_{(1)}\le\cdots\le X^{(j,l)}_{(n_j)}$ of coordinate $l$ in cell $A_j$. The spacing order $m$ controls the bias-variance trade-off. The plug-in estimator for entropy,
$$\hat H_n = -\frac{1}{n}\sum_{i=1}^{n}\log \hat f_n(X_i),$$
is strongly consistent. Its empirical risk is comparable to or below that of k-nearest-neighbor and copula-adaptive approaches. It performs particularly well under strong correlation or heteroskedasticity, and it requires no neural density modeling or training (Ho et al., 17 Nov 2025). Computational cost scales favorably, near-linearly in $n$ and $d$, which further supports its use in information-theoretic pipelines and machine learning tasks.
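The cell-wise marginal estimates in this construction are one-dimensional spacing estimates. As a minimal sketch of that ingredient only, here is the classical Vasicek $m$-spacing entropy estimator in NumPy:
$$\hat H = \frac{1}{n}\sum_{i=1}^{n}\log\Big(\frac{n}{2m}\big(X_{i+m:n}-X_{i-m:n}\big)\Big),$$
with indices clamped to $[1,n]$. The window heuristic $m\approx\sqrt{n}/2$ and the function name are assumptions for illustration. This is not the multivariate estimator of Ho et al.

```python
import numpy as np


def vasicek_entropy(x, m=None):
    """Vasicek m-spacing estimate of differential entropy (nats) for 1-D continuous data.

    Boundary convention: X_{i:n} = X_{1:n} for i < 1 and X_{n:n} for i > n.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if m is None:
        m = max(1, int(round(np.sqrt(n) / 2)))          # heuristic choice (assumption)
    idx = np.arange(n)
    upper = xs[np.minimum(idx + m, n - 1)]
    lower = xs[np.maximum(idx - m, 0)]
    return np.mean(np.log(n / (2 * m) * (upper - lower)))


rng = np.random.default_rng(5)
z = rng.standard_normal(10_000)
print(vasicek_entropy(z), 0.5 * np.log(2 * np.pi * np.e))   # estimate vs. exact value
```

Spacings scale linearly with the data. Multiplying the sample by $c>0$ therefore shifts the estimate by exactly $\log c$, matching the behavior of differential entropy.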
6. Spatial Sampling and Enhanced Spacing via Partitioning
PSS frameworks naturally generalize to spatial domains. In finite-population spatial sampling, a two-stage PSS procedure yields highly spread, representative samples. This is especially useful for populations with auxiliary variables or prescribed inclusion probabilities. In the first stage, the procedure uses constrained clustering and tour orderings to partition the population into spatially compact clusters. Each cluster carries inclusion-probability mass exactly 1 (UP-balanced clusters). In the second stage, one unit is selected within each cluster. This aims to maximize the minimal distance between neighboring selected units (spread) while meeting the inclusion-probability constraints exactly.
Quantification of spreadness is achieved via a translation-invariant spreadness index based on comparing kernel-density surfaces before and after cluster-wise translation. Algorithmic enhancements include greedy local optimizations to further increase spatial balance, maintaining design-based inference guarantees. The resulting designs outperform rival spatially balanced sampling schemes on classical and novel dispersion indices (Panahbehagh et al., 28 Oct 2025). The theoretical roots trace directly to classical 1D PSS, extended to multidimensional population supports.
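The cluster-then-select idea has a simple one-dimensional analogue. Assume the units are already arranged along some ordering (for example, a tour through the spatial population) and their inclusion probabilities $\pi_k\le 1$ sum to an integer $n$. The cumulative sums of $\pi_k$ then cut the ordering into $n$ consecutive stretches, each of probability mass exactly 1. Ordered systematic πps sampling picks one point in each stretch using a single random start, which gives every unit inclusion probability exactly $\pi_k$.

The sketch below (NumPy; hypothetical function name) is a classical stand-in, not the Panahbehagh et al. algorithm. It performs no constrained clustering, no spreadness optimization, and no independent within-cluster draws.

```python
import numpy as np


def ordered_systematic_sample(pi, rng):
    """Ordered systematic pi-ps sampling along a fixed ordering of the units.

    Assumes 0 < pi_k <= 1 and that sum(pi) is (numerically) an integer n. The
    cumulative sums of pi cut the ordering into n consecutive stretches of mass 1;
    one random start u in [0, 1) places the points u, u + 1, ..., u + n - 1, one per
    stretch, and each point selects the unit whose interval contains it.
    """
    pi = np.asarray(pi, dtype=float)
    n = int(round(pi.sum()))
    cum = np.concatenate(([0.0], np.cumsum(pi)))
    points = rng.uniform() + np.arange(n)
    # Unit k owns [cum[k], cum[k+1]); clip guards against rounding at the top end.
    units = np.searchsorted(cum, points, side="right") - 1
    return np.clip(units, 0, pi.size - 1)


rng = np.random.default_rng(6)
pi = np.array([0.2, 0.5, 0.3, 0.9, 0.1, 0.6, 0.4, 0.8, 0.2])   # sums to 4.0
counts = np.zeros(pi.size)
for _ in range(20_000):
    counts[ordered_systematic_sample(pi, rng)] += 1
print(counts / 20_000)   # empirical inclusion frequencies, close to pi
```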
7. Limit Theorems for Functional Sums of PSS
Under the PSS framework, sums of symmetric or local functions over $m$-tuples of spacings admit classical and extended central limit theorems. Consider uniform spacings $D_1,\dots,D_N$ (with or without boundaries) and functionals of the form
$$R_n = \sum_{i=1}^{N-m+1} f\big(nD_i, nD_{i+1},\dots,nD_{i+m-1}\big).$$
These are asymptotically normal as $n\to\infty$ under suitable moment/Lindeberg conditions:
$$\frac{R_n - \mathbb{E}R_n}{\sqrt{\operatorname{Var}R_n}} \xrightarrow{d} \mathcal{N}(0,1).$$
For fixed $m$ and suitable $f$, the asymptotic mean and variance can be computed explicitly in terms of moments and covariances of i.i.d. exponentials, via the exponential representation of uniform spacings. Special cases include Greenwood's statistic ($m=1$, $f(x)=x^2$), Moran's log-statistic ($m=1$, $f(x)=-\log x$), and entropy-type functionals. The unified PSS framework thus covers a large family of classical spacing-based tests, estimators, and limit theorems (Mirakhmedov, 15 Apr 2024).
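Greenwood's statistic gives a concrete instance of computing the limit moments from exponential moments. Take $N=n+1$ boundary-included spacings and write $D_i = E_i/\sum_l E_l$ with i.i.d. $\mathrm{Exp}(1)$ variables $E_i$. The statistic $N\sum_i D_i^2$ is then a smooth function of the sample means of $E_i^2$ and $E_i$. The delta method, with $E[E^2]=2$, $\operatorname{Var}(E^2)=20$, $\operatorname{Cov}(E^2,E)=4$, and $\operatorname{Var}(E)=1$, gives
$$\sqrt{N}\,\Big(N\sum_i D_i^2 - 2\Big)\xrightarrow{d}\mathcal{N}(0,4).$$
The NumPy sketch below (hypothetical function name) checks this standardization by simulation.

```python
import numpy as np


def greenwood_standardized(u):
    """Standardized Greenwood statistic for a U(0,1) sample (endpoints 0 and 1 included).

    With N = n + 1 spacings D_i and G = sum(D_i**2), the exponential representation
    D_i = E_i / sum(E) gives N * G -> 2 and sqrt(N) * (N * G - 2) -> N(0, 4), using
    E[E^2] = 2, Var(E^2) = 20, Cov(E^2, E) = 4, Var(E) = 1 and the delta method.
    """
    d = np.diff(np.concatenate(([0.0], np.sort(u), [1.0])))
    N = d.size
    G = np.sum(d ** 2)
    return np.sqrt(N) * (N * G - 2.0) / 2.0


rng = np.random.default_rng(7)
z = np.array([greenwood_standardized(rng.uniform(size=999)) for _ in range(4000)])
print(z.mean(), z.std())   # roughly 0 and 1 under uniformity
```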
PSS provides a theoretically grounded, computationally practical framework. It unifies broad classes of inferential, estimation, and design-based methods across parametric, nonparametric, and spatial domains, with deep connections to exponential limit theory, Poisson process structure, and multivariate statistical practice (Nagaraja et al., 2017, Singh et al., 2021, Ho et al., 17 Nov 2025, Panahbehagh et al., 28 Oct 2025, Mirakhmedov, 15 Apr 2024, Shtembari et al., 2020).