Partitioned Sample Spacings (PSS)

Updated 19 November 2025

Partitioned Sample Spacings (PSS) are statistical methods that partition ordered data into contiguous segments to analyze the behavior of gaps between order statistics.
They enable rigorous derivation of limit theorems by normalizing spacings, often revealing exponential distributions in central, intermediate, and extreme regimes.
PSS underpin practical applications such as nonparametric entropy estimation, robust test statistics, and optimized spatial sampling designs.

Partitioned Sample Spacings (PSS) refer to families of statistics, limit theorems, and algorithmic constructions arising from summing or otherwise analyzing successive, disjoint, or ordered spacings in a finite ordered sample. PSS formalizes the intuition of partitioning the sample space (or the range of the data) into segments, either deterministically (by fixed width or by probability mass) or data-adaptively, and studying the joint or marginal behavior of lengths, sums, or functional transforms of these spacings. The concept underlies goodness-of-fit statistics, efficient nonparametric entropy estimators, modern spatial sampling strategies, and classical limit theorems for order statistics and their increments.

1. Foundational Definitions and Regimes

Let $X_1,\dots,X_n$ denote a random sample from a continuous distribution $F$, with associated order statistics $X_{1:n}<\cdots<X_{n:n}$. For a chosen block size $m$ ($1\le m< n$), define the $m$-step spacings

$D_i^{(m)} = X_{i+m:n} - X_{i:n}, \qquad i=1,\dots,n-m.$

Taken over all $i$, these spacings overlap when $m>1$; restricting to $i = 1, 1+m, 1+2m, \dots$ yields the disjoint (partitioned) spacings, which segment the data into contiguous, non-overlapping blocks of $m$ steps each. Writing $k$ for the rank of the order statistic $X_{k:n}$ under study, three regimes are of statistical interest: central ($k/n\to p\in(0,1)$), intermediate ($k\to\infty$, $n-k\to\infty$, $k/n\to 0$ or $1$), and extreme ($k$ or $n-k$ fixed). PSS are subject to distinct normalization and convergence behavior in each. For example:

In central and intermediate regions, normalized spacings in blocks adjacent to a fixed order statistic $X_{k:n}$ become asymptotically i.i.d. $\mathrm{Exp}(1)$ random variables, and the sequence of cumulative normalized spacings converges in distribution to a homogeneous Poisson process on $\mathbb{R}_+$.
For extreme regimes, such as the largest (or smallest) order statistics, spacing increments lose independence except in special distributional cases (Weibull with shape parameter $\alpha=1$, or certain Gumbel cases). Generally, increments are dependent, reflecting the heavier tails or boundaries of the underlying $F$ (Nagaraja et al., 2017).
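As a minimal sketch of the $m$-step spacings defined above, the following Python function computes $X_{i+m:n} - X_{i:n}$ from a raw sample. The function name and the `disjoint` flag are illustrative choices, not taken from the cited papers; with `disjoint=True` only every $m$-th starting index is kept, so the blocks do not overlap.

```python
def m_spacings(sample, m, disjoint=False):
    """Return the m-step spacings X_{i+m:n} - X_{i:n} of a sample.

    disjoint=False: all starting ranks i = 1..n-m (blocks overlap when m > 1).
    disjoint=True:  starting ranks i = 1, 1+m, 1+2m, ... (non-overlapping blocks).
    """
    x = sorted(sample)          # order statistics X_{1:n} <= ... <= X_{n:n}
    n = len(x)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    step = m if disjoint else 1
    # 0-based index i corresponds to the order statistic X_{i+1:n}.
    return [x[i + m] - x[i] for i in range(0, n - m, step)]


if __name__ == "__main__":
    data = [15, 0, 6, 1, 10, 3]
    print(m_spacings(data, 2))                 # overlapping 2-step spacings
    print(m_spacings(data, 2, disjoint=True))  # disjoint 2-step spacings
```

The disjoint spacings sum to the range covered by the blocks, which is what makes them a partition of that part of the sample range.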

2. Analytic Distributions and Summed Ordered Spacings

In the specific case of a uniform sample $U_1,\dots,U_n$ on $[0,1]$, augment the order statistics with the endpoints $U_{(0)}=0$, $U_{(n+1)}=1$ and define the $n+1$ spacings $D_i = U_{(i)} - U_{(i-1)}$, $i=1,\dots,n+1$. The spacings themselves can be ordered, $D_{(1)}\le\cdots\le D_{(n+1)}$, and the sums of the $k$ smallest or $k$ largest spacings, termed partitioned sample spacings of order $k$, are

$s_k = \sum_{i=1}^k D_{(i)}, \qquad S_k = \sum_{i=1}^k D_{(n+2-i)}.$

The marginal density of $s_k$ (the sum of the $k$ smallest spacings) in the boundary-included scenario is

$f_{s_k}(s) = A(k, n) \sum_{i=1}^{k} a(i, k)\left[1 - \frac{n+2-i}{k+1-i}s\right]^{n-1} H\left(s; 0, \frac{k+1-i}{n+2-i}\right),$

with normalization constant $A(k,n)$, coefficients $a(i,k)$, and $H(x;a,b)$ the indicator function of the interval $[a,b]$. The density and its cumulative counterpart admit closed forms for moderate $(n,k)$; in the boundary-excluded case, analogous formulas are obtained by conditioning on the sample span. These expressions are central in physics (gap/cluster detection), statistical quality control, and in deriving exact p-values for uniformity or goodness-of-fit via empirical gap/cluster analysis (Shtembari et al., 2020).
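The statistics $s_k$ and $S_k$ are straightforward to compute from a sample. The sketch below (function names are illustrative, not from the cited paper) builds the $n+1$ boundary-included spacings, sorts them, and sums the $k$ smallest and $k$ largest. It also includes the well-known special case $k=1$, where the survival function of the smallest spacing is $(1-(n+1)s)^n$ on $[0, 1/(n+1)]$; this agrees with the support and exponent of the $i=1$ term in the density above.

```python
def uniform_spacings(u):
    """The n+1 spacings of a sample in [0, 1], with endpoints 0 and 1 added."""
    pts = [0.0] + sorted(u) + [1.0]
    return [b - a for a, b in zip(pts, pts[1:])]


def ordered_spacing_sums(u, k):
    """Return (s_k, S_k): sums of the k smallest and k largest spacings."""
    d = sorted(uniform_spacings(u))
    if not 1 <= k <= len(d):
        raise ValueError("need 1 <= k <= n + 1")
    return sum(d[:k]), sum(d[-k:])


def min_spacing_sf(s, n):
    """P(s_1 > s) for the smallest of the n+1 uniform spacings (k = 1 case)."""
    if s <= 0:
        return 1.0
    return max(0.0, 1.0 - (n + 1) * s) ** n


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.5, 0.9]
    print(ordered_spacing_sums(sample, 2))
    print(min_spacing_sf(0.1, len(sample)))
```

Because all $n+1$ spacings sum to one, $s_k + S_{n+1-k} = 1$, which gives a quick sanity check on any implementation.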

3. Limit Theorems and Process Structure

The joint limiting behavior of adjacent spacings around $X_{k:n}$ is regime-dependent. In both central and intermediate cases, under regularity conditions on $F$, suitably normalized increments are asymptotically i.i.d. $\mathrm{Exp}(1)$. In the central case, with $x_p = F^{-1}(p)$ the $p$-th quantile and the density $f = F'$ positive and continuous at $x_p$, as $n\to\infty$,

$Y_{n,j} := n f(x_p) (X_{k+j:n} - X_{k+j-1:n}) \xrightarrow{d} Z_j,$

where the $Z_j$ are i.i.d. $\mathrm{Exp}(1)$ random variables. Independently for both left and right neighborhood blocks, the cumulative normalized sums converge in distribution to independent homogeneous Poisson processes on $\mathbb{R}_+$.
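The central-regime limit can be checked empirically. The following is a small simulation sketch, assuming a standard normal $F$ and $p = 0.5$ purely for illustration; the function name and parameters are hypothetical. It draws the normalized spacings $Y_{n,j}$ to the right of $X_{k:n}$, whose sample means should be close to 1 if the $\mathrm{Exp}(1)$ limit is a good approximation at the chosen $n$.

```python
import random
from statistics import NormalDist, fmean


def normalized_central_spacings(n, p, j_max, rng):
    """One draw of Y_{n,j} = n f(x_p) (X_{k+j:n} - X_{k+j-1:n}), j = 1..j_max,
    for a standard normal sample (an illustrative choice of F)."""
    std_normal = NormalDist()
    x_p = std_normal.inv_cdf(p)            # p-th quantile of F
    scale = n * std_normal.pdf(x_p)        # normalizing factor n f(x_p)
    k = round(p * n)                       # rank with k/n close to p
    if k < 1 or k + j_max > n:
        raise ValueError("need 1 <= k and k + j_max <= n")
    x = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    # x[r - 1] is the order statistic X_{r:n}.
    return [scale * (x[k + j - 1] - x[k + j - 2]) for j in range(1, j_max + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [normalized_central_spacings(200, 0.5, 3, rng) for _ in range(200)]
    for j in range(3):
        print(j + 1, fmean(d[j] for d in draws))  # each should be near 1
```

Matching means is only a necessary check; comparing the empirical distribution of each $Y_{n,j}$ with $1 - e^{-y}$ would test the exponential shape more fully.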

In the extreme regime, the structure is more intricate: increments $\Delta W_j$ defined from domain-of-attraction limits (Fréchet, Weibull, Gumbel) are generally mutually dependent. Only when the underlying distribution is Weibull with unit shape or in one-sided Gumbel limits does independence recur (Nagaraja et al., 2017). These structural results enable precise, distribution-free inference for quantiles, local density estimation, and counting in neighborhoods. The breakdown of independence in tail regimes reflects the interaction between block size, sampling window, and the tail properties of $F$.

4. Test Statistics and Parametric Inference via PSS

Partitioned sample spacings provide the foundation for robust test statistics in parametric settings, including cases where likelihood-based inference is non-regular or infeasible. For a parametric family $F_{\theta}$, one forms transformed spacings on the probability scale,

$D_{j,n}^{(m)}(\theta) = F_\theta(X_{(jm):n}) - F_\theta(X_{((j-1)m):n}),\qquad j=1,\dots,M,$

with $M=\lfloor(n+1)/m\rfloor$ and the boundary conventions $F_\theta(X_{0:n})=0$ and $F_\theta(X_{(n+1):n})=1$. Statistics built as symmetric means of convex functions $\phi$ of these spacings, $S_{\phi,n}^{(m)}(\theta)$, permit testing of $H_0:\theta=\theta_0$ via discrepancy statistics,

$T_{\phi,n}^{(m)}(\theta_0) = S_{\phi,n}^{(m)}(\theta_0) - S_{\phi,n}^{(m)}(\widehat{\theta}_{\phi,n}^{(m)}),$

where

$\widehat{\theta}_{\phi,n}^{(m)} = \arg\min_{\theta\in\Theta} S_{\phi,n}^{(m)}(\theta)$

is the generalized spacing estimator. Suitably normalized, these test statistics are asymptotically $\chi^2$ under the null and noncentral $\chi^2$ under $n^{-1/2}$-local alternatives; with the choice $\phi(x)=-\log x$, made for Pitman efficiency, they match the likelihood-ratio test in local asymptotic efficiency as $m\to\infty$. When likelihood methods are undefined (mixtures, nonregular models), spacings tests remain well-defined and maintain nominal size and power (Singh et al., 2021).
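To make the construction concrete, here is a sketch for a one-parameter exponential family $F_\theta(x) = 1 - e^{-\theta x}$ with $\phi(x) = -\log x$. Several details are assumptions for illustration: $S_{\phi,n}^{(m)}$ is taken as the plain average of $\phi$ over the $M$ spacings, the minimizer is found by a golden-section search over a user-supplied bracket, and the returned discrepancy is the raw difference without the scaling needed for the $\chi^2$ limit, which is not specified here. The function names are hypothetical.

```python
import math
import random


def prob_spacings(x, cdf, m):
    """Disjoint m-step spacings on the probability scale, j = 1..M."""
    xs = sorted(x)
    n = len(xs)
    M = (n + 1) // m
    # u[r] = F(X_{r:n}) for r = 0..n+1, with F(X_0) = 0 and F(X_{n+1}) = 1.
    u = [0.0] + [cdf(v) for v in xs] + [1.0]
    return [u[j * m] - u[(j - 1) * m] for j in range(1, M + 1)]


def s_phi(x, theta, m):
    """Average of phi(D_j) with phi = -log, for the exponential(theta) model."""
    d = prob_spacings(x, lambda v: -math.expm1(-theta * v), m)
    # Floor guards against log(0) when the CDF saturates numerically.
    return sum(-math.log(max(dj, 1e-300)) for dj in d) / len(d)


def spacing_estimate(x, m, lo, hi, tol=1e-8):
    """Golden-section minimizer of s_phi over theta in [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = s_phi(x, c, m), s_phi(x, d, m)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = s_phi(x, c, m)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = s_phi(x, d, m)
    return (a + b) / 2


def discrepancy(x, theta0, m, lo, hi):
    """Unnormalized T = S(theta0) - S(theta_hat)."""
    theta_hat = spacing_estimate(x, m, lo, hi)
    return s_phi(x, theta0, m) - s_phi(x, theta_hat, m)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.expovariate(2.0) for _ in range(199)]
    print(spacing_estimate(data, 4, 0.1, 20.0))
    print(discrepancy(data, 2.0, 4, 0.1, 20.0))
```

For this model each term $-\log D_j(\theta)$ is convex in $\theta$, which is why a bracketing search suffices; other families may need a more general optimizer. With $m=1$ and $\phi=-\log$, minimizing the criterion coincides with the maximum product of spacings idea.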

5. Nonparametric and Multivariate Applications

PSS underlies recent advances in nonparametric functional estimation, most notably for joint entropy in moderate-to-high dimensions. The key construction partitions $\mathbb{R}^d$ into $\ell^d$ axis-aligned cells (with $\ell=o(n^{1/d})$), within which local one-dimensional spacing estimators are constructed for each marginal. The product of these marginal estimates, weighted by cell counts, yields a joint density estimator for a point $\mathbf{x}$ in cell $k$:

$\hat f_{n,\ell}(\mathbf{x}) = \frac{n_k}{n} \prod_{j=1}^d \frac{2m_k}{n_k (x_{j,(a_j+m_k)}^k - x_{j,(a_j-m_k)}^k)},$

where $n_k$ is the cell population, $m_k \sim \sqrt{n_k}$ controls the bias-variance trade-off, $x_{j,(r)}^k$ denotes the $r$-th order statistic of the $j$-th coordinate within cell $k$, and $a_j$ is the rank of $x_j$ among those coordinates. The plug-in estimator for entropy,

$\widehat{H}_{n,\ell} = -\frac{1}{n}\sum_{v=1}^n \log \hat f_{n,\ell}(\mathbf{X}_v),$

is strongly consistent and $L^1$-consistent, and achieves empirical risk near or below that of k-nearest-neighbor and copula-adaptive approaches. It performs particularly well under strong correlation or heteroskedasticity, and requires no neural density modeling or training (Ho et al., 17 Nov 2025). Computational cost scales favorably (near-linearly in $n$ and $d$ for moderate $d$), further supporting use in information-theoretic pipelines and machine learning tasks.
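The one-dimensional building block of this construction is the classical $m$-spacing (Vasicek-type) entropy estimator, which corresponds to a single cell with $d=1$ and $n_k=n$. The sketch below implements only that building block, not the full partitioned multivariate estimator of Ho et al.; the clamping of ranks at the sample extremes and the default $m \approx \sqrt{n}$ are common conventions assumed here.

```python
import math
import random


def vasicek_entropy(sample, m=None):
    """m-spacing entropy estimate: mean of log(n (X_(i+m) - X_(i-m)) / (2m))."""
    x = sorted(sample)
    n = len(x)
    if m is None:
        m = max(1, round(math.sqrt(n)))   # assumed default, m ~ sqrt(n)
    if not 1 <= m < n / 2:
        raise ValueError("need 1 <= m < n/2")
    total = 0.0
    for i in range(n):
        lo = x[max(i - m, 0)]             # clamp at the sample minimum
        hi = x[min(i + m, n - 1)]         # clamp at the sample maximum
        # 2m / (n (hi - lo)) is the local spacing density estimate.
        total += math.log(n * (hi - lo) / (2 * m))
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    print(vasicek_entropy([rng.random() for _ in range(2000)]))       # true value 0
    print(vasicek_entropy([rng.gauss(0, 1) for _ in range(2000)]))    # true value about 1.419
```

The estimator has a small negative finite-sample bias, mostly from the clamped boundary terms, so estimates slightly below the true entropy are expected.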

6. Spatial Sampling and Enhanced Spacing via Partitioning

PSS frameworks naturally generalize to spatial domains. In finite-population spatial sampling, especially for populations with auxiliary variables or explicit inclusion probabilities, a two-stage PSS procedure yields maximally spread, representative samples. The procedure partitions the population into $n$ spatially compact clusters (UP-balanced), each with total inclusion probability exactly 1, using constrained clustering and tour orderings. Within each cluster, one unit is selected; this maximizes the minimal neighbor distance (spread) and satisfies the inclusion-probability constraints exactly.

Quantification of spreadness is achieved via a translation-invariant spreadness index $S$ based on comparing kernel-density surfaces before and after cluster-wise translation. Algorithmic enhancements include greedy local optimizations to further increase spatial balance, maintaining design-based inference guarantees. The resulting designs outperform rival spatially balanced sampling schemes on classical and novel dispersion indices (Panahbehagh et al., 28 Oct 2025). The theoretical roots trace directly to classical 1D PSS, extended to multidimensional population supports.
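The partition-then-select idea is easiest to see in a deliberately simplified setting. The sketch below is a one-dimensional, equal-probability analogue and not the authors' algorithm: units are ordered by position, cut into $n$ contiguous cells of $N/n$ units (so each cell carries total inclusion probability 1 when every unit has probability $n/N$), and one unit is drawn uniformly per cell. The function name and the requirement that $n$ divide $N$ are assumptions made to keep the example short.

```python
import random


def one_per_cell_sample(positions, n, rng):
    """Select one unit from each of n contiguous, equal-size cells.

    Returns the indices (into `positions`) of the selected units,
    ordered by cell from left to right.
    """
    N = len(positions)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch needs n to divide N")
    # Order units along the line; this plays the role of the tour ordering.
    order = sorted(range(N), key=lambda i: positions[i])
    size = N // n
    # Each unit is chosen with probability 1/size = n/N.
    return [order[c * size + rng.randrange(size)] for c in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    pos = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
    print(one_per_cell_sample(pos, 3, rng))
```

Unequal inclusion probabilities and two-dimensional supports require cells whose probability mass is balanced rather than whose unit counts are equal, which is where the constrained clustering described above comes in.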

7. Limit Theorems for Functional Sums of PSS

Under the PSS framework, sums of symmetric or local functions over $m$-tuples of spacings admit classical and extended central limit theorems. For uniform spacings (with or without boundaries), here written $S_1, S_2, \dots$ and not to be confused with the sum $S_k$ of Section 2, functionals

$T_{n,m} = \sum_{i=1}^n f(S_i, S_{i+1}, \dots, S_{i+m-1})$

are asymptotically normal when $m=o(n)$ and suitable moment/Lindeberg conditions hold:

$\frac{T_{n,m} - \mathbb{E}[T_{n,m}]}{\sqrt{\operatorname{Var}[T_{n,m}]}} \xrightarrow{d} N(0,1).$

For fixed $m$ and suitable $f$, explicit mean and variance can be computed in terms of moments and covariances of i.i.d. exponentials (via the exponential spacings representation). Special cases include Greenwood’s statistic, Moran’s log-statistic, and entropy-type functionals, uniting a large family of classical spacing-based tests, estimators, and limit theorems within the unified PSS framework (Mirakhmedov, 2024).
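Two of the special cases named above are simple to compute for $m=1$. The sketch below evaluates Greenwood's statistic (the sum of squared spacings) and Moran's log-statistic on the $n+1$ boundary-included spacings; the scaling inside the logarithm is one common convention and is assumed here, as are the function names. Under uniformity the exact mean of Greenwood's statistic is $2/(n+2)$, which follows from the spacings being Dirichlet distributed.

```python
import math
import random


def spacings01(u):
    """The n+1 spacings of a sample in [0, 1], with endpoints 0 and 1 added."""
    pts = [0.0] + sorted(u) + [1.0]
    return [b - a for a, b in zip(pts, pts[1:])]


def greenwood(u):
    """Greenwood's statistic: sum of squared spacings."""
    return sum(d * d for d in spacings01(u))


def moran(u):
    """Moran's log-statistic, -sum log((n+1) D_i) (assumed normalization)."""
    d = spacings01(u)
    return -sum(math.log(len(d) * di) for di in d)


def greenwood_null_mean(n):
    """Exact mean of Greenwood's statistic for n uniform points."""
    return 2.0 / (n + 2)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.random() for _ in range(50)]
    print(greenwood(sample), greenwood_null_mean(50))
    print(moran(sample))
```

Both statistics are minimized by perfectly even spacing, where Greenwood's statistic equals $1/(n+1)$ and Moran's statistic equals 0, so large values indicate clustering or gaps.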

PSS provides a theoretically grounded, computationally practical framework that unifies broad classes of inferential, estimational, and design-based methodologies across parametric, nonparametric, and spatial domains, with deep connections to exponential limit theory, Poisson process structure, and multivariate statistical practice (Nagaraja et al., 2017, Singh et al., 2021, Ho et al., 17 Nov 2025, Panahbehagh et al., 28 Oct 2025, Mirakhmedov, 2024, Shtembari et al., 2020).

References (6)

Spacings Around An Order Statistic (2017)

On the sum of ordered spacings (2020)

Some parametric tests based on sample spacings (2021)

Nonparametric Estimation of Joint Entropy through Partitioned Sample-Spacing Method (2025)

Intelligent n-Means Spatial Sampling (2025)

The central limit theorem for sum-functions of m-tuples of spacings (2024)
