
Partitioned Sample Spacings (PSS)

Updated 19 November 2025
  • Partitioned Sample Spacings (PSS) are statistical methods that partition ordered data into contiguous segments to analyze the behavior of gaps between order statistics.
  • They enable rigorous derivation of limit theorems by normalizing spacings, often revealing exponential distributions in central, intermediate, and extreme regimes.
  • PSS underpin practical applications such as nonparametric entropy estimation, robust test statistics, and optimized spatial sampling designs.

Partitioned Sample Spacings (PSS) refer to families of statistics, limit theorems, and algorithmic constructions arising from summing or otherwise analyzing successive, disjoint, or ordered spacings in a finite ordered sample. PSS formalizes the intuition of partitioning the sample space (or the range of data) into segments—either deterministically (fixed-width, probability-mass) or data-adaptively—and studying the joint or marginal behavior of lengths, sums, or functional transforms of these spacings. The concept underlies goodness-of-fit statistics, efficient nonparametric entropy estimators, modern spatial sampling strategies, and classical limit theorems for order statistics and their increments.

1. Foundational Definitions and Regimes

Let $X_1,\dots,X_n$ denote a random sample from a continuous distribution $F$, with associated order statistics $X_{1:n}<\cdots<X_{n:n}$. The $m$-step disjoint sample spacings, or partitioned sample spacings, are most concisely described as follows. For a chosen block size $m$ ($1\le m<n$), define the spacings

$$D_i^{(m)} = X_{(i+m)} - X_{(i)}, \qquad i=1,\dots,n-m.$$

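Taken at every $m$-th index ($i = 1, m+1, 2m+1, \dots$), these spacings segment the data into contiguous, non-overlapping blocks of $m$ steps each; taken at all indices, consecutive spacings overlap. In regimes of specific statistical interest, namely central ($k/n\to p\in(0,1)$), intermediate ($k\to\infty$, $n-k\to\infty$, $k/n\to 0$ or $1$), or extreme ($k$ or $n-k$ fixed), where $k$ is the rank of the order statistic $X_{k:n}$ around which spacings are studied, PSS are subject to distinct normalization and convergence behaviors. For example: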

  • In central and intermediate regions, normalized spacings in blocks adjacent to a fixed order statistic $X_{k:n}$ become asymptotically i.i.d. $\mathrm{Exp}(1)$ random variables, and the sequence of cumulative normalized spacings converges in distribution to a homogeneous Poisson process on $\mathbb{R}_+$.
  • For extreme regimes, such as the largest (or smallest) order statistics, spacing increments lose independence except in special distributional cases (Weibull with shape parameter $\alpha=1$, or certain Gumbel cases). Generally, increments are dependent, reflecting the heavier tails or boundaries of the underlying $F$ (Nagaraja et al., 2017).
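The $m$-step spacings defined at the start of this section can be computed directly from a sorted sample. The following is a minimal sketch, assuming NumPy; the function names `m_spacings` and `disjoint_m_spacings` are illustrative and not taken from the cited papers. It contrasts the spacings at all indices with the non-overlapping subset obtained by stepping through the order statistics $m$ at a time.

```python
import numpy as np

def m_spacings(sample, m):
    """All m-step spacings X_(i+m) - X_(i), i = 1, ..., n - m (these overlap)."""
    x = np.sort(np.asarray(sample, dtype=float))
    return x[m:] - x[:-m]

def disjoint_m_spacings(sample, m):
    """Non-overlapping m-step spacings: every m-th order statistic, differenced."""
    x = np.sort(np.asarray(sample, dtype=float))
    return np.diff(x[::m])
```

The disjoint spacings are the entries of `m_spacings(sample, m)` at positions `0, m, 2m, ...`, and they telescope: their sum is the distance between the first and the last order statistic used.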

2. Analytic Distributions and Summed Ordered Spacings

In the specific case of uniform samples, consider including augmented endpoints $U_{(0)}=0$, $U_{(n+1)}=1$ and defining spacings $D_i = U_{(i)} - U_{(i-1)}$, $i=1,\dots,n+1$. The spacings themselves can be ordered, $D_{(1)}\le\cdots\le D_{(n+1)}$, and the sums of the $k$ smallest or $k$ largest spacings, termed partitioned sample spacings of order $k$, are

$$s_k = \sum_{i=1}^k D_{(i)}, \qquad S_k = \sum_{i=1}^k D_{(n+2-i)}.$$

The marginal density of $s_k$ (the sum of the $k$ smallest spacings) in the boundary-included scenario is

$$f_{s_k}(s) = A(k, n) \sum_{i=1}^{k} a(i, k)\left[1 - \frac{n+2-i}{k+1-i}s\right]^{n-1} H\left(s; 0, \frac{k+1-i}{n+2-i}\right),$$

with normalization constants $A(k,n)$ and $a(i,k)$, and $H(x;a,b)$ the indicator of the interval between $a$ and $b$. The density and its cumulative counterpart admit closed forms for moderate $(n,k)$; in the boundary-excluded case analogous formulas are obtained by conditioning on the sample span. These expressions are central in physics (gap/cluster detection), statistical quality control, and in deriving exact p-values for uniformity or goodness-of-fit via empirical gap/cluster analysis (Shtembari et al., 2020).
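The ordered-spacing sums $s_k$ and $S_k$ are simple to compute for a sample on $[0,1]$. The sketch below assumes NumPy and uses illustrative function names. The helper `mc_upper_tail` approximates an upper-tail p-value for $S_k$ under uniformity by simulation; it is a Monte Carlo stand-in for the closed-form density above, not an implementation of it.

```python
import numpy as np

def uniform_spacings(u):
    """Spacings D_1, ..., D_{n+1} of a sample in [0, 1], with endpoints 0 and 1 added."""
    u = np.sort(np.asarray(u, dtype=float))
    return np.diff(np.concatenate(([0.0], u, [1.0])))

def ordered_spacing_sums(u, k):
    """Return (s_k, S_k): sums of the k smallest and the k largest spacings."""
    d = np.sort(uniform_spacings(u))
    return d[:k].sum(), d[-k:].sum()

def mc_upper_tail(observed_S_k, n, k, n_rep=10000, seed=0):
    """Simulated P(S_k >= observed) for n i.i.d. Uniform(0, 1) points."""
    rng = np.random.default_rng(seed)
    sims = np.array([ordered_spacing_sums(rng.random(n), k)[1]
                     for _ in range(n_rep)])
    return float(np.mean(sims >= observed_S_k))
```

For the sample `[0.5, 0.1, 0.4]` the spacings are 0.1, 0.3, 0.1, 0.5, so `ordered_spacing_sums(..., 2)` gives $s_2 \approx 0.2$ and $S_2 \approx 0.8$ up to floating-point rounding.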

3. Limit Theorems and Process Structure

The joint limiting behavior of adjacent spacings around $X_{k:n}$ is regime-dependent. In both central and intermediate cases, under regularity conditions on $F$, normalized increments are asymptotically i.i.d. $\mathrm{Exp}(1)$. In the central case, as $n\to\infty$,

$$Y_{n,j} := n f(x_p) (X_{k+j:n} - X_{k+j-1:n}) \xrightarrow{d} Z_j,$$

where the $Z_j$ are i.i.d. $\mathrm{Exp}(1)$ variables, $f$ is the density of $F$, and $x_p$ is its $p$-th quantile. The cumulative normalized sums over the left and right neighborhood blocks converge in distribution to independent homogeneous Poisson processes on $\mathbb{R}_+$.
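The central-regime limit can be checked numerically. The sketch below is an illustration under stated assumptions rather than a proof: it takes $F$ to be standard normal (an arbitrary choice), uses Python's standard-library `statistics.NormalDist` for the quantile and density, and simulates the spacings immediately to the right of $X_{k:n}$ with $k=\lfloor np\rfloor$. The function name and parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normalized_central_spacings(n, p, n_right, n_rep, seed=0):
    """Simulate Y_{n,j} = n f(x_p) (X_{k+j:n} - X_{k+j-1:n}), j = 1..n_right."""
    rng = np.random.default_rng(seed)
    nd = NormalDist()                    # standard normal: assumed F
    x_p = nd.inv_cdf(p)                  # p-th quantile
    scale = n * nd.pdf(x_p)              # normalization n * f(x_p)
    k = int(np.floor(n * p))             # 1-based rank of the central order statistic
    x = np.sort(rng.standard_normal((n_rep, n)), axis=1)
    block = x[:, k - 1 : k + n_right]    # X_{k:n}, ..., X_{k+n_right:n}
    return scale * np.diff(block, axis=1)  # shape (n_rep, n_right)
```

With, say, `n=1000`, `p=0.3`, and a few thousand replications, the simulated values should have mean and variance close to 1 and negligible correlation between adjacent columns, consistent with i.i.d. $\mathrm{Exp}(1)$ limits.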

In the extreme regime, the structure is more intricate: increments $\Delta W_j$ defined from domain-of-attraction limits (Fréchet, Weibull, Gumbel) are generally mutually dependent. Only when the underlying distribution is Weibull with unit shape or in one-sided Gumbel limits does independence recur (Nagaraja et al., 2017). These structural results enable precise, distribution-free inference for quantiles, local density estimation, and counting in neighborhoods. The breakdown of independence in tail regimes reflects the interaction between block size, sampling window, and the tail properties of $F$.

4. Test Statistics and Parametric Inference via PSS

Partitioned sample spacings provide the foundation for robust test statistics in parametric settings, including those where likelihood-based inference is non-regular or infeasible. For a parametric family $F_{\theta}$, one forms transformed spacings on the probability scale,

$$D_{j,n}^{(m)}(\theta) = F_\theta(X_{(jm):n}) - F_\theta(X_{((j-1)m):n}),\qquad j=1,\dots,M,$$

with $M=\lfloor(n+1)/m\rfloor$. Statistics built as symmetric means of convex functions of these spacings, $S_{\phi,n}^{(m)}(\theta)$, permit testing of $H_0:\theta=\theta_0$ via discrepancy statistics,

$$T_{\phi,n}^{(m)}(\theta_0) = S_{\phi,n}^{(m)}(\theta_0) - S_{\phi,n}^{(m)}(\widehat{\theta}_{\phi,n}^{(m)}),$$

where

$$\widehat{\theta}_{\phi,n}^{(m)} = \arg\min_{\theta\in\Theta} S_{\phi,n}^{(m)}(\theta)$$

is the generalized spacing estimator. These normalized test statistics, with choices such as $\phi(x)=-\log x$ for Pitman efficiency, are asymptotically $\chi^2$ under the null and noncentral $\chi^2$ under $n^{-1/2}$-local alternatives, matching the likelihood-ratio test in local asymptotic efficiency as $m\to\infty$. When likelihood methods are undefined (mixtures, nonregular models), spacings tests remain well-defined and maintain nominal size and power (Singh et al., 2021).
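The construction above can be sketched for a one-parameter family. Everything specific in this example is an assumption made for illustration: the family is exponential with rate $\theta$, so $F_\theta(x)=1-e^{-\theta x}$; the conventions $F_\theta(X_{(0):n})=0$ and $F_\theta(X_{(n+1):n})=1$ are used at the ends; $S_{\phi,n}^{(m)}$ is taken as the plain mean of $\phi(D)=-\log D$; and the minimization is a simple grid search. The normalization that yields the $\chi^2$ limit is not reproduced here.

```python
import numpy as np

def prob_spacings(x, theta, m):
    """m-step disjoint spacings of F_theta(X_(i)) for an Exponential(rate=theta) model."""
    x = np.sort(np.asarray(x, dtype=float))
    M = (x.size + 1) // m
    u = 1.0 - np.exp(-theta * x)                # F_theta at the order statistics
    u_aug = np.concatenate(([0.0], u, [1.0]))   # assumed end conventions
    return np.diff(u_aug[np.arange(M + 1) * m])

def s_phi(x, theta, m):
    """Mean of phi(D) with phi(d) = -log d (assumed form of S)."""
    d = prob_spacings(x, theta, m)
    with np.errstate(divide="ignore"):
        return float(np.mean(-np.log(d)))

def spacing_estimate(x, m, grid):
    """Grid-search minimizer of s_phi; returns (theta_hat, minimum value)."""
    values = np.array([s_phi(x, t, m) for t in grid])
    best = int(np.argmin(values))
    return float(grid[best]), float(values[best])

def discrepancy(x, theta0, m, grid):
    """Unnormalized T = S(theta0) - S(theta_hat)."""
    _, s_min = spacing_estimate(x, m, grid)
    return s_phi(x, theta0, m) - s_min
```

A typical call would draw `x` from an exponential distribution, pass a grid such as `np.linspace(0.5, 5.0, 451)`, and compare `discrepancy` at the true rate with its value at a distant rate; the latter should be visibly larger.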

5. Nonparametric and Multivariate Applications

PSS underlies recent advances in nonparametric functional estimation, most notably for joint entropy in moderate-to-high dimensions. The key construction partitions $\mathbb{R}^d$ into $\ell^d$ axis-aligned cells (with $\ell=o(n^{1/d})$), within which local one-dimensional spacing estimators are constructed for each marginal. The product of these marginal estimates, weighted by cell counts, yields a joint density estimator:

$$\hat f_{n,\ell}(\mathbf{x}) = \frac{n_k}{n} \prod_{j=1}^d \frac{2m_k}{n_k \left(x_{j,(a_j+m_k)}^k - x_{j,(a_j-m_k)}^k\right)},$$

where $n_k$ is the population of the cell $k$ containing $\mathbf{x}$, $m_k \sim \sqrt{n_k}$ controls the bias-variance trade-off, and $x_{j,(r)}^k$ denotes the $r$-th order statistic of the $j$-th coordinate within cell $k$ (with $a_j$ the rank of the $j$-th coordinate of $\mathbf{x}$ among them). The plug-in estimator for entropy,

$$\widehat{H}_{n,\ell} = -\frac{1}{n}\sum_{v=1}^n \log \hat f_{n,\ell}(\mathbf{X}_v),$$

is strongly consistent, $L^1$-consistent, and achieves empirical risk near or below $k$-nearest-neighbor and copula-adaptive approaches, with superior performance under strong correlation or heteroskedasticity and without any neural density modeling or training (Ho et al., 17 Nov 2025). Computational burden scales favorably (near-linearly in $n$ and $d$ for moderate $d$), further supporting use in information-theoretic pipelines and machine learning tasks.
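A simplified version of this estimator fits in a few lines of NumPy. The sketch below is not the authors' implementation and makes several labeled assumptions: cells come from an equal-width grid over the observed range of each coordinate, $m_k$ is set to the rounded $\sqrt{n_k}$, windows are clamped at the edges of each cell with the actual rank gap replacing $2m_k$, and cells holding fewer than two points are skipped, so the average runs over the remaining points only.

```python
import numpy as np

def pss_entropy(x, n_bins):
    """Sketch of a partitioned-spacing entropy estimate (nats) for an (n, d) sample."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    lo, hi = x.min(axis=0), x.max(axis=0)
    # Assumption: equal-width grid with n_bins cells per axis over the data range.
    idx = np.floor((x - lo) / (hi - lo) * n_bins).astype(int)
    idx = np.clip(idx, 0, n_bins - 1)
    labels = np.ravel_multi_index(tuple(idx.T), (n_bins,) * d)
    log_f = np.full(n, np.nan)
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        n_k = members.size
        if n_k < 2:
            continue  # simplification: no spacing can be formed, point is skipped
        m_k = max(1, int(round(np.sqrt(n_k))))
        log_cell = np.full(n_k, np.log(n_k / n))   # cell weight n_k / n
        for j in range(d):
            order = np.argsort(x[members, j])
            sorted_vals = x[members[order], j]
            ranks = np.empty(n_k, dtype=int)
            ranks[order] = np.arange(n_k)          # within-cell rank of each point
            upper = np.minimum(ranks + m_k, n_k - 1)
            lower = np.maximum(ranks - m_k, 0)
            width = sorted_vals[upper] - sorted_vals[lower]
            log_cell += np.log((upper - lower) / (n_k * width))
        log_f[members] = log_cell
    return float(-np.nanmean(log_f))
```

For points drawn uniformly on the unit square the true entropy is 0, and rescaling every coordinate by 3 should raise the estimate by about $2\log 3$, which gives two quick sanity checks.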

6. Spatial Sampling and Enhanced Spacing via Partitioning

PSS frameworks naturally generalize to spatial domains. In finite spatial sampling, especially for populations with auxiliary variables or explicit inclusion probabilities, a two-stage PSS procedure yields maximally spread, representative samples. The procedure partitions the population into nn spatially compact clusters (UP-balanced), each with probability-mass exactly 1, using constrained clustering and tour orderings. Within each cell, one unit is selected; this maximizes minimal neighbor distance (spread) and achieves exact inclusion-probability constraints.
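The partition-then-select idea can be illustrated in a much reduced setting. The sketch below is not the constrained-clustering and tour-ordering procedure of the cited work: it assumes equal inclusion probabilities $n/N$ with $N$ divisible by $n$, and it builds cells by cutting the population into vertical strips and then into blocks within each strip. Every cell then holds $N/n$ units, so its probability mass is exactly 1, and one unit is drawn uniformly from each cell. Function names are hypothetical.

```python
import numpy as np

def partition_cells(coords, n_strips, cells_per_strip):
    """Split N planar units into n_strips * cells_per_strip equal-size cells."""
    coords = np.asarray(coords, dtype=float)
    n_units = coords.shape[0]
    n_cells = n_strips * cells_per_strip
    if n_units % n_cells != 0:
        raise ValueError("population size must be divisible by the number of cells")
    by_x = np.argsort(coords[:, 0])                 # order units by x-coordinate
    cells = []
    for strip in np.split(by_x, n_strips):          # equal-size vertical strips
        by_y = strip[np.argsort(coords[strip, 1])]  # order each strip by y
        cells.extend(np.split(by_y, cells_per_strip))
    return cells

def one_per_cell(cells, rng):
    """Draw one unit uniformly from each cell; inclusion probability is n / N."""
    return np.array([rng.choice(cell) for cell in cells])
```

Because each unit belongs to exactly one cell of size $N/n$, its inclusion probability is $n/N$ by construction; unequal probabilities or more compact cells would need the clustering machinery described above.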

Quantification of spreadness is achieved via a translation-invariant spreadness index $S$ based on comparing kernel-density surfaces before and after cluster-wise translation. Algorithmic enhancements include greedy local optimizations to further increase spatial balance, maintaining design-based inference guarantees. The resulting designs outperform rival spatially balanced sampling schemes on classical and novel dispersion indices (Panahbehagh et al., 28 Oct 2025). The theoretical roots trace directly to classical 1D PSS, extended to multidimensional population supports.

7. Limit Theorems for Functional Sums of PSS

Under the PSS framework, sums of symmetric or local functions over $m$-tuples of spacings admit classical and extended central limit theorems. For uniform spacings $S_i$ (with or without boundaries), functionals

$$T_{n,m} = \sum_{i=1}^n f(S_i, S_{i+1}, \dots, S_{i+m-1})$$

are asymptotically normal when $m=o(n)$ and suitable moment/Lindeberg conditions hold:

$$\frac{T_{n,m} - \mathbb{E}[T_{n,m}]}{\sqrt{\operatorname{Var}[T_{n,m}]}} \xrightarrow{d} N(0,1).$$

For fixed $m$ and suitable $f$, explicit mean and variance can be computed in terms of moments and covariances of i.i.d. exponentials (via the exponential spacings representation). Special cases include Greenwood’s statistic, Moran’s log-statistic, and entropy-type functionals, uniting a large family of classical spacing-based tests, estimators, and limit theorems within the unified PSS framework (Mirakhmedov, 15 Apr 2024).
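Functionals of this type are easy to evaluate with a sliding window over the spacings. The sketch below assumes NumPy 1.20 or later (for `sliding_window_view`), uses the boundary-included uniform spacings, and sums over complete windows only, which is one possible reading of the index range. Greenwood's and Moran's statistics appear as the $m=1$ cases with $f(s)=s^2$ and $f(s)=-\log((n+1)s)$; the function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def uniform_spacings(u):
    """Spacings of a sample in [0, 1], with endpoints 0 and 1 added."""
    u = np.sort(np.asarray(u, dtype=float))
    return np.diff(np.concatenate(([0.0], u, [1.0])))

def window_functional(spacings, m, f):
    """Sum of f over all complete windows of m consecutive spacings."""
    windows = sliding_window_view(np.asarray(spacings, dtype=float), m)
    return float(sum(f(w) for w in windows))

def greenwood(spacings):
    """Greenwood's statistic: sum of squared spacings."""
    return window_functional(spacings, 1, lambda w: w[0] ** 2)

def moran(spacings):
    """Moran's log-statistic: -sum of log(N * D_i), N = number of spacings."""
    n_sp = len(spacings)
    return window_functional(spacings, 1, lambda w: -np.log(n_sp * w[0]))
```

For $n$ uniform points the $n+1$ spacings each follow a $\mathrm{Beta}(1,n)$ law, so the mean of Greenwood's statistic is $2/(n+2)$; averaging `greenwood` over simulated samples gives a quick check of the code.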


PSS provides a theoretically grounded, computationally practical framework that unifies broad classes of inferential, estimational, and design-based methodologies across parametric, nonparametric, and spatial domains, with deep connections to exponential limit theory, Poisson process structure, and multivariate statistical practice (Nagaraja et al., 2017, Singh et al., 2021, Ho et al., 17 Nov 2025, Panahbehagh et al., 28 Oct 2025, Mirakhmedov, 15 Apr 2024, Shtembari et al., 2020).
