Ewens–Pitman Partition Structures
- Ewens–Pitman Partition Structures are a two-parameter framework defining exchangeable random partitions with power-law and clustering features, generalizing the classical Ewens formula.
- They leverage a sequential allocation mechanism, including the two-parameter Chinese restaurant process, to model reinforcement, limit theorems, and phase transitions in partition counts.
- Martingale techniques and combinatorial representations such as Riordan arrays underpin robust statistical inference and parameter estimation in applications ranging from genetics to Bayesian nonparametrics.
The Ewens–Pitman partition structures form a foundational framework for modeling and analyzing random partitions with power-law diversity, unifying probabilistic, algebraic, and combinatorial methods. These structures play a central role in exchangeable random partitions, Bayesian nonparametrics, population genetics, and related areas, where the distributions of partitions encode reinforcement and clustering mechanisms. The two-parameter family, indexed by $\alpha \in [0,1)$ and $\theta > -\alpha$, generalizes the classical Ewens sampling formula, which is recovered at $\alpha = 0$, and is intimately linked to the Poisson–Dirichlet and Pitman–Yor processes. Modern developments cover precise deviation theory, Gaussian and large deviation limits, combinatorial representations, martingale methods, and advanced inference for parameters and predictive distributions.
1. Definition, Construction, and Characterization
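The Ewens–Pitman partition structure is defined on the set of partitions of $[n] = \{1, \dots, n\}$ by the exchangeable partition probability function (EPPF)
$$p(n_1, \dots, n_k) = \frac{\prod_{i=1}^{k-1} (\theta + i\alpha)}{(\theta + 1)_{n-1}} \prod_{j=1}^{k} (1 - \alpha)_{n_j - 1},$$
where $(x)_m = x(x+1)\cdots(x+m-1)$ denotes the rising Pochhammer symbol, $k$ is the number of blocks, and $n_1, \dots, n_k$ are their sizes, with $n_1 + \dots + n_k = n$ (Peng et al., 13 Dec 2025, Wang, 17 Mar 2026). This structure is consistent and exchangeable under sampling and arises as the partition law generated by the Pitman–Yor process (two-parameter Poisson–Dirichlet process) via sampling iid atoms.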
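As a small sanity check on the EPPF, the sketch below evaluates it in exact rational arithmetic, assuming the standard two-parameter form $p(n_1, \dots, n_k) = \prod_{i=1}^{k-1}(\theta + i\alpha) \prod_{j=1}^{k} (1-\alpha)_{n_j - 1} / (\theta+1)_{n-1}$. The function names `rising` and `eppf` and the parameter values are illustrative choices, not taken from the cited papers.

```python
from fractions import Fraction


def rising(x, m):
    """Rising Pochhammer symbol (x)_m = x (x+1) ... (x+m-1), with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i
    return out


def eppf(sizes, alpha, theta):
    """Probability of one specific partition of [n] whose blocks have the given sizes."""
    n, k = sum(sizes), len(sizes)
    num = Fraction(1)
    for i in range(1, k):
        num *= theta + i * alpha          # factor contributed by each block after the first
    for n_j in sizes:
        num *= rising(1 - alpha, n_j - 1)  # factor contributed by growth inside block j
    return num / rising(theta + 1, n - 1)


# Illustrative parameters (an assumption for this example).
alpha, theta = Fraction(1, 2), Fraction(1)

# {1,2,3} has 5 set partitions: one block of size 3, three of shape (2,1), one all-singletons.
total_n3 = (eppf([3], alpha, theta)
            + 3 * eppf([2, 1], alpha, theta)
            + eppf([1, 1, 1], alpha, theta))
print(total_n3)
```

Because the EPPF depends on a partition only through its block sizes, summing over the five set partitions of a three-element set reduces to three evaluations weighted by their multiplicities.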
Combinatorial and representation-theoretic formulations characterize Ewens–Pitman measures as non-extreme harmonic functions on the Kingman branching graph, with algebraic structure provided by umbral interpolation and Sheffer polynomial sequences. An explicit new representation in terms of Riordan array sums enables efficient symbolic and combinatorial computation of marginal and joint quantities (Greve, 6 Mar 2025).
The process admits a dynamic continuous-time construction as a Markov chain on the space of partition multiplicity vectors, with transitions structured by immigration (appearance of new types), births (reinforcement within families), and deaths (mortality), leading to mixtures over Pitman sampling formulae and stationary reversible distributions in the presence of mortality (Giordano et al., 2019).
2. Sampling Schemes, Reinforcement, and Chinese Restaurant Process
The Ewens–Pitman process is equivalently described via a sequential allocation mechanism known as the two-parameter Chinese restaurant process. At step $n+1$, given $k$ blocks of sizes $n_1, \dots, n_k$ formed by the first $n$ elements:
- A new element joins an existing block $j$ with probability $(n_j - \alpha)/(\theta + n)$;
- Or starts a new block with probability $(\theta + k\alpha)/(\theta + n)$ (Peng et al., 13 Dec 2025, Wang, 17 Mar 2026).
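The allocation rule can be sketched as a short simulation. This is a minimal illustration, assuming the standard two-parameter weights: with $m$ elements already placed in $k$ blocks, block $j$ receives the next element with probability $(n_j - \alpha)/(\theta + m)$ and a new block opens with probability $(\theta + k\alpha)/(\theta + m)$. The function name `crp_block_sizes`, the seed handling, and the parameter values are hypothetical.

```python
import random


def crp_block_sizes(n, alpha, theta, seed=0):
    """Simulate n steps of the two-parameter Chinese restaurant process; return block sizes."""
    rng = random.Random(seed)
    sizes = [1]  # the first element always opens the first block
    for m in range(1, n):  # m elements are already placed
        k = len(sizes)
        # Unnormalized weights; they sum to theta + m, so random.choices normalizes them.
        weights = [n_j - alpha for n_j in sizes] + [theta + k * alpha]
        j = rng.choices(range(k + 1), weights=weights)[0]
        if j == k:
            sizes.append(1)   # open a new block
        else:
            sizes[j] += 1     # reinforce existing block j
    return sizes


sizes = crp_block_sizes(1000, alpha=0.5, theta=1.0)
print(len(sizes), max(sizes))  # number of blocks and largest block size
```

Each step costs time proportional to the current number of blocks, which is adequate for illustration; a production sampler would likely use a more efficient weighted-sampling structure.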
This induces a reinforcement mechanism parameterized by $(\alpha, \theta)$ that yields asymptotically power-law cluster sizes and a nontrivial distribution for block counts. The process connects directly, for $\alpha = 0$, to the classical Ewens formula (Dirichlet process) and, for $\alpha \in (0,1)$, to the Pitman–Yor process with PD$(\alpha, \theta)$ marginal.
3. Limit Theorems: Laws of Large Numbers, Central Limit Theorems
For $\alpha \in (0,1)$ and $\theta > -\alpha$, the number of blocks $K_n$ and the frequencies $K_{n,r}$ (the number of blocks of size $r$) admit strong asymptotics:
- Law of large numbers: $K_n / n^{\alpha} \to S_{\alpha,\theta}$ a.s., where $S_{\alpha,\theta}$ is the $\alpha$-diversity, a random variable with a generalized Mittag-Leffler law (Dolera et al., 2021, Bercu et al., 2024).
- Similarly, $K_{n,r} / n^{\alpha} \to p_{\alpha}(r)\, S_{\alpha,\theta}$ a.s., with $p_{\alpha}(r) = \alpha (1-\alpha)_{r-1} / r!$ (the Sibuya law).
- Functional central limit theorems (quenched and annealed) show that fluctuations of the block counts around their mean scale as $n^{\alpha/2}$ and decompose into independent Gaussian components arising from both sampling variability and the randomness of block frequencies, with precise covariance structure (Wang, 17 Mar 2026). The limiting fluctuations for the “self-normalized Ewens–Pitman process” (proportion frequencies $K_{n,r}/K_n$) are governed by an infinite-dimensional Gaussian process whose covariance is given explicitly in (Bercu et al., 16 Jan 2026).
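The limiting proportions in the size-specific law of large numbers can be tabulated directly. The sketch below assumes the Sibuya weights $p_{\alpha}(r) = \alpha (1-\alpha)_{r-1} / r!$ and builds them with the ratio $p_{\alpha}(r+1)/p_{\alpha}(r) = (r - \alpha)/(r+1)$; the function name `sibuya_pmf` and the choice $\alpha = 1/2$ are illustrative.

```python
from fractions import Fraction


def sibuya_pmf(alpha, r_max):
    """Return [p_alpha(1), ..., p_alpha(r_max)] with p_alpha(r) = alpha (1-alpha)_{r-1} / r!."""
    probs = [alpha]  # p_alpha(1) = alpha
    for r in range(1, r_max):
        # Ratio of consecutive weights: p_alpha(r+1) / p_alpha(r) = (r - alpha) / (r + 1).
        probs.append(probs[-1] * (r - alpha) / (r + 1))
    return probs


alpha = Fraction(1, 2)  # illustrative value
probs = sibuya_pmf(alpha, 5)
print(probs)       # first five limiting block-size proportions
print(sum(probs))  # partial sum; the remaining mass sits in a heavy tail
```

The partial sums approach 1 slowly, since the tail of $p_{\alpha}$ decays polynomially. In particular $p_{\alpha}(1) = \alpha$, so the limiting fraction of singleton blocks equals the discount parameter.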
In the special regime $\theta = \lambda n$ with $\lambda > 0$, $K_n$ exhibits linear growth with $n$, and a martingale-based proof yields optimal Berry–Esseen rates in the central limit theorem, with explicit rate constants depending on $\alpha$ and $\lambda$ (Ribeiro, 25 Mar 2025).
4. Large Deviations, Precise Moderate Deviations, and Phase Transition
Sharp large deviation principles (LDP) and precise moderate deviations have been established for the block count $K_n$, driven by a contour-integral representation of the block-count probabilities. Applying the saddle-point method to this representation yields the LDP rate function together with sharp prefactor asymptotics; the explicit expressions are given in (Peng et al., 13 Dec 2025).
A critical phenomenon is observed: the curvature of the rate function undergoes a “second-order phase transition” at a critical parameter value. Its limiting behavior differs on the two sides of this threshold, and on one side the curvature goes to zero. This phase transition is encoded in both large deviation and moderate deviation prefactors (Peng et al., 13 Dec 2025).
In the moderate deviation regime, block counts at intermediate scales, above the $n^{\alpha}$ scale of the law of large numbers, obey a logarithmic tail asymptotic whose rate function has a power-law form determined by $\alpha$ (Peng et al., 13 Dec 2025, Favaro et al., 2016).
5. Martingale and Compound Poisson Methods
Martingale techniques play a pivotal role in establishing both the SLLN and the CLT for $K_n$ and $K_{n,r}$. Via careful identification of martingale transforms (e.g., $M_n = a_n (K_n + \theta/\alpha)$ with $a_n = \prod_{k=1}^{n-1} \frac{k+\theta}{k+\theta+\alpha}$), almost-sure and $L^p$ convergence is realized, with fluctuations analyzed through predictable quadratic variations and Heyde’s martingale CLT (Bercu et al., 2024). This machinery yields both annealed and quenched CLTs, and supports laws of the iterated logarithm in both total and size-specific block counts.
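One concrete martingale transform follows from the new-block probability $(\theta + k\alpha)/(\theta + m)$ alone: with $a_n = \prod_{k=1}^{n-1} (k+\theta)/(k+\theta+\alpha)$, the sequence $M_n = a_n (K_n + \theta/\alpha)$ has constant conditional mean. The sketch below checks this in exact arithmetic. It is an illustration under that assumption, with hypothetical helper names, and is not claimed to reproduce the normalization used in the cited work.

```python
from fractions import Fraction


def a(n, alpha, theta):
    """Normalizing sequence a_n = prod_{k=1}^{n-1} (k + theta) / (k + theta + alpha)."""
    out = Fraction(1)
    for k in range(1, n):
        out *= (k + theta) / (k + theta + alpha)
    return out


def one_step_gap(m, k, alpha, theta):
    """E[M_{m+1} | K_m = k] - M_m for M_n = a_n (K_n + theta/alpha); zero for a martingale."""
    p_new = (theta + k * alpha) / (theta + m)  # chance that element m+1 opens a new block
    expected_next = a(m + 1, alpha, theta) * (k + p_new + theta / alpha)
    return expected_next - a(m, alpha, theta) * (k + theta / alpha)


def block_count_distribution(n, alpha, theta):
    """Exact law of K_n as {k: P(K_n = k)}, built step by step from the new-block chances."""
    dist = {1: Fraction(1)}
    for m in range(1, n):
        nxt = {}
        for k, p in dist.items():
            p_new = (theta + k * alpha) / (theta + m)
            nxt[k + 1] = nxt.get(k + 1, 0) + p * p_new
            nxt[k] = nxt.get(k, 0) + p * (1 - p_new)
        dist = nxt
    return dist


alpha, theta = Fraction(1, 3), Fraction(2)  # illustrative parameters
gaps = [one_step_gap(m, k, alpha, theta) for m in range(1, 7) for k in range(1, m + 1)]

# E[M_n] should stay at M_1 = 1 + theta/alpha for every n.
means = []
for n in range(1, 8):
    dist = block_count_distribution(n, alpha, theta)
    mean_k = sum(k * p for k, p in dist.items())
    means.append(a(n, alpha, theta) * (mean_k + theta / alpha))
print(set(gaps), set(means))
```

Since $a_n$ decays like a constant times $n^{-\alpha}$, the almost-sure convergence of $M_n$ is what produces the $n^{\alpha}$ growth of $K_n$.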
In an orthogonal approach, block-size distributions in Ewens–Pitman models admit compound Poisson representations: for $\alpha = 0$ as log-series summands, for $\alpha \in (0,1)$ via negative binomial summands mixed with a scale-randomization tied to the generalized Mittag-Leffler law. This representation underpins alternative, probabilistically transparent derivations of $\alpha$-diversity and provides a framework for conjectural extensions to $\alpha$-stable Poisson–Kingman laws (Dolera et al., 2021).
6. Algebraic and Combinatorial Structures
Recent developments provide explicit algebraic representations of Ewens–Pitman partition structures through umbral calculus, Sheffer sequences, and Riordan arrays. Marginals and moments of the partition distributions can be interpreted as weighted row sums or coefficient extractions in exponential Riordan arrays, with generalized Stirling numbers controlling block count statistics (Greve, 6 Mar 2025). This approach unifies many joint and marginal calculations, and streamlines symbolic computation for both moments and likelihoods.
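To make the role of generalized Stirling-type numbers concrete, the sketch below uses the classical closed form for the block-count law, $P(K_n = k) = \frac{\prod_{i=1}^{k-1}(\theta + i\alpha)}{(\theta+1)_{n-1}} \frac{C(n,k;\alpha)}{\alpha^k}$, where $C(n,k;\alpha) = \frac{1}{k!}\sum_{i=0}^{k} (-1)^i \binom{k}{i} (-i\alpha)_n$ is the generalized factorial coefficient. This is the textbook formula for $\alpha \in (0,1)$, not the Riordan-array representation of the cited paper, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def rising(x, m):
    """Rising Pochhammer symbol (x)_m = x (x+1) ... (x+m-1), with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(m):
        out *= x + i
    return out


def gen_factorial_coeff(n, k, alpha):
    """C(n, k; alpha) = (1/k!) * sum_{i=0}^{k} (-1)^i * binom(k, i) * (-i*alpha)_n."""
    total = sum((-1) ** i * comb(k, i) * rising(-i * alpha, n) for i in range(k + 1))
    return total / factorial(k)


def block_count_pmf(n, k, alpha, theta):
    """P(K_n = k) under the two-parameter model, for 0 < alpha < 1."""
    weight = Fraction(1)
    for i in range(1, k):
        weight *= theta + i * alpha
    return weight / rising(theta + 1, n - 1) * gen_factorial_coeff(n, k, alpha) / alpha ** k


alpha, theta = Fraction(1, 2), Fraction(1)  # illustrative parameters
pmf_n4 = [block_count_pmf(4, k, alpha, theta) for k in range(1, 5)]
print(pmf_n4, sum(pmf_n4))
```

Exact rational arithmetic keeps the alternating sum free of cancellation error for small $n$; for large $n$ a recursive evaluation would likely be more stable than the explicit sum.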
The algebraic structure reflects the harmonic property on the Kingman branching graph, where Ewens–Pitman harmonics satisfy specific backward recursions, embedding the partition structures within representation theory of the infinite symmetric group.
7. Statistical Inference: Parameter Estimation and Predictive Inference
Estimation for $\alpha$ (the discount/strength-of-diversity parameter) is governed by highly nonstandard asymptotics. The maximum likelihood estimator (MLE) for $\alpha$, denoted $\hat{\alpha}_n$, is $n^{\alpha/2}$-consistent, and its limiting distribution is a variance mixture of normal laws, with the mixing governed by the generalized Mittag-Leffler law (Koriyama et al., 2022, Koriyama, 26 Jun 2025). After normalization by an explicit Fisher information term, the estimator attains an asymptotically normal limit. Strongly consistent and computationally explicit estimators are available using self-normalized block-size statistics, e.g., proportion of singletons. Asymptotically valid confidence intervals can be constructed for both parameter and predictive probability vectors (Bercu et al., 16 Jan 2026, Koriyama, 26 Jun 2025).
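The singleton-proportion idea can be illustrated in a few lines. Since $K_{n,1}/K_n$ converges almost surely to $p_{\alpha}(1) = \alpha$, the fraction of blocks of size one is a simple plug-in estimate of $\alpha$. The function name `singleton_estimator` and the block sizes below are hypothetical, and this sketch is not claimed to match the exact estimator or confidence intervals of the cited papers.

```python
from collections import Counter


def singleton_estimator(sizes):
    """Estimate alpha by K_{n,1} / K_n, the fraction of blocks that are singletons."""
    if not sizes:
        raise ValueError("need at least one block")
    counts = Counter(sizes)          # counts[r] = K_{n,r}, the number of blocks of size r
    return counts[1] / len(sizes)    # len(sizes) = K_n


# Hypothetical observed block sizes (n = 21 elements in K_n = 8 blocks).
example_sizes = [5, 1, 1, 3, 1, 2, 1, 7]
alpha_hat = singleton_estimator(example_sizes)
print(alpha_hat)
```

The estimate uses only two summary statistics, which makes it cheap to compute, although the MLE discussed above is expected to be more efficient.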
The theoretical framework extends to exchangeable Gibbs partitions via mixture representations in the backward recurrence array, ensuring the persistence of asymptotic mixed-normality for parameter inference, as well as optimality of plug-in estimators for predictive probability simplexes under convex $f$-divergences. These results enable tight confidence bands for all functionals of the predictive distribution (Koriyama, 26 Jun 2025).
References
- (Peng et al., 13 Dec 2025): Precise Deviations for the Ewens-Pitman Model
- (Wang, 17 Mar 2026): On central limit theorems for Ewens–Pitman model
- (Bercu et al., 16 Jan 2026): A Gaussian process limit for the self-normalized Ewens-Pitman process
- (Bercu et al., 2024): A martingale approach to Gaussian fluctuations and laws of iterated logarithm for Ewens-Pitman model
- (Greve, 6 Mar 2025): A New Representation of Ewens-Pitman's Partition Structure and Its Characterization via Riordan Array Sums
- (Dolera et al., 2021): A compound Poisson perspective of Ewens-Pitman sampling model
- (Ribeiro, 25 Mar 2025): A Martingale Approach to Large-$\theta$ Ewens-Pitman Model
- (Favaro et al., 2016): Moderate deviations for Ewens-Pitman exchangeable random partitions
- (Koriyama et al., 2022): Asymptotic mixed normality of maximum likelihood estimator for Ewens–Pitman partition
- (Koriyama, 26 Jun 2025): Asymptotic Inference for Exchangeable Gibbs Partition
- (Giordano et al., 2019): A reversible allelic partition process and Pitman sampling formula