Ewens Sampling Distributions (ESD)
- Ewens Sampling Distributions are discrete laws defining allele frequency spectra and permutation cycles in genetic and combinatorial contexts using a tunable parameter.
- The model employs combinatorial structures such as the Chinese restaurant process and Stirling numbers to generate and analyze random partitions.
- It underpins statistical inference in genetics with explicit formulas, Poisson approximations, and asymptotic results for cycle counts and diversity measures.
The Ewens Sampling Distributions (ESD) are a foundational family of discrete probability laws governing the structure of random partitions, permutations, and allele frequency spectra. Initially emerging in population genetics, ESDs model the distribution of genetic types in finite samples under neutrality, but their combinatorial structure has had a far-reaching impact across algebra, probability, combinatorics, and statistical inference. At its core, the Ewens distribution assigns to each partition (or permutation) a probability proportional to powers of the number of cycles or distinct classes, governed by a tunable parameter θ. Its tractability and universality derive from a rich interplay between generating functions, exchangeable partition structures, Poisson-Dirichlet processes, and exact combinatorial constructions such as the Chinese restaurant process.
1. Foundations and Definition
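The Ewens Sampling Distribution (ESD) is defined on the symmetric group $S_n$. For any permutation $\sigma \in S_n$, let $C(\sigma)$ denote the number of cycles in its disjoint cycle decomposition. Then, for $\theta > 0$,
$$P_\theta(\sigma) = \frac{\theta^{C(\sigma)}}{\theta^{(n)}},$$
where
$$\theta^{(n)} = \theta(\theta+1)\cdots(\theta+n-1)$$
is the rising factorial. For $\theta = 0$, the ESD is supported only on $n$-cycles: $P_0(\sigma) = 1/(n-1)!$ if $\sigma$ is an $n$-cycle, and zero otherwise (Schickentanz, 23 Oct 2025).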
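Alternatively, the ESD describes the sampling distribution of integer partitions: take a sample of $n$ elements, count the number $k$ of distinct blocks (for instance allelic types), and their multiplicities $a_1, \dots, a_n$ (with $a_j$ the number of blocks of size $j$) with $\sum_{j=1}^{n} j\,a_j = n$ and $\sum_{j=1}^{n} a_j = k$. Then
$$P(a_1, \dots, a_n) = \frac{n!}{\theta^{(n)}} \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\, a_j!},$$
with $\theta^{(n)}$ the rising factorial (Jenkins et al., 2010, Gan et al., 2019). The combinatorial normalization arises from the recursion of unsigned Stirling numbers of the first kind and the total number of such set partitions.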
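As a sanity check on the partition form of the formula, the following minimal sketch evaluates it in exact rational arithmetic and enumerates all count vectors for a small sample. It assumes the standard form $\frac{n!}{\theta^{(n)}} \prod_j \theta^{a_j} / (j^{a_j} a_j!)$; the function names are illustrative and not taken from any cited work.

```python
from fractions import Fraction
from math import factorial


def rising_factorial(theta, n):
    """Rising factorial theta (theta + 1) ... (theta + n - 1)."""
    result = Fraction(1)
    for i in range(n):
        result *= theta + i
    return result


def esf_probability(counts, theta):
    """Ewens sampling formula; counts[j - 1] is the number of blocks of size j."""
    n = sum(j * a for j, a in enumerate(counts, start=1))
    prob = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a in enumerate(counts, start=1):
        prob *= Fraction(theta) ** a / (Fraction(j) ** a * factorial(a))
    return prob


def count_vectors(n):
    """Yield every (a_1, ..., a_n) with sum_j j * a_j == n."""
    def rec(j, remaining):
        if j > n:
            if remaining == 0:
                yield ()
            return
        for a in range(remaining // j + 1):
            for tail in rec(j + 1, remaining - j * a):
                yield (a,) + tail
    yield from rec(1, n)


if __name__ == "__main__":
    theta = Fraction(3, 2)
    total = sum(esf_probability(c, theta) for c in count_vectors(5))
    print(total)  # the probabilities over all count vectors should sum to 1
```

Using `Fraction` keeps the normalization check exact, at the cost of speed; for large samples a log-space floating-point version would be more practical.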
A key population-genetic interpretation is: in the infinite-alleles neutral mutation model, with mutation parameter θ, ESD arises as the equilibrium sampling formula for genetic variation under mutation-drift balance (Gan et al., 2019).
2. Combinatorial Constructions and Sampling Algorithms
The Chinese restaurant process is a powerful sequential construction for generating permutations or partitions under ESD. At each step $i = 1, \dots, n$, when $i-1$ elements have already been placed,
- With probability $\theta/(\theta+i-1)$, a new cycle (or block) is started.
- With probability $(i-1)/(\theta+i-1)$, the new element joins an existing cycle (chosen with probability proportional to its length).
This admits an $O(n)$-time sampling algorithm for permutations or partitions with the ESD law (Eberhard, 2018). The process naturally induces a random exchangeable partition structure, consistent with marginal reductions (Schiavo et al., 2023). For colored or multitype data, the polychromatic Ewens distributions extend this structure (Schiavo et al., 2023).
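The sequential construction can be sketched in a few lines. The version below is an illustrative implementation, not the algorithm of any cited paper: it returns block sizes only and assumes the usual seating rule, in which the element arriving when $i$ elements are already seated opens a new block with probability $\theta/(\theta+i)$. The function name and the `rng` parameter are hypothetical.

```python
import random


def sample_ewens_partition(n, theta, rng=None):
    """Sample block sizes of a random partition of n elements via the CRP.

    Assumes theta > 0. Runs in O(n) time.
    """
    rng = rng or random.Random()
    labels = []  # labels[i] is the block index of element i
    sizes = []   # sizes[b] is the current size of block b
    for i in range(n):  # i elements are already seated
        if rng.random() * (theta + i) < theta:
            # Open a new block (always happens for i == 0).
            labels.append(len(sizes))
            sizes.append(1)
        else:
            # Copying the block of a uniformly chosen earlier element
            # selects each block with probability proportional to its size.
            block = labels[rng.randrange(i)]
            labels.append(block)
            sizes[block] += 1
    return sizes


if __name__ == "__main__":
    print(sample_ewens_partition(20, theta=1.5, rng=random.Random(42)))
```

Choosing a uniformly random earlier element and joining its block avoids maintaining cumulative weights, which is what keeps each step constant-time.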
The cycle structure induced by ESD has the property that, conditionally on the total sample size, the numbers $C_j$ of cycles of each length $j$ behave as independent Poisson variables with means $\theta/j$, up to the conditioning $\sum_{j=1}^{n} j\,C_j = n$ (Silva et al., 2020, Eberhard, 2018).
3. Exact and Asymptotic Results
The ESD supports explicit closed-form expressions for a wide class of statistics:
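- Probability for the number of cycles: For $k$ cycles, $P(K_n = k) = c(n,k)\,\theta^{k}/\theta^{(n)}$, where $K_n$ is the number of cycles, $c(n,k)$ is the unsigned Stirling number of the first kind, and $\theta^{(n)} = \theta(\theta+1)\cdots(\theta+n-1)$ (Peng et al., 13 Dec 2025, Silva et al., 2020).
- Cycle length frequency: The expected number of cycles of length $j$ approaches $\theta/j$ as $n \to \infty$ (Silva et al., 2020, Peng et al., 13 Dec 2025), a limit independent of $n$ for $j$ and $\theta$ fixed; for $\theta = 1$ it equals $1/j$ exactly whenever $j \le n$.
- Moments and Poisson approximation: For each fixed $j$, as $n \to \infty$, the number $C_j$ of cycles of length $j$ converges in distribution to $\mathrm{Poisson}(\theta/j)$, and joint finite-dimensional distributions decouple (Silva et al., 2020, Féray, 2012, Bakšajeva et al., 2013).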
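The cycle-count law can be tabulated directly from the Stirling recursion $c(m,k) = c(m-1,k-1) + (m-1)\,c(m-1,k)$. The sketch below assumes the standard form $P(K_n = k) = c(n,k)\,\theta^k/\theta^{(n)}$ and uses exact rationals; the helper names are illustrative.

```python
from fractions import Fraction


def stirling_first_unsigned(n):
    """Return [c(n, 0), ..., c(n, n)], unsigned Stirling numbers of the first kind."""
    row = [1]  # c(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            # c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k), with c(m-1, m) = 0
            new[k] = row[k - 1] + (m - 1) * (row[k] if k < m else 0)
        row = new
    return row


def cycle_count_pmf(n, theta):
    """Return [P(K_n = 0), ..., P(K_n = n)] under the Ewens law."""
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i
    row = stirling_first_unsigned(n)
    return [row[k] * theta ** k / rising for k in range(n + 1)]


if __name__ == "__main__":
    pmf = cycle_count_pmf(6, Fraction(3, 2))
    mean = sum(k * p for k, p in enumerate(pmf))
    print(sum(pmf), mean)
```

The mean computed this way can be compared with the known identity $\mathbb{E}[K_n] = \sum_{i=0}^{n-1} \theta/(\theta+i)$, which follows from the sequential construction.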
The Ewens distribution interpolates between:
- Uniform distribution on $n$-cycles ($\theta = 0$)
- Uniform distribution on $S_n$ ($\theta = 1$)
- Concentration on the identity permutation as $\theta \to \infty$ (Schickentanz, 23 Oct 2025).
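A small worked case makes the interpolation concrete. Assuming the standard form $P_\theta(\sigma) = \theta^{C(\sigma)}/\theta^{(n)}$ with $C(\sigma)$ the number of cycles, the six permutations of $S_3$ receive the following probabilities:

```latex
\begin{aligned}
\theta^{(3)} &= \theta(\theta+1)(\theta+2), \\
P_\theta(\mathrm{id}) &= \frac{\theta^{3}}{\theta^{(3)}} = \frac{\theta^{2}}{(\theta+1)(\theta+2)}, \\
P_\theta(\text{each of the 3 transpositions}) &= \frac{\theta^{2}}{\theta^{(3)}} = \frac{\theta}{(\theta+1)(\theta+2)}, \\
P_\theta(\text{each of the 2 three-cycles}) &= \frac{\theta}{\theta^{(3)}} = \frac{1}{(\theta+1)(\theta+2)}, \\
\text{total} &= \frac{\theta^{2} + 3\theta + 2}{(\theta+1)(\theta+2)} = 1 .
\end{aligned}
```

Setting $\theta = 0$ puts mass $1/2$ on each 3-cycle, $\theta = 1$ gives $1/6$ to every permutation, and letting $\theta \to \infty$ sends the mass of the identity to 1.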
For statistics such as the inversion count in permutations, exact formulas and monotonicity results are available: the expected number of inversions is strictly decreasing in $\theta$, and becomes convex on a parameter range specified in the cited work (Schickentanz, 23 Oct 2025).
Limit theorems for additive permutation statistics and the number of types in random partitions (including Poisson and normal approximations) follow by method-of-moments arguments and coupling constructions (Féray, 2012, Wiroonsri, 2017, Bakšajeva et al., 2013).
4. Ewens-Pitman and Generalized Models
The classical ESD admits a two-parameter extension, the Ewens–Pitman sampling model (EPSM), with parameters $\alpha \in [0,1)$ and $\theta > -\alpha$. For $\alpha = 0$ this reduces to ESD. The distribution of the number of types $K_n$ admits an explicit integral representation. Precise large and moderate deviation rate functions for $K_n$ have been established, exhibiting second-order phase transitions at critical values identified in the cited work (Peng et al., 13 Dec 2025). The EPSM and associated compound Poisson representations clarify connections to Poisson–Kingman random partitions and allow for scaling-limit results (e.g., convergence of $K_n/n^{\alpha}$ to a Mittag–Leffler law) (Dolera et al., 2021).
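The two-parameter model has a sequential description analogous to the Chinese restaurant process. The sketch below assumes the standard Pitman seating rule with parameters $0 \le \alpha < 1$ and $\theta > -\alpha$: with $i$ elements seated in $k$ blocks, a new block opens with probability $(\theta + k\alpha)/(\theta + i)$, and an existing block of size $s$ is joined with probability $(s - \alpha)/(\theta + i)$. The function name is hypothetical, and the simple weighted choice makes this version $O(nk)$ rather than linear.

```python
import random


def sample_ewens_pitman_sizes(n, alpha, theta, rng=None):
    """Sample block sizes from the two-parameter (alpha, theta) seating rule."""
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = rng or random.Random()
    sizes = []
    for i in range(n):  # i elements are already seated
        k = len(sizes)
        if i == 0 or rng.random() * (theta + i) < theta + k * alpha:
            sizes.append(1)  # open a new block
        else:
            # Join block b with probability proportional to sizes[b] - alpha.
            weights = [s - alpha for s in sizes]
            b = rng.choices(range(k), weights=weights)[0]
            sizes[b] += 1
    return sizes


if __name__ == "__main__":
    rng = random.Random(7)
    print(len(sample_ewens_pitman_sizes(500, 0.0, 1.0, rng)))  # alpha = 0: Ewens case
    print(len(sample_ewens_pitman_sizes(500, 0.5, 1.0, rng)))  # alpha > 0: typically many more blocks
```

With $\alpha = 0$ the rule collapses to the one-parameter process, while $\alpha > 0$ makes the number of blocks grow polynomially in $n$ rather than logarithmically.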
The ESD also admits refined generalizations:
- Refined Ewens Sampling Formula: Models where alleles are divided into $m$ classes with class-specific mutation rates $\theta_1, \dots, \theta_m$, yielding a joint distribution on an $m \times n$ matrix of counts. This law, the rESF, generalizes the ESD, and its asymptotic and Poisson approximation properties remain tractable (Strahov, 2024).
- Infinite-allele model with selection: Generalizations of the Ewens formula exist for arbitrary fitness landscapes, expressing the sampling probabilities using confluent hypergeometric functions and partitions by fitness class (Khromov et al., 2016).
5. Statistical Inference and Properties
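The ESD underpins likelihood-based and Bayesian inference of diversity and mutation rates in population samples. Given observed frequency counts $(a_1, \dots, a_n)$, where $a_j$ is the number of types observed exactly $j$ times, the likelihood is
$$L(\theta) = \frac{n!}{\theta^{(n)}}\, \theta^{k} \prod_{j=1}^{n} \frac{1}{j^{a_j}\, a_j!},$$
where $k = \sum_{j=1}^{n} a_j$ is the number of distinct types and $\theta^{(n)}$ is the rising factorial (Hirose et al., 2021). Moments and unbiased estimators for the number of alleles occurring exactly $j$ times admit closed-form solutions, for example
$$\mathbb{E}[a_j] = \frac{\theta}{j}\,\frac{n!}{(n-j)!}\,\frac{\Gamma(n-j+\theta)}{\Gamma(n+\theta)},$$
and bias-reducing corrections can be constructed that are uniformly minimum variance unbiased up to fourth order asymptotics, with explicit variance formulas (Hirose et al., 2021).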
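Because the likelihood depends on the data only through $\theta^{k}/\theta^{(n)}$, where $k$ is the number of distinct types, setting the derivative of its logarithm to zero gives the score equation $k = \sum_{i=0}^{n-1} \theta/(\theta+i)$. The sketch below solves it by bisection; it is an illustrative routine under that standard form, not the estimator studied in the cited work, and the function name is hypothetical.

```python
def ewens_mle(n, k, tol=1e-12):
    """Maximum-likelihood estimate of theta from sample size n and type count k.

    Solves sum_{i=0}^{n-1} theta / (theta + i) = k by bisection.
    A finite positive solution exists only when 1 < k < n.
    """
    if not 1 < k < n:
        raise ValueError("a finite positive MLE needs 1 < k < n")

    def expected_types(theta):
        # Increasing in theta, from 1 (theta -> 0) to n (theta -> infinity).
        return sum(theta / (theta + i) for i in range(n))

    lo, hi = 1e-12, 1.0
    while expected_types(hi) < k:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_types(mid) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    theta_hat = ewens_mle(n=100, k=10)
    print(theta_hat)
```

Bisection is chosen here for robustness: the left-hand side is strictly increasing in $\theta$, so the root is unique whenever $1 < k < n$.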
Invariant moments—that is, polynomial combinations of sample frequency counts whose expectations do not depend on sample size—provide diagnostics for scale-free behavior and are uniquely determined under ESD (Rossi, 2013).
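Sharp variance inequalities for linear statistics of cycle counts have been obtained: for any real coefficients $c_1, \dots, c_n$, the variance of the linear statistic $\sum_{j=1}^{n} c_j C_j$, with $C_j$ the number of cycles of length $j$, admits a sharp upper bound stated explicitly in the cited work, with equality for extremal choices of the coefficients $c_j$ and explicit identification of the eigenstructure governing the dependence between cycle counts (Baronenas et al., 2020).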
6. Analytical, Limit, and Coupling Results
Stein's method and exchangeable pairs have been adapted to the ESD, yielding explicit Berry-Esseen bounds for normal approximation of combinatorial statistics, and providing third-moment control based on the coupling structure induced by the ESD (Wiroonsri, 2017). In the infinite-population Wright–Fisher limit, the ESD describes the stationary distribution of partitions, and total variation distance bounds quantify the accuracy for large finite populations (Gan et al., 2019).
Derangement counts, dashed patterns (permutation statistics), and properties of random permutation matrices (characteristic polynomials, traces, etc.) under ESD have been resolved by precise factorial-moment and coupling arguments (Silva et al., 2020, Féray, 2012, Bakšajeva et al., 2013). In most contexts, small cycle counts become asymptotically independent Poisson, while additive statistics can show Poisson–normal phase transitions, quasi–Poisson or compound Poisson behavior depending on scaling.
The ESD and its generalizations form the backbone of a broad theory connecting combinatorial random structures, exchangeable partitions, Markov and urn processes, and the probabilistic and statistical modeling of diversity.