Precise Deviations for the Ewens-Pitman Model (2512.12323v1)

Published 13 Dec 2025 in math.PR and math.ST

Abstract: In this paper, we derive an integral representation for the distribution of the number of types $K_n$ in the Ewens-Pitman model. Based on this representation, we also establish precise large deviations and precise moderate deviations for $K_n$. After careful examination, we find that the rate function exhibits a second-order phase transition and the critical point is $\alpha=\frac{1}{2}$.

Summary

  • The paper establishes precise deviation asymptotics for Kₙ in both large and moderate regimes of the Ewens-Pitman model.
  • It derives an explicit contour integral representation and applies saddle point and steepest descent methods to obtain full polynomial prefactors in the asymptotic formulas.
  • A second-order phase transition at α = 1/2 is identified, offering new insights into the fluctuation behavior and influencing inference in population genetics and Bayesian nonparametrics.

Precise Deviations and Phase Transitions in the Ewens-Pitman Model

Introduction

The Ewens-Pitman sampling model generalizes the classical Ewens sampling formula by introducing two parameters $(\alpha, \theta)$, capturing a broad spectrum of random partition phenomena with deep ties to population genetics and Bayesian nonparametrics. This model's statistical properties, particularly the behavior of the number of types $K_n$ in a sample of size $n$, have been the subject of active investigation. Prior work established laws of large numbers, central limit theorems, and large/moderate deviations for $K_n$. The current paper advances this line by deriving precise deviations for $K_n$ in both the large and moderate deviation regimes, clarifying the polynomial prefactor, and rigorously identifying a second-order phase transition in the rate function at $\alpha=1/2$.

Integral Representation of $K_n$

A cornerstone of the analysis is the derivation of an explicit contour integral representation for the probability mass function of $K_n$:

$$\mathbb{P}(K_n = k) = \frac{n!}{k!} \frac{(\theta/\alpha)_{k\uparrow 1}}{(\theta)_{n\uparrow 1}} \cdot \frac{1}{2\pi i} \int_C \frac{(1 - (1-z)^\alpha)^k}{z^{n+1}} \, dz$$

with an explicit expression for the coefficients using the Gamma function and Sibuya distribution structure. This form enables a rigorous application of saddle point and steepest descent methods for asymptotic analysis.

Figure 1: Intermediate steepest descent contour used to deform the integral in the complex plane for extracting asymptotics of $\mathbb{P}(K_n = k)$.
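The contour integral in the representation above is a coefficient extraction: it picks out the coefficient of $z^n$ in $(1-(1-z)^\alpha)^k$. The following is a minimal numerical sketch, not taken from the paper, that cross-checks this against the standard sequential (Chinese restaurant) construction of the Ewens-Pitman partition. The function names are hypothetical, and the sketch assumes $0 < \alpha < 1$ and $\theta > 0$.

```python
from math import isclose

def pmf_by_recursion(n, alpha, theta):
    """Exact pmf of K_n from the sequential construction.

    Given K_m = k, item m+1 starts a new type with probability
    (theta + k*alpha) / (theta + m) and joins an existing type otherwise.
    Assumes 0 < alpha < 1 and theta > 0 (illustrative restriction).
    """
    pmf = {1: 1.0}  # K_1 = 1 with probability one
    for m in range(1, n):
        nxt = {}
        for k, p in pmf.items():
            p_new = (theta + k * alpha) / (theta + m)
            nxt[k + 1] = nxt.get(k + 1, 0.0) + p * p_new
            nxt[k] = nxt.get(k, 0.0) + p * (1.0 - p_new)
        pmf = nxt
    return pmf

def pmf_by_coefficient(n, k, alpha, theta):
    """n!/k! * (theta/alpha)_k / (theta)_n * [z^n] (1 - (1-z)^alpha)^k."""
    # Taylor coefficients of 1 - (1-z)^alpha: c_1 = alpha,
    # c_j = c_{j-1} * (j - 1 - alpha) / j (all positive for 0 < alpha < 1).
    c = [0.0] * (n + 1)
    c[1] = alpha
    for j in range(2, n + 1):
        c[j] = c[j - 1] * (j - 1 - alpha) / j
    # k-th power of the series, truncated at degree n.
    power = [1.0] + [0.0] * n
    for _ in range(k):
        new = [0.0] * (n + 1)
        for i, a in enumerate(power):
            if a == 0.0:
                continue
            for j in range(1, n + 1 - i):
                new[i + j] += a * c[j]
        power = new
    # Prefactor, built from ratios to avoid large factorials.
    pref = 1.0
    for i in range(k):
        pref *= (theta / alpha + i) / (i + 1)   # (theta/alpha)_k / k!
    for i in range(n):
        pref *= (i + 1) / (theta + i)           # n! / (theta)_n
    return pref * power[n]

if __name__ == "__main__":
    n, alpha, theta = 12, 0.3, 1.5
    rec = pmf_by_recursion(n, alpha, theta)
    for k in range(1, n + 1):
        print(k, rec[k], pmf_by_coefficient(n, k, alpha, theta))
```

The two columns printed for each $k$ should agree up to floating-point error; the power-series route is only practical for small $n$, which is why the paper's asymptotic analysis is needed for large samples.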

Precise Local Deviation Asymptotics

The main result characterizes the asymptotics of $\mathbb{P}(K_n = k)$ for large $n$, in both large and moderate deviation regimes, including all polynomial prefactors. Let $x_k = k/n$, $h(z) = \ln z - x_k \ln(1-(1-z)^\alpha)$, and $z(x_k)$ the unique real solution to $h'(z)=0$ on $(0,1)$. The principal theorem asserts

$$\mathbb{P}(K_n = k) = \frac{\Gamma(\theta)}{\Gamma(\theta/\alpha)} n^{(\frac{1}{\alpha}-1)\theta - \frac{1}{2}}\, z(x_k) \sqrt{2\pi |h''(z(x_k))|}^{-1} x_k^{\frac{\theta}{\alpha}-1} e^{-n I(x_k)} \left(1 + O\left(\frac{1}{x_k n}\right)\right)$$

where $I(x_k) = h(z(x_k))$ is the rate function of the large deviation principle (LDP). The explicit polynomial scaling is sharp, revealing the precise combinatorics underlying the Ewens-Pitman partitions.
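To make the rate function concrete, here is a hedged numerical sketch that solves $h'(z)=0$ by bisection, evaluates $I(x) = h(z(x))$, and compares it with $-\frac{1}{n}\ln \mathbb{P}(K_n = k)$ computed exactly from the sequential construction of the partition. The function names, the bisection bracket, and the parameter values are assumptions chosen for illustration, and only the exponential rate is checked, not the polynomial prefactor.

```python
from math import log, log1p, expm1

def saddle_point(x, alpha):
    """Solve h'(z) = 0 on (0, 1) by bisection, for 0 < x < 1.

    h'(z) = 0 is equivalent to x * g(z) = 1, where
    g(z) = alpha * z * (1-z)^(alpha-1) / (1 - (1-z)^alpha).
    The bracket below is an assumption; it can fail for very small x
    when alpha is close to 1.
    """
    def f(z):
        one_minus_pow = -expm1(alpha * log1p(-z))  # 1 - (1-z)^alpha, stable near 0
        return x * alpha * z * (1.0 - z) ** (alpha - 1.0) / one_minus_pow - 1.0

    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate_function(x, alpha):
    """I(x) = h(z(x)) with h(z) = ln z - x * ln(1 - (1-z)^alpha)."""
    z = saddle_point(x, alpha)
    return log(z) - x * log(-expm1(alpha * log1p(-z)))

def exact_log_pmf(n, k, alpha, theta):
    """ln P(K_n = k), computed exactly from the sequential construction."""
    pmf = [0.0] * (n + 2)  # pmf[j] = P(K_m = j) for the current m
    pmf[1] = 1.0
    for m in range(1, n):
        # Update in place from high j to low j so pmf[j-1] is still the old value.
        for j in range(m + 1, 0, -1):
            p_new = (theta + (j - 1) * alpha) / (theta + m)  # move from j-1 to j
            p_stay = (m - j * alpha) / (theta + m)           # remain at j
            pmf[j] = pmf[j] * p_stay + pmf[j - 1] * p_new
    return log(pmf[k])

if __name__ == "__main__":
    n, alpha, theta = 400, 0.5, 1.0
    for k in (100, 200, 300):
        x = k / n
        print(x, -exact_log_pmf(n, k, alpha, theta) / n, rate_function(x, alpha))
```

The two printed quantities should differ by a term of order $(\ln n)/n$, which is the footprint of the polynomial prefactor in the theorem.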

In the moderate deviation regime $k \asymp n^\alpha b_n^{1-\alpha}$, the probability decays as

$$\mathbb{P}(K_n \geq n^\alpha b_n^{1-\alpha} y) = \frac{\Gamma(\theta)}{\Gamma(\theta/\alpha)} b_n^{(1-\alpha)\frac{\theta}{\alpha}-\frac{1}{2}} \left[2\pi(1-\alpha)\alpha^{\frac{2\alpha-1}{1-\alpha}}\right]^{-1/2} y^{\frac{\theta}{\alpha}-\frac{1}{2(1-\alpha)}} e^{-n I(y (b_n/n)^{1-\alpha})} (1 + \text{error})$$

with $b_n \to \infty$, $b_n/n \to 0$, and the rate function expanding as

$$n I\left(y\left(\frac{b_n}{n}\right)^{1-\alpha}\right) \sim b_n (1-\alpha)\alpha^{\frac{\alpha}{1-\alpha}} y^{\frac{1}{1-\alpha}}$$

highlighting the polynomial–exponential dichotomy of tails.
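The expansion of the rate function can be recovered heuristically from the definitions of $h$ and $I$ alone. The sketch below is an illustrative simplification, not the paper's argument: it assumes the saddle point approaches 1 as $x \to 0$, substitutes $w = 1 - z$ (a hypothetical auxiliary variable), and drops higher-order terms.

```latex
% Small-x heuristic: set w = 1 - z with w small.
h(1-w) = \ln(1-w) - x \ln\left(1 - w^{\alpha}\right) \approx -w + x\, w^{\alpha}

% Stationarity in w:
-1 + \alpha x\, w^{\alpha-1} = 0
\quad\Longrightarrow\quad
w = (\alpha x)^{\frac{1}{1-\alpha}}

% Value at the stationary point:
I(x) \approx w\left(\frac{1}{\alpha} - 1\right)
     = (1-\alpha)\,\alpha^{\frac{\alpha}{1-\alpha}}\, x^{\frac{1}{1-\alpha}}
```

Substituting $x = y\,(b_n/n)^{1-\alpha}$ gives $x^{\frac{1}{1-\alpha}} = y^{\frac{1}{1-\alpha}}\, b_n/n$, so $n I(x) \approx b_n (1-\alpha)\alpha^{\frac{\alpha}{1-\alpha}} y^{\frac{1}{1-\alpha}}$, in agreement with the stated expansion.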

Global Deviations and Summation Asymptotics

The paper extends the analysis to cumulative tails,

$$\mathbb{P}(K_n \geq xn)$$

by summing the local estimates over admissible $k$ and invoking precise discrete Laplace-type estimates. The final asymptotics involve the derivative of the rate function and the increment of $k$, capturing discretization effects significant at the scale of deviations:

$$\mathbb{P}(K_n \geq xn) = \frac{\Gamma(\theta)}{\Gamma(\theta/\alpha)} n^{(\frac{1}{\alpha}-1)\theta - \frac{1}{2}} z(x) \sqrt{2\pi |h''(z(x))|}^{-1} x^{\frac{\theta}{\alpha}-1} \frac{e^{-\{nx\} I'(x)}}{1 - e^{-I'(x)}} e^{-n I(x)} \left[1 + O\left(\frac{1}{xn}\right)\right]$$

where $\{nx\}$ denotes the fractional part of $nx$. This representation accurately tracks the probability mass in the tails of $K_n$ with explicit combinatorial and analytic constants.
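The origin of the factor involving $I'(x)$ can be sketched with a first-order expansion and a geometric series. This is a heuristic outline under the assumption that the prefactor varies slowly over the range of $k$ that matters; the symbol $k_0$ is introduced here for illustration and the paper's own bookkeeping of the lattice offset may differ.

```latex
% First-order expansion of the exponent for k just above nx:
n\, I(k/n) \approx n\, I(x) + (k - nx)\, I'(x)

% Summing over k = k_0, k_0 + 1, \dots with k_0 = \lceil nx \rceil:
\sum_{k \ge k_0} e^{-n I(k/n)}
  \approx e^{-n I(x)} \sum_{j \ge 0} e^{-(k_0 - nx + j)\, I'(x)}
  = e^{-n I(x)}\, \frac{e^{-(k_0 - nx)\, I'(x)}}{1 - e^{-I'(x)}}
```

The denominator $1 - e^{-I'(x)}$ is the geometric-series factor, and the offset $k_0 - nx \in [0,1)$ is where the position of $nx$ relative to the integer lattice enters the tail estimate.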

Phase Transition in the Rate Function

A critical theoretical insight is the identification of a second-order phase transition at $\alpha=1/2$ in the rate function's curvature:

$$I''(x) \sim C(\alpha)\, x^{\frac{2\alpha-1}{1-\alpha}},\quad x\to 0$$

where

$$C(\alpha) = \begin{cases} +\infty, & 0 < \alpha < 1/2 \\ \text{const}, & \alpha=1/2 \\ 0, & 1/2<\alpha<1 \end{cases}$$

The sign and scaling of the sub-exponential prefactor in moderate deviations shift across this transition, fundamentally altering the nature of the fluctuations. This constitutes a nontrivial, explicit phase transition in the precise deviation rates: the polynomial prefactors and effective speeds in the moderate deviation regime change qualitatively at this critical value.
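One way to see why $\alpha = 1/2$ is the critical value is to start from the small-$x$ form of the rate function implied by the moderate deviation expansion, $I(x) \approx (1-\alpha)\alpha^{\frac{\alpha}{1-\alpha}} x^{\frac{1}{1-\alpha}}$, and differentiate it twice. The sketch below assumes that this leading term may be differentiated term by term, which is a heuristic and not the paper's proof; $c$ and $p$ are shorthand introduced here.

```latex
I(x) \approx c\, x^{p},
\qquad c = (1-\alpha)\,\alpha^{\frac{\alpha}{1-\alpha}},
\qquad p = \frac{1}{1-\alpha}

I''(x) \approx c\, p\,(p-1)\, x^{p-2}
       = \frac{\alpha^{\frac{1}{1-\alpha}}}{1-\alpha}\; x^{\frac{2\alpha-1}{1-\alpha}}
```

The exponent $\frac{2\alpha-1}{1-\alpha}$ is negative for $\alpha < 1/2$, zero at $\alpha = 1/2$, and positive for $\alpha > 1/2$. Under this heuristic, $I''(x)$ therefore diverges, tends to a finite constant (equal to $1/2$ at $\alpha = 1/2$), or vanishes as $x \to 0$, which matches the three cases listed above.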

Comparison with Pitman's $\alpha$-Diversity

The analysis also involves refined asymptotics for the tail of the Pitman $\alpha$-diversity random variable $S_{\alpha,\theta}$, the almost sure limit of $K_n/n^\alpha$, exploiting explicit integral and series representations for this variable's density and tail probabilities. The precise deviation rates for $K_n$ and for the limiting fluctuation $S_{\alpha,\theta}$ are shown to be asymptotically compatible, providing a bridge between finite-$n$ and limiting behavior.

Practical and Theoretical Implications

These precise deviation results advance the toolkit available for statistical inference and hypothesis testing in contexts where the Ewens-Pitman model—particularly the distribution of the number of types—plays a central role. This includes nonparametric Bayesian methods, species sampling problems, and random partition structures in population genetics and machine learning. The explicit forms of the deviation probabilities, including subexponential corrections, enable sharper risk and error assessments for rare event analyses, informing confidence levels and credible intervals for sample diversity.

On the theoretical side, the identified phase transition provides a rare example of a non-analytic change in moderate deviation structure for an entire class of random combinatorial objects, likely bearing implications for related partition models, processes attached to Poisson-Dirichlet distributions, and the study of heavy-tailed phenomena in random discrete structures.

Conclusion

The work presents a mathematically rigorous and detailed treatment of precise deviations for $K_n$ in the Ewens-Pitman model. It provides explicit integral representations, careful saddle point asymptotic expansions, full polynomial prefactors, and establishes a second-order phase transition in the deviation rate function at $\alpha=1/2$. These results close a significant gap in our understanding of the Ewens-Pitman model and set a new standard for precise tail characterizations in complex random partition processes. The methods admit generalization to other partition-derived statistics and open avenues for deeper study of phase transitions in probabilistic combinatorics.

Reference: "Precise Deviations for the Ewens-Pitman Model" (2512.12323)

Authors (2)
