Pitman-Yor Chinese Restaurant Process
- The Pitman-Yor Chinese Restaurant Process is a Bayesian nonparametric model that extends the Dirichlet Process by incorporating a discount parameter to induce power-law behavior in cluster sizes.
- It employs a stick-breaking construction and an exchangeable partition probability function to efficiently model random partitions and facilitate scalable inference.
- The model is widely applied in hierarchical topic modeling and species-sampling, where its power-law properties lead to improved performance over traditional Dirichlet-based methods.
The Pitman–Yor Chinese Restaurant Process (PYCRP) is a cornerstone model in Bayesian nonparametric statistics for constructing random partitions and discrete random probability measures exhibiting power-law behavior. It generalizes the Dirichlet Process by introducing a second parameter to control the clustering structure and the frequency distribution of clusters. The PYCRP underpins a range of hierarchical models, especially in topic modeling and species-sampling problems, and supports efficient inference algorithms through its exchangeable partition structure and stick-breaking construction (Lim et al., 2016, Franssen et al., 2022, Lawless et al., 2018, Pereira et al., 2018, Arbel et al., 2018, Canale et al., 2019).
1. Two-Parameter Pitman–Yor Process and CRP Representation
The Pitman–Yor process, denoted $\mathrm{PY}(d, \theta; H)$ for discount parameter $d \in [0, 1)$ and concentration parameter $\theta > -d$, defines an almost surely discrete random probability measure on a space $\mathbb{X}$ with base distribution $H$. Its constructive stick-breaking representation involves i.i.d. atoms $\phi_k \sim H$ and associated weights

$$V_k \sim \mathrm{Beta}(1 - d,\ \theta + k d), \qquad w_k = V_k \prod_{j=1}^{k-1} (1 - V_j),$$

so that $P = \sum_{k \ge 1} w_k\, \delta_{\phi_k}$ has the law $\mathrm{PY}(d, \theta; H)$ (Lawless et al., 2018, Canale et al., 2019).
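As a concrete illustration, the truncated stick-breaking weights can be sampled with a few lines of standard-library Python (a minimal sketch; the function name and truncation level are illustrative):

```python
import random

def stick_breaking_weights(d, theta, K, rng=random):
    """Sample the first K Pitman-Yor weights: V_k ~ Beta(1 - d, theta + k d),
    w_k = V_k * prod_{j<k} (1 - V_j). Requires 0 <= d < 1 and theta > -d.
    """
    weights, remaining = [], 1.0
    for k in range(1, K + 1):
        v = rng.betavariate(1.0 - d, theta + k * d)
        weights.append(v * remaining)   # w_k = V_k * leftover stick length
        remaining *= 1.0 - v            # shorten the stick
    return weights
```

The truncated weights sum to less than one; the leftover mass corresponds to the infinitely many atoms beyond the truncation level.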
The Chinese Restaurant Process (CRP) analogy interprets sample draws from $P$ as customers entering a restaurant: existing tables represent unique observed values ("clusters"). For $n$ customers seated at $K$ tables of sizes $n_1, \dots, n_K$, the next customer sits:
- At table $k$ with probability $\frac{n_k - d}{n + \theta}$,
- At a new table with probability $\frac{\theta + K d}{n + \theta}$.
As $d \to 0$, the PYCRP reduces to the Dirichlet Process CRP, whose seating probabilities are proportional to current table sizes alone (with mass $\theta$ for a new table) (Lim et al., 2016, Lawless et al., 2018).
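The seating rule above translates directly into a forward simulator (a minimal sketch; the function name and interface are illustrative):

```python
import random

def pycrp_sample(n, d, theta, rng=random):
    """Sequentially seat n customers by the Pitman-Yor CRP rule.

    Returns (assignments, sizes): assignments[i] is customer i's table,
    sizes[k] the final occupancy of table k.
    """
    sizes, assignments = [], []
    for i in range(n):
        u = rng.random() * (i + theta)   # total unnormalized mass is i + theta
        table = len(sizes)               # default: open a new table
        acc = 0.0
        for k, nk in enumerate(sizes):
            acc += nk - d                # existing table k carries mass n_k - d
            if u < acc:
                table = k
                break
        if table == len(sizes):
            sizes.append(1)
        else:
            sizes[table] += 1
        assignments.append(table)
    return assignments, sizes
```

Setting `d = 0` recovers the ordinary Dirichlet Process CRP simulator.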
2. Partition Distribution and EPPF
The PYCRP induces exchangeable random partitions of $[n] = \{1, \dots, n\}$ characterized by the Exchangeable Partition Probability Function (EPPF). For a partition into $K$ blocks of sizes $n_1, \dots, n_K$:

$$p(n_1, \dots, n_K) = \frac{\prod_{i=1}^{K-1} (\theta + i d)}{(\theta + 1)_{n-1}} \prod_{k=1}^{K} (1 - d)_{n_k - 1},$$

with the rising factorials $(x)_m = x (x + 1) \cdots (x + m - 1)$ and $(x)_0 = 1$. For $d = 0$, this recovers the Dirichlet process EPPF $\theta^{K-1} \prod_{k} (n_k - 1)! \,/\, (\theta + 1)_{n-1}$ (Lim et al., 2016, Franssen et al., 2022, Lawless et al., 2018, Canale et al., 2019).
This EPPF encapsulates the full probabilistic law over compositions of a sample into clusters, facilitating marginalization in inference tasks and providing direct access to species-sampling properties.
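For concreteness, the EPPF is conveniently evaluated in log-space, with rising factorials computed via the log-gamma function (a small sketch; function names are illustrative):

```python
from math import lgamma, log

def log_rising(x, m):
    """log of the rising factorial (x)_m = x (x + 1) ... (x + m - 1), (x)_0 = 1."""
    return lgamma(x + m) - lgamma(x)

def log_eppf(block_sizes, d, theta):
    """Log EPPF of a PY(d, theta) partition with the given block sizes."""
    n, K = sum(block_sizes), len(block_sizes)
    return (sum(log(theta + i * d) for i in range(1, K))
            + sum(log_rising(1.0 - d, nk - 1) for nk in block_sizes)
            - log_rising(theta + 1.0, n - 1))
```

A quick sanity check: for $n = 2$ the two partitions have probabilities $(1 - d)/(1 + \theta)$ (together) and $(\theta + d)/(1 + \theta)$ (apart), which sum to one.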
3. Power-Law and Partition Growth Properties
A defining property of the PYCRP is its control over cluster frequency distributions:
- The expected number of clusters $K_n$ in a sample of size $n$ grows as $\mathbb{E}[K_n] \sim \frac{\Gamma(\theta + 1)}{d\, \Gamma(\theta + d)}\, n^{d}$ for large $n$, displaying polynomial (power-law) growth for $d > 0$ (Pereira et al., 2018, Franssen et al., 2022, Canale et al., 2019).
- The distribution of cluster sizes exhibits Zipf's law: larger clusters are rarer, and the number of distinct clusters of a given size decays polynomially in that size.
Table counts and individual cluster-size counts concentrate sharply around deterministic curves, with fluctuation scales and convergence rates made explicit in nonasymptotic results (Pereira et al., 2018). The power-law regime matches empirical data in applications (e.g., word frequencies in language, species abundance, network structure) more accurately than the Dirichlet Process model (Lim et al., 2016).
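The $n^d$ growth of the cluster count is easy to check by simulation: seating customers by the PYCRP rule and counting occupied tables should track $\frac{\Gamma(\theta + 1)}{d\, \Gamma(\theta + d)}\, n^{d}$ on average (a rough numerical sketch under assumed parameter values; helper name illustrative):

```python
import random
from math import gamma

def num_tables(n, d, theta, rng=random):
    """Occupied-table count after seating n customers by the PY(d, theta) rule."""
    sizes = []
    for i in range(n):
        u = rng.random() * (i + theta)
        if u >= i - d * len(sizes):          # residual mass: open a new table
            sizes.append(1)
            continue
        acc = 0.0
        for k, nk in enumerate(sizes):       # existing table k has mass n_k - d
            acc += nk - d
            if u < acc:
                sizes[k] += 1
                break
    return len(sizes)

# Average over independent runs and compare with the asymptotic prediction.
trials = [num_tables(2000, 0.5, 1.0, random.Random(s)) for s in range(20)]
predicted = gamma(2.0) / (0.5 * gamma(1.5)) * 2000 ** 0.5
```

Note that $K_n / n^d$ converges to a random limit, so individual runs scatter around the curve; only the average across runs matches the constant above.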
4. Hierarchical and Franchise Extensions
In multi-level models such as nonparametric Bayesian topic models, multiple Pitman–Yor processes are hierarchically coupled—an arrangement known as the Chinese Restaurant Franchise (CRF). For instance, documents draw their own topic distributions from document-level PYPs, which themselves share statistical strength via a corpus-level PYP. Words are drawn from topic distributions, which may also be PYPs over vocabulary.
In this construction, restaurant table counts at children nodes serve as customers at parent nodes. Marginalizing over the random probabilities yields a tractable Markovian structure on the hierarchy of counts and tables, making inference scalable (Lim et al., 2016).
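The "tables become customers upstairs" coupling can be sketched for a two-level franchise. The snippet below is a simplified illustration, not the full CRF of the cited models: each newly opened child table sends exactly one customer to a shared parent restaurant, whose table index plays the role of the child table's dish/topic.

```python
import random

def crf_two_level(group_sizes, d0, theta0, d1, theta1, seed=0):
    """Two-level Chinese Restaurant Franchise sketch: each group runs its own
    PY(d1, theta1) restaurant; every newly opened child table enters a shared
    parent PY(d0, theta0) restaurant, which assigns its "dish" (topic).
    """
    rng = random.Random(seed)

    def seat(sizes, d, theta):
        """Seat one customer by the PYCRP rule; return the table index."""
        n = sum(sizes)
        u = rng.random() * (n + theta)
        if u >= n - d * len(sizes):        # residual mass: open a new table
            sizes.append(1)
            return len(sizes) - 1
        acc = 0.0
        for k, nk in enumerate(sizes):
            acc += nk - d
            if u < acc:
                sizes[k] += 1
                return k

    parent_sizes = []                      # parent-level table (dish) counts
    dishes = []                            # dishes[g][t] = parent table of child table t
    for m in group_sizes:
        child_sizes, child_dishes = [], []
        for _ in range(m):
            t = seat(child_sizes, d1, theta1)
            if t == len(child_dishes):     # new child table -> customer upstairs
                child_dishes.append(seat(parent_sizes, d0, theta0))
        dishes.append(child_dishes)
    return dishes, parent_sizes
```

Because the parent restaurant is shared, popular dishes recur across groups, which is exactly the statistical-strength sharing exploited by hierarchical topic models.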
5. Inference Algorithms and Posterior Computation
PYCRP-based models support collapsed Gibbs sampling by exploiting the exchangeable nature of the process. For each data point:
- Remove its seating assignment, decrementing counts (and deleting any table left empty).
- Compute conditional predictive probabilities for assigning clusters (tables), accounting for both old and new tables as per the PYCRP rules.
- Update the counts recursively along the franchise, recomputing likelihood ratios using modularized functions and Stirling-number calculations.
Efficient implementation can exploit Stirling-number caching, gamma–beta augmentation for marginalizing over the concentration parameter $\theta$, and blocked or auxiliary-variable sampling for hierarchical structures (Lim et al., 2016).
The Importance Conditional Sampling (ICS) algorithm is specifically tailored to the PYCRP structure, using posterior Dirichlet decomposition for cluster probabilities and importance-resampled auxiliary draws from the PY predictive for proposing new clusters. ICS achieves stable mixing and bounded cost per iteration, in contrast to slice or truncation-based samplers, which degrade with large discount parameter (Canale et al., 2019).
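To make the collapsed-Gibbs recipe concrete, here is one sweep for a flat (non-hierarchical) PY mixture under an assumed conjugate toy model, Normal observations with the cluster means marginalized out. This is an illustrative sketch, not the ICS algorithm or the franchise sampler of the cited papers:

```python
import math
import random

def gibbs_sweep(x, z, d, theta, sigma2=1.0, tau2=1.0, rng=random):
    """One collapsed-Gibbs sweep over labels z for data x: PYCRP prior times
    a conjugate N(mu_k, sigma2) likelihood with mu_k ~ N(0, tau2) integrated out.
    """
    def log_pred(xi, s, m):
        # Marginal predictive for a cluster with m members summing to s
        # (m = 0 gives the prior predictive for a new cluster).
        prec = 1.0 / tau2 + m / sigma2
        mean = (s / sigma2) / prec
        var = sigma2 + 1.0 / prec
        return -0.5 * (math.log(2 * math.pi * var) + (xi - mean) ** 2 / var)

    n = len(x)
    for i in range(n):
        z[i] = -1                                   # unseat customer i
        labels = sorted(set(z) - {-1})              # drop empty tables, repack
        relabel = {old: new for new, old in enumerate(labels)}
        z = [relabel[c] if c != -1 else -1 for c in z]
        counts = [0] * len(labels)
        sums = [0.0] * len(labels)
        for j in range(n):
            if z[j] != -1:
                counts[z[j]] += 1
                sums[z[j]] += x[j]
        # PYCRP prior weight times conjugate predictive, per table + new table
        logp = [math.log(counts[k] - d) + log_pred(x[i], sums[k], counts[k])
                for k in range(len(counts))]
        logp.append(math.log(theta + d * len(counts)) + log_pred(x[i], 0.0, 0))
        mx = max(logp)                              # log-sum-exp stabilization
        w = [math.exp(v - mx) for v in logp]
        u = rng.random() * sum(w)
        acc = 0.0
        for k, wk in enumerate(w):
            acc += wk
            if u < acc:
                z[i] = k
                break
    return z
```

In practice the sufficient statistics would be updated incrementally rather than recomputed, and hierarchical models replace the simple predictive with the Stirling-number machinery described above.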
6. Parameter Estimation and Asymptotic Theory
The primary parameters $d$ and $\theta$ (the discount–concentration pair, sometimes written $(\sigma, \theta)$) govern power-law behavior and clustering. Empirical-Bayes and full-Bayes procedures for parameter estimation are developed via maximization of, or Bayesian integration over, the partition likelihood (EPPF):
- The marginal MLE (empirical Bayes) for $d$ solves the likelihood-maximization problem derived from sample partitions, with explicit formulas for the log-likelihood and its derivatives (Franssen et al., 2022).
- Posterior contraction for $d$ around its estimator occurs at rate $n^{-d/2}$ (up to slowly varying factors); a Bernstein–von Mises limit theorem shows asymptotic normality of the estimator of $d$.
- The precision parameter $\theta$ is treated via profile likelihood or prior augmentation; in forensic contexts, plug-in and Bayes estimates for match probabilities admit normal fluctuation limits under the PYCRP model (Franssen et al., 2022).
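A crude version of the empirical-Bayes idea can be sketched as a grid search over $(d, \theta)$ maximizing the log-EPPF of an observed partition. The grid bounds and function names below are assumptions for illustration; the cited work instead uses explicit likelihood formulas and derivatives:

```python
from math import lgamma, log

def log_eppf(sizes, d, theta):
    """PY(d, theta) log partition likelihood (EPPF) for observed block sizes."""
    n, K = sum(sizes), len(sizes)
    lr = lambda x, m: lgamma(x + m) - lgamma(x)      # log rising factorial (x)_m
    return (sum(log(theta + i * d) for i in range(1, K))
            + sum(lr(1.0 - d, nk - 1) for nk in sizes)
            - lr(theta + 1.0, n - 1))

def eb_fit(sizes, grid=50):
    """Crude empirical-Bayes estimate of (d, theta) by grid-maximizing the EPPF."""
    best = None
    for i in range(grid):
        d = i / grid                                  # d ranges over [0, 1)
        for j in range(1, grid + 1):
            theta = 10.0 * j / grid                   # theta over (0, 10], assumed bound
            ll = log_eppf(sizes, d, theta)
            if best is None or ll > best[0]:
                best = (ll, d, theta)
    return best[1], best[2]
```

Partitions with many small blocks and a few large ones push the estimate of $d$ upward, reflecting the power-law regime.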
7. Applications and Empirical Performance
Hierarchical PYCRP models are extensively deployed in latent variable modeling for natural language (e.g., topic modeling for text corpora) and mixture modeling in clustering. In empirical studies on large-scale text data (e.g., Twitter), hierarchical Pitman–Yor models outperform Dirichlet-based baselines across multiple metrics:
- Lower held-out perplexity for text
- More accurate modeling of words, hashtags, and author networks
- Improved performance in downstream clustering (measured by purity and NMI) and topic labeling
These gains are attributed to the accurate power-law modeling of rare types and robust handling of heavy-tailed observed data (Lim et al., 2016). Inclusion of shared-vocabulary PYPs (e.g., for hashtags) and network-level priors (e.g., via Gaussian processes) further enhances modeling capacity and empirical fit.
Summary Table: Core Aspects of the Pitman–Yor Chinese Restaurant Process
| Feature | Mathematical Characterization | Key Papers |
|---|---|---|
| Predictive rule | table $k$: $\frac{n_k - d}{n + \theta}$; new table: $\frac{\theta + K d}{n + \theta}$ | (Lim et al., 2016, Lawless et al., 2018, Canale et al., 2019) |
| Partition distribution (EPPF) | $\frac{\prod_{i=1}^{K-1} (\theta + i d)}{(\theta + 1)_{n-1}} \prod_{k=1}^{K} (1 - d)_{n_k - 1}$ | (Lim et al., 2016, Lawless et al., 2018, Canale et al., 2019) |
| Expected # clusters | $\mathbb{E}[K_n] \sim \frac{\Gamma(\theta + 1)}{d\, \Gamma(\theta + d)}\, n^{d}$ for $d > 0$ | (Pereira et al., 2018, Franssen et al., 2022, Canale et al., 2019) |
| Inference schemes | Collapsed Gibbs, ICS, stick-breaking truncation | (Lim et al., 2016, Canale et al., 2019, Arbel et al., 2018) |
| Application highlight | Topic models, power-law species/words, networked data | (Lim et al., 2016, Franssen et al., 2022) |
The PYCRP is thus established as a flexible and analytically tractable extension of the Dirichlet process CRP, enforcing power-law cluster growth and supporting efficient, exact inference and parameter estimation in a broad array of nonparametric Bayesian models.