
Chinese Restaurant Process (CRP)

Updated 30 November 2025
  • Chinese Restaurant Process (CRP) is a probabilistic partition model that assigns customers to tables based on existing counts and a concentration parameter.
  • It underpins Bayesian nonparametric methods by connecting to Dirichlet and Pitman–Yor processes, enabling flexible clustering and mixture modeling.
  • Extensions such as generalized, distance-dependent, and continuous-time CRPs broaden its applications to hierarchical models, spatial data, and dynamic inference.

The Chinese Restaurant Process (CRP) is a family of probabilistic models for generating random partitions, central to Bayesian nonparametrics and closely related to the Dirichlet and Poisson–Dirichlet processes. Its combinatorial and process-theoretic representations underpin inference in mixture models, species sampling, genetics, and a variety of hierarchical Bayesian procedures. Extensions such as the generalized CRP, distance-dependent CRP, and continuous-time CRP further link to combinatorics, stochastic processes, and scaling limits.

1. Generative Construction and Partition Laws

The classical CRP is described via an exchangeable sequential partition model. Consider $N$ customers entering a restaurant with infinitely many tables. The $i$-th customer sits at an existing table $k$ with probability proportional to $n_k$, its current occupancy, or at a new table with probability proportional to a concentration parameter $\alpha > 0$:

$$P(z_i = k \mid z_{1:i-1}) = \begin{cases} \dfrac{n_k}{\alpha + (i-1)} & \text{for occupied } k, \\[6pt] \dfrac{\alpha}{\alpha + (i-1)} & \text{for a new table}. \end{cases}$$

The joint probability of a seating arrangement $(z_1, \ldots, z_N)$, or equivalently a partition into $K_+$ blocks with sizes $n_1, \ldots, n_{K_+}$, is

$$P_{\mathrm{CRP}}(z_{1:N}) = \frac{\alpha^{K_+} \prod_{k=1}^{K_+}(n_k-1)!}{\prod_{i=0}^{N-1}(\alpha+i)}.$$

This law coincides with the Ewens sampling formula, fundamental to both population genetics and the theory of exchangeable partitions (Li, 2015, Buntine et al., 2010).
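The sequential seating rule above can be simulated directly. The following is a minimal sketch (function and variable names are illustrative, not taken from any cited implementation):

```python
import random

def sample_crp(n, alpha, rng=random.Random(0)):
    """Sequentially seat n customers by the classical one-parameter CRP rule."""
    counts = []        # counts[k] = current occupancy n_k of table k
    assignments = []   # assignments[i] = table index chosen by customer i
    for i in range(n):
        # Existing table k has mass n_k, a new table has mass alpha;
        # the normalizer is alpha + i (i customers already seated).
        u = rng.random() * (alpha + i)
        acc = 0.0
        for k, nk in enumerate(counts):
            acc += nk
            if u < acc:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            counts.append(1)  # open a new table
            assignments.append(len(counts) - 1)
    return assignments, counts

assignments, counts = sample_crp(1000, alpha=2.0)
print(len(counts), sum(counts))  # number of occupied tables; total customers
```

Because the "rich get richer" weights are the raw counts $n_k$, large tables accumulate customers while the expected number of tables grows only logarithmically in $n$ for fixed $\alpha$.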

2. Poisson–Dirichlet and Generalized Chinese Restaurant Process

The two-parameter CRP, or Pitman–Yor process, introduces a discount $0 \le \alpha < 1$ and a concentration $\theta > -\alpha$ (Buntine et al., 2010, Lawless et al., 2018). The seating probabilities generalize to

$$P(\text{sit at } k) = \frac{n_k - \alpha}{n + \theta}, \qquad P(\text{new table}) = \frac{\theta + K\alpha}{n + \theta}.$$
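These generalized seating probabilities can be sampled with a small modification of the one-parameter scheme; a sketch (names illustrative), exploiting the fact that the per-table masses $n_k - \alpha$ and the new-table mass $\theta + K\alpha$ sum to $n + \theta$:

```python
import random

def sample_pitman_yor(n, discount, theta, rng=random.Random(1)):
    """Two-parameter (Pitman-Yor) CRP: 0 <= discount < 1, theta > -discount."""
    counts = []
    for i in range(n):
        # Existing table k has mass n_k - discount; the leftover mass
        # theta + K*discount goes to a new table; normalizer is i + theta.
        u = rng.random() * (i + theta)
        acc = 0.0
        for k, nk in enumerate(counts):
            acc += nk - discount
            if u < acc:
                counts[k] += 1
                break
        else:
            counts.append(1)
    return counts

counts = sample_pitman_yor(5000, discount=0.5, theta=1.0)
print(len(counts))  # grows roughly like n**discount, much faster than log n
```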

The corresponding exchangeable partition probability function (EPPF) for a partition into $K$ blocks is

$$p(n_1, \ldots, n_K \mid \alpha, \theta) = \frac{(\theta \mid \alpha)_K}{(\theta)_N} \prod_{k=1}^K (1-\alpha)_{n_k-1},$$

where $(x)_m$ and $(\theta \mid \alpha)_K$ denote rising factorials (the latter with increment $\alpha$, i.e. $(\theta \mid \alpha)_K = \prod_{i=0}^{K-1}(\theta + i\alpha)$). Pitman–Yor CRPs induce partitions whose ranked block sizes converge to Poisson–Dirichlet limits, and $K_n$, the number of clusters, grows as $n^{\alpha}$ for large $n$ (Pereira et al., 2018, Galganov et al., 8 Feb 2025). Explicit non-asymptotic concentration bounds quantify deviations in cluster counts for finite samples (Pereira et al., 2018).
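The EPPF can be evaluated in log space straight from its definition; a sketch, with the rising factorials expanded as products (at discount $\alpha = 0$ it reduces to the one-parameter CRP / Ewens law):

```python
import math

def log_eppf_pitman_yor(block_sizes, discount, theta):
    """log p(n_1,...,n_K | alpha, theta) for the Pitman-Yor EPPF."""
    K = len(block_sizes)
    N = sum(block_sizes)
    # (theta | alpha)_K = prod_{i=0}^{K-1} (theta + i*alpha)
    log_num = sum(math.log(theta + i * discount) for i in range(K))
    # (theta)_N = prod_{i=0}^{N-1} (theta + i)
    log_den = sum(math.log(theta + i) for i in range(N))
    # (1 - alpha)_{n_k - 1} = prod_{j=0}^{n_k - 2} (1 - alpha + j)
    log_blocks = sum(
        math.log(1 - discount + j) for nk in block_sizes for j in range(nk - 1)
    )
    return log_num - log_den + log_blocks

# Sanity check at alpha = 0, theta = 1: partition {1,2},{3} of N = 3
print(math.exp(log_eppf_pitman_yor([2, 1], discount=0.0, theta=1.0)))  # 1/6
```

The five set partitions of $\{1,2,3\}$ then carry probabilities $1/3$ (one block), $1/6$ each (three partitions of shape $[2,1]$), and $1/6$ (singletons), summing to one.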

3. Connections to Dirichlet and Stick-Breaking Processes

The CRP is mathematically equivalent to drawing allocation indices from a Dirichlet or Pitman–Yor process whose stick-breaking construction is

$$\pi_1 = V_1, \quad \pi_k = V_k \prod_{i=1}^{k-1}(1-V_i), \quad V_i \sim \mathrm{Beta}(1-\alpha,\, \theta + i\alpha).$$

Sampling assignments $z_i$ i.i.d. from the weights $(\pi_k)$ induces a random partition that matches the CRP law. For $\alpha = 0$ (the Dirichlet process), the assignment law and stick-breaking weights coincide exactly with the classical CRP (Miller, 2018, Lawless et al., 2018). These equivalences are constructive and avoid measure-theoretic arguments, providing the theoretical basis for fast Gibbs and variational inference algorithms (Lawless et al., 2018).
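A truncated version of this stick-breaking construction is straightforward to sample; a sketch (truncation level and names are illustrative):

```python
import random

def stick_breaking_weights(num_sticks, discount, theta, rng=random.Random(2)):
    """Truncated Pitman-Yor stick-breaking: V_i ~ Beta(1-discount, theta+i*discount)."""
    weights = []
    remaining = 1.0  # mass of the stick not yet broken off
    for i in range(1, num_sticks + 1):
        v = rng.betavariate(1 - discount, theta + i * discount)
        weights.append(remaining * v)   # pi_k = V_k * prod_{i<k} (1 - V_i)
        remaining *= 1 - v
    return weights

w = stick_breaking_weights(100, discount=0.0, theta=2.0)
print(sum(w))  # approaches 1 as the truncation level grows
```

Assigning each datum to component $k$ with probability $\pi_k$ then reproduces (up to truncation error) the same partition distribution as the sequential seating scheme.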

4. Extensions: Tree, Distance, and Continuous-Time CRPs

Tree-structured CRP (treeCRP): Clusters are arranged as nodes in a rooted tree. The seating process is modified to admit new customers either to existing nodes or as a child of a randomly chosen parent, yielding nonparametric tree priors for phylogeny and lineage inference in biological data (Deshwar et al., 2014).
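A simplified tree-structured variant can be sketched as follows. This is an illustrative toy, not the exact treeCRP of the cited paper: customers join an existing node proportionally to its count, or found a new node as the child of a uniformly chosen existing node with mass proportional to a branching parameter `gamma` (both the rule and the names are assumptions for illustration):

```python
import random

def sample_tree_crp(n, gamma, rng=random.Random(3)):
    """Toy tree-CRP sketch: nodes form a rooted tree; new nodes attach as children."""
    parent = [None]  # node 0 is the (empty) root
    counts = [0]     # counts[k] = customers seated at node k
    for i in range(n):
        u = rng.random() * (i + gamma)
        acc = 0.0
        for k, nk in enumerate(counts):
            acc += nk
            if u < acc:
                counts[k] += 1  # join an existing node
                break
        else:
            # Found a new node as the child of a uniformly chosen existing node
            parent.append(rng.randrange(len(counts)))
            counts.append(1)
    return parent, counts

parent, counts = sample_tree_crp(200, gamma=1.5)
print(len(counts), sum(counts))
```

The parent array encodes the rooted tree, so cluster structure and lineage structure are sampled jointly, which is the property phylogeny-style models exploit.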

Distance-Dependent CRP (dd-CRP): Exchangeability is relaxed: each datum links to another with probability proportional to a decay function $f(d_{ij})$ of spatial or temporal distance, or self-links with mass $\alpha$. Clusters are the connected components of the induced link graph, allowing modeling of temporally coherent or spatially structured data streams (Krishnan et al., 2016, Figueroa et al., 2017).
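The two-stage structure (sample links, then read clusters off the link graph) can be sketched as below; the exponential decay function and all names are illustrative choices, not from the cited papers:

```python
import math
import random

def sample_ddcrp_links(points, alpha, decay=lambda d: math.exp(-d),
                       rng=random.Random(4)):
    """dd-CRP: datum i links to j with mass f(d_ij), or to itself with mass alpha."""
    n = len(points)
    links = []
    for i in range(n):
        weights = [decay(abs(points[i] - points[j])) for j in range(n)]
        weights[i] = alpha  # self-link mass replaces the i-i decay term
        u = rng.random() * sum(weights)
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if u < acc:
                links.append(j)
                break
    return links

def clusters_from_links(links):
    """Clusters = connected components of the undirected link graph (union-find)."""
    label = list(range(len(links)))
    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]  # path compression
            x = label[x]
        return x
    for i, j in enumerate(links):
        label[find(i)] = find(j)
    return [find(i) for i in range(len(links))]

pts = [0.1, 0.2, 0.15, 5.0, 5.1]
print(clusters_from_links(sample_ddcrp_links(pts, alpha=0.5)))
```

Nearby points tend to link to each other and thus share a component, which is exactly how the dd-CRP trades exchangeability for spatial or temporal coherence.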

Continuous-Time and Up–Down CRP: Point processes and Markov chains extend the CRP to continuous time, e.g., to model random permutations or queueing networks. Notably, the up–down ordered CRP (oCRP) evolves by customer arrivals and departures, realized as a composition-valued Markov chain with generator

$$Lf(n) = \sum_{i=1}^k (n_i-\alpha)\,[f(n+e_i) - f(n)] + \cdots + \sum_{i=1}^k n_i\,[f(n-e_i) - f(n)],$$

where $e_i$ increments the $i$-th table size. Ray–Knight type results show these processes are equivalent to the skewer of a marked spectrally positive Lévy process, and scaling limits yield interval-partition diffusions with Poisson–Dirichlet stationary laws (Rogers et al., 2020).

5. Model Selection, Clustering, and Inference Algorithms

In Bayesian mixture models, the CRP prior serves as a nonparametric regularizer, allowing the number of clusters to grow as needed. For finite-component models, the Factorized Information Criterion (FIC) can be interpreted as a prior over latent configurations, with $P_{\mathrm{FIC}}(Z \mid K) \propto \prod_{n_k>0} n_k^{-D_c/2}$. At $D_c = 2$, FIC is equivalent to the CRP, but for $D_c > 2$, FIC maintains stronger regularization, while the CRP's influence typically weakens and is overwhelmed by likelihood terms (Li, 2015). Generalizations interpolate between FIC and CRP, e.g., $P_{\mathrm{GFIC}} \propto \prod_{n_k>0} n_k^{-d}$ for $d \in (1, D_c/2)$.
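The two log-priors being compared can be written down directly; a sketch (the FIC term is kept unnormalized, matching the proportionality above, while the CRP term is the full partition law):

```python
import math

def log_prior_fic(block_sizes, d_c):
    """Unnormalized log P_FIC(Z|K): -(D_c/2) * log n_k summed over occupied blocks."""
    return sum(-(d_c / 2) * math.log(nk) for nk in block_sizes if nk > 0)

def log_prior_crp(block_sizes, alpha):
    """Full log CRP partition law: alpha^K * prod (n_k-1)! / prod_{i<N} (alpha+i)."""
    N = sum(block_sizes)
    K = len(block_sizes)
    return (K * math.log(alpha)
            + sum(math.lgamma(nk) for nk in block_sizes)   # log (n_k - 1)!
            - sum(math.log(alpha + i) for i in range(N)))

sizes = [50, 30, 20]
print(log_prior_fic(sizes, d_c=2), log_prior_crp(sizes, alpha=1.0))
```

For fixed block sizes, raising $D_c$ makes the FIC penalty per occupied block grow linearly in $D_c$, which is the sense in which FIC regularizes more strongly than the CRP once $D_c > 2$.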

Quantum annealing methods reformulate CRP inference as a stochastic optimization of the log-EPPF energy landscape, using parallel replicas and QA schedules to find higher-probability clusterings, particularly in network data (Sato et al., 2013).

6. Limit Theorems, Scaling Behaviors, and Functional Limits

The CRP's block counts, especially of small clusters, exhibit functional scaling limits. For fixed small sizes, block-appearance times rescaled by $n$ converge to Poisson random measures with intensity $\mu^{(N)}(dx_1 \cdots dx_N\, dy) = \theta\, y^{-(N+1)}\, dx_1 \cdots dx_N\, dy$ (Galganov et al., 8 Feb 2025). The joint process of block counts $\{C_k(P_{\lfloor nt \rfloor})\}_{t \in [1,T]}$ converges to a time-inhomogeneous Markov chain. For large blocks, the ranked size frequencies converge to Poisson–Dirichlet distributions. These results bridge combinatorial, probabilistic, and queueing perspectives on CRP dynamics (Gnedin et al., 2022, Galganov et al., 8 Feb 2025).

7. Hierarchical and Nonparametric Bayesian Applications

Hierarchical Bayesian models frequently employ CRP-based priors, notably in the Chinese Restaurant Franchise for language models (e.g., hierarchical Pitman–Yor processes for n-gram modeling), allowing partial conjugacy and multi-level fragmentation and clustering (Buntine et al., 2010). In lifelong learning and reinforcement learning, CRP priors enable dynamic task-cluster expansion: each new task chooses among existing clusters based on their assignment counts, or spawns a new cluster with probability proportional to a concentration parameter, driving adaptive network growth and transfer behavior (Wang et al., 2022).

Table: CRP Variants and Their Probabilistic Features

| Model | Seating rule / prior | Limit behavior / stationary law |
|---|---|---|
| One-parameter CRP | $n_k/(\theta+n)$, $\theta/(\theta+n)$ | Ewens sampling formula, PD($\theta$) |
| Two-parameter CRP | $(n_k-\alpha)/(\theta+n)$ | PD($\alpha,\theta$), $\sim n^{\alpha}$ clusters |
| Distance-dependent CRP | $f(d_{ij})/Z_i$, $\alpha/Z_i$ | Non-exchangeable, graph-induced clusters |
| Up–down ordered CRP (CTMC) | Generator with $(n_k-\alpha)$, $\alpha$, $\theta$ | Interval-partition diffusions, PD stationary |
| Tree-CRP | Tree-structured assignment rules | Phylogenetic tree priors |

In summary, the Chinese Restaurant Process is a foundational probabilistic model for generating exchangeable partitions. Its generalizations, scaling limits, and algorithmic adaptations constitute essential tools in modern probabilistic modeling, statistical genetics, nonparametric clustering, and Bayesian hierarchical inference. The CRP's deep connections to stick-breaking processes, Poisson–Dirichlet laws, combinatorial limit theorems, and Markov process representations substantiate its central role in both theoretical and applied probability (Buntine et al., 2010, Galganov et al., 8 Feb 2025, Rogers et al., 2020).
