Power-Modulated Dirichlet Processes

Updated 27 May 2026

Power-modulated Dirichlet processes are Bayesian nonparametric models that modify the traditional rich-get-richer effect by raising cluster counts to a power r.
They exhibit distinct behaviors where r<1 balances clusters with power-law growth and r>1 concentrates allocation on fewer, larger clusters.
These models extend to dynamic frameworks like Dirichlet–Hawkes processes and infinite-dimensional diffusions, enabling flexible inference through methods such as Gibbs sampling.

A power-modulated Dirichlet process is a generalization of the Dirichlet process (DP) in which the canonical "rich-get-richer" prior on cluster sizes is modified by raising the cluster sizes or allocation propensities to a positive power, thereby continuously tuning the degree of reinforcement, or even counteracting it. Power modulation also appears as a regularization principle in Hawkes-process-driven topic models and in infinite-dimensional diffusion models whose parameters are modulated by secondary control processes. This entry surveys the mathematical structure, theoretical properties, methodology, and applied domains of power-modulated Dirichlet processes and related models.

1. Mathematical Definition and Construction

Let $G \sim \mathrm{DP}(\alpha, G_0)$ denote a standard Dirichlet process on a measurable space $(\Theta, \mathcal{A})$. Its constructive representation via stick-breaking is

$G = \sum_{k=1}^\infty \beta_k \delta_{\theta_k},$

with $\theta_k \sim G_0$ i.i.d., $\beta_k = v_k \prod_{\ell<k} (1-v_\ell)$ and $v_k \sim \mathrm{Beta}(1, \alpha)$. For clustering, the allocation of observation $i$ to cluster $c$ under the Chinese Restaurant Process (CRP) prior is proportional to the current cluster size:

$P(z_i=c \mid z_{1:i-1}) = \begin{cases} \frac{N_{c,-i}}{N-1+\alpha}, & c=1,\ldots,K \\ \frac{\alpha}{N-1+\alpha}, & c=K+1 \end{cases}$

where $N_{c,-i}$ is the size of cluster $c$ excluding observation $i$.
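As a minimal sketch of the stick-breaking construction, the weights $\beta_k$ can be generated up to a finite truncation. The function name `stick_breaking_weights` and the truncation level `num_atoms` are illustrative choices, not part of the cited works; the atoms $\theta_k$ are omitted because they depend on the choice of $G_0$.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Return the first `num_atoms` stick-breaking weights and the leftover mass.

    Truncating at `num_atoms` is an approximation made for illustration only.
    """
    weights = []
    remaining = 1.0  # prod_{l<k} (1 - v_l): stick length not yet allocated
    for _ in range(num_atoms):
        v = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(v * remaining)    # beta_k = v_k * prod_{l<k} (1 - v_l)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=2.0, num_atoms=50, rng=rng)
print(sum(weights) + leftover)  # total mass of the stick
```

Smaller values of `alpha` put most of the mass on the first few weights, while larger values spread it over many atoms.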

The power-modulated Dirichlet process, also termed the Powered Dirichlet Process (PDP) or described via the powered Chinese Restaurant Process (pCRP), modifies this by raising cluster counts to a power $r > 0$:

$P(z_i=c \mid z_{1:i-1}) = \begin{cases} \frac{N_{c,-i}^{r}}{\sum_{k=1}^{K} N_{k,-i}^{r}+\alpha}, & c=1,\ldots,K \\ \frac{\alpha}{\sum_{k=1}^{K} N_{k,-i}^{r}+\alpha}, & c=K+1 \end{cases}$

Setting $r = 1$ recovers the original DP/CRP. For $r < 1$, reinforcement is diminished; for $r > 1$, allocation to large clusters is further amplified (Poux-Médard et al., 2021, Lu et al., 2018). Power modulation has also been extended to Hawkes-process-driven models and infinite-dimensional diffusion contexts where the "weights" or intensities themselves are power-modulated or controlled by auxiliary processes.
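The effect of the power on a single allocation step can be illustrated with a short sketch. It assumes the predictive weights are $N_c^r$ for each existing cluster and $\alpha$ for a new one, normalized by their sum; the function name `powered_crp_probs` and the example counts are hypothetical.

```python
def powered_crp_probs(counts, alpha, r):
    """Predictive probabilities for K existing clusters plus one new cluster.

    counts: current cluster sizes (excluding the point being allocated).
    The last entry of the returned list is the new-cluster probability.
    """
    powered = [n ** r for n in counts]
    total = sum(powered) + alpha
    return [p / total for p in powered] + [alpha / total]


counts = [8, 1, 1]  # one large cluster and two singletons (toy example)
for r in (0.5, 1.0, 2.0):
    probs = powered_crp_probs(counts, alpha=1.0, r=r)
    print(r, [round(p, 3) for p in probs])
```

With `r=1.0` the output matches the standard CRP; lowering `r` shifts mass away from the large cluster, and raising it does the opposite.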

2. Theoretical Properties and Effects of Power Modulation

The effect of the power $r$ on partition and cluster-size growth is dramatic:

$r = 1$ (standard DP): Expected number of clusters $\mathbb{E}[K] \sim \alpha \log N$. Allocation probability is linear in cluster size.
$r < 1$: Growth of the sum $\sum_{k} N_k^{r}$ is sublinear in $N$; the expected number of clusters grows as a power law in $N$. This leads to more balanced cluster sizes and mitigates over-concentration (Poux-Médard et al., 2021).
$r > 1$: The sum grows superlinearly, new clusters are suppressed, and eventual allocation becomes concentrated on a finite set of clusters: $\mathbb{E}[K]$ remains bounded as $N \to \infty$ (Poux-Médard et al., 2021). In practice, spurious singleton clusters are strongly penalized (Lu et al., 2018).
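These regimes can be explored empirically with a small sequential simulation. The sketch below assumes the powered seating rule with weights $N_c^r$ for existing clusters and $\alpha$ for a new one; `simulate_powered_crp` is an illustrative name, and the sample size and values of `r` are arbitrary.

```python
import random


def simulate_powered_crp(n, alpha, r, seed=0):
    """Seat n customers sequentially under the powered CRP; return cluster sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # Unnormalized weights: N_c^r per existing cluster, alpha for a new one.
        weights = [c ** r for c in counts] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
    return counts


for r in (0.5, 1.0, 1.5):
    sizes = simulate_powered_crp(n=2000, alpha=1.0, r=r)
    print(r, len(sizes), max(sizes))  # number of clusters, largest cluster
```

Comparing the number of clusters and the largest cluster size across values of `r` gives a quick feel for how strongly the power changes the partition.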

A key structural consequence is that the power-modulated process is not infinitely exchangeable: the partition prior or seating sequence depends on the order and history of past allocations. However, this introduces a "feedback" effect: formation of large clusters in the early stage discourages creation of new ones later, counteracting over-clustering in large-sample regimes (Lu et al., 2018).

Comparison to the Pitman–Yor process (PYP): the PYP modifies the DP by a "discount" parameter $d \in [0, 1)$, shifting allocation as $P(z_i = c \mid z_{1:i-1}) \propto N_{c,-i} - d$. The PYP yields a power-law growth in the number of clusters, $\mathbb{E}[K] \sim N^{d}$. The PDP/pCRP with $r < 1$ achieves a similar phenomenology but via power weighting rather than discounting, resulting in distinct partition laws (Poux-Médard et al., 2021).
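For contrast with power weighting, the standard Pitman–Yor predictive rule can be sketched as follows. The name `pitman_yor_probs` is illustrative; the rule subtracts the discount `d` from each existing cluster's count and gives the new cluster weight $\alpha + dK$.

```python
def pitman_yor_probs(counts, alpha, d):
    """Pitman-Yor predictive probabilities; the last entry is for a new cluster.

    counts: current cluster sizes; alpha: concentration; d: discount in [0, 1).
    """
    k = len(counts)
    n = sum(counts)
    existing = [(c - d) / (n + alpha) for c in counts]
    new = (alpha + d * k) / (n + alpha)
    return existing + [new]


print(pitman_yor_probs([8, 1, 1], alpha=1.0, d=0.0))  # d = 0 is the standard CRP
print(pitman_yor_probs([8, 1, 1], alpha=1.0, d=0.5))  # discount favors new clusters
```

The discount removes the same amount from every cluster regardless of size, whereas a power rescales each count nonlinearly, which is one way to see why the two partition laws differ.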

3. Power-Modulated Dirichlet–Hawkes Processes and Mutual Excitation

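Temporal topic models, such as the Dirichlet–Hawkes process (DHP), leverage the CRP/DP and augment allocation probability with Hawkes process intensities. Specifically, for event $i$ at time $t_i$, cluster assignment is proportional to the Hawkes intensity $\lambda_c(t_i)$:

$P(z_i=c \mid t_i, \mathcal{H}_{t_i}) = \begin{cases} \frac{\lambda_c(t_i)}{\lambda_0+\sum_{k=1}^{K} \lambda_k(t_i)}, & c=1,\ldots,K \\ \frac{\lambda_0}{\lambda_0+\sum_{k=1}^{K} \lambda_k(t_i)}, & c=K+1 \end{cases}$

where $\mathcal{H}_{t_i}$ is the event history before $t_i$ and $\lambda_0$ is the base intensity governing the creation of new clusters. A power-modulated extension raises these intensities to a power $r$:

$P(z_i=c \mid t_i, \mathcal{H}_{t_i}) = \begin{cases} \frac{\lambda_c^{r}(t_i)}{\lambda_0+\sum_{k=1}^{K} \lambda_k^{r}(t_i)}, & c=1,\ldots,K \\ \frac{\lambda_0}{\lambda_0+\sum_{k=1}^{K} \lambda_k^{r}(t_i)}, & c=K+1 \end{cases}$

Following Poux-Médard et al. (2022), this mechanism interpolates between flattening ($r < 1$) and sharpening ($r > 1$) the allocation probability over topics.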
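A minimal sketch of powered intensity-based allocation follows. It makes several assumptions that are not taken from the cited papers: an exponential triggering kernel with rate `decay`, a base intensity `lam0` for new clusters that is left unpowered (mirroring the role of $\alpha$ in the pCRP), and the hypothetical function names `hawkes_intensities` and `powered_hawkes_probs`.

```python
import math


def hawkes_intensities(event_times, assignments, t, num_clusters, decay=1.0):
    """Self-exciting intensity of each cluster at time t (exponential kernel assumed)."""
    lam = [0.0] * num_clusters
    for s, c in zip(event_times, assignments):
        if s < t:  # only past events contribute
            lam[c] += math.exp(-decay * (t - s))
    return lam


def powered_hawkes_probs(lam, lam0, r):
    """Allocation probabilities from powered intensities; last entry is a new cluster."""
    powered = [x ** r for x in lam]
    total = sum(powered) + lam0
    return [p / total for p in powered] + [lam0 / total]


times = [0.0, 0.5, 0.9]
topics = [0, 0, 1]
lam = hawkes_intensities(times, topics, t=1.0, num_clusters=2)
for r in (0.5, 1.0, 3.0):
    print(r, [round(p, 3) for p in powered_hawkes_probs(lam, lam0=0.1, r=r)])
```

Because recent events dominate the intensity, a cluster with a single fresh event can compete with an older, larger one; the power `r` then controls how decisively the larger intensity wins.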

The multivariate powered Dirichlet–Hawkes process (MPDHP) advances the framework by allowing mutually exciting interactions between topics, not merely self-excitation. That is, a publication about one topic can influence future allocations to another. This necessitates the use of a multivariate Hawkes intensity matrix $\boldsymbol{\lambda}(t)$, encoding both self- and cross-exciting dynamics, all of which are modulated by the power parameter $r$ (Poux-Médard et al., 2022). The MPDHP is designed to capture realistic, entangled publication dynamics where topics can stimulate each other's evolution (e.g., finance affecting politics).
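Cross-excitation can be sketched by letting every past event contribute to every topic's intensity through an interaction matrix. The matrix `weights`, the exponential kernel, and the function name `multivariate_intensities` are assumptions made for illustration and do not reproduce the parameterization of the cited paper.

```python
import math


def multivariate_intensities(event_times, assignments, t, weights, decay=1.0):
    """Intensity of each topic at time t under mutual excitation.

    weights[src][dst] is the (hypothetical) influence of a past event in topic
    `src` on topic `dst`; the diagonal holds the self-excitation terms.
    """
    k = len(weights)
    lam = [0.0] * k
    for s, src in zip(event_times, assignments):
        if s < t:
            kernel = math.exp(-decay * (t - s))
            for dst in range(k):
                lam[dst] += weights[src][dst] * kernel
    return lam


# Topic 0 excites itself and, half as strongly, topic 1; topic 1 only excites itself.
weights = [[1.0, 0.5],
           [0.0, 1.0]]
lam = multivariate_intensities([0.0], [0], t=1.0, weights=weights)
print(lam)
```

The resulting intensities would then be raised to the power $r$ and normalized, exactly as in the self-exciting case.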

4. Inference, Algorithms, and Implementation

Inference under power-modulated DP models typically employs collapsed Gibbs sampling as in standard DP mixture models, but replaces the canonical CRP predictive probabilities with their powered analogs. For each step:

Remove data point $x_i$ from its current cluster and update the counts.
For each existing cluster $c$: $p(z_i = c \mid \cdot) \propto N_{c,-i}^{r} \, p(x_i \mid \mathbf{x}_{c,-i})$.
For a new cluster: $p(z_i = K+1 \mid \cdot) \propto \alpha \int p(x_i \mid \theta)\, dG_0(\theta)$.
Normalize and sample $z_i$ according to $p(z_i = c \mid \cdot)$ (Lu et al., 2018, Poux-Médard et al., 2021).
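The steps above can be sketched for a one-dimensional mixture. This is a minimal illustration under stated assumptions, not the implementation used in the cited papers: the likelihood is Gaussian with known variance `sigma2`, $G_0$ is a conjugate Normal prior on the cluster mean with hyperparameters `mu0` and `tau2`, and the names `predictive` and `gibbs_sweep` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, s, mu0=0.0, tau2=10.0, sigma2=1.0):
    """Posterior predictive density of x for a cluster with n members summing to s."""
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + s / sigma2)
    return normal_pdf(x, post_mean, post_var + sigma2)


def gibbs_sweep(x, z, alpha, r, rng):
    """One in-place collapsed Gibbs sweep over the cluster labels z."""
    for i, xi in enumerate(x):
        z[i] = None  # remove point i from its current cluster
        # Recompute (count, sum) per cluster for clarity; incremental
        # bookkeeping would avoid this extra pass over the data.
        stats = {}
        for xj, zj in zip(x, z):
            if zj is not None:
                st = stats.setdefault(zj, [0, 0.0])
                st[0] += 1
                st[1] += xj
        labels = list(stats)
        # Existing clusters: powered count times predictive likelihood.
        weights = [stats[c][0] ** r * predictive(xi, stats[c][0], stats[c][1])
                   for c in labels]
        # New cluster: alpha times the prior predictive.
        labels.append(max(labels, default=-1) + 1)
        weights.append(alpha * predictive(xi, 0, 0.0))
        z[i] = rng.choices(labels, weights=weights)[0]  # choices() normalizes
    return z


rng = random.Random(1)
x = [rng.gauss(-4, 1) for _ in range(30)] + [rng.gauss(4, 1) for _ in range(30)]
z = [0] * len(x)
for _ in range(20):
    gibbs_sweep(x, z, alpha=1.0, r=1.1, rng=rng)
print(len(set(z)))  # number of occupied clusters after 20 sweeps
```

The only change relative to a standard CRP mixture sampler is the exponent `r` applied to the cluster count; setting `r=1.0` gives the usual collapsed Gibbs update.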

In Hawkes-process-driven models, the predictive step depends on intensity values, which are updated recursively as events arrive (Poux-Médard et al., 2022). In MPDHP, mutual-excitation necessitates maintaining and updating an interaction matrix recording past event influences. Significant care is required in efficient normalization and recomputation of powered intensities as cluster structures evolve.

The computational cost is typically $O(NK)$ per full Gibbs sweep, comparable to DP mixtures, with additional bookkeeping for power terms or interaction matrices.

5. Empirical Behavior, Applications, and Model Selection

Empirical studies consistently indicate that power modulation can correct the over-clustering tendency of the standard DP in large-sample regimes. On MNIST subsets and the Old Faithful geyser data, pCRP recovers the true number of clusters more accurately, with cross-validation selecting a power $r > 1$ whose value depends on dataset characteristics (Lu et al., 2018). The average and maximum posterior cluster counts are significantly reduced, and allocation is more interpretable.

In synthetic data with balanced clusters, setting $r < 1$ rectifies the standard DP's over-concentration in a few large clusters, yielding a higher adjusted Rand Index. Conversely, data with true heavy-tailed cluster sizes may benefit from $r > 1$ (Poux-Médard et al., 2021).

Tuning $r$ is generally performed via cross-validation, searching over a grid and using out-of-sample predictive loss or clustering metrics to select an inflection ("elbow") point (Lu et al., 2018). This choice is robust across sample sizes due to the proportional invariance property.
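One simple way to automate the elbow choice is to pick the grid point where the loss curve bends most sharply, measured by the discrete second difference. This criterion, the function name `pick_elbow`, and the loss values below (made-up numbers for illustration, not experimental results) are all assumptions; the cited work does not prescribe this particular rule. The sketch also assumes an evenly spaced grid.

```python
def pick_elbow(grid, losses):
    """Return the grid value with the largest discrete second difference in loss."""
    if len(grid) != len(losses) or len(grid) < 3:
        raise ValueError("need at least three (r, loss) pairs of equal length")
    best_idx, best_bend = 1, float("-inf")
    for i in range(1, len(grid) - 1):
        # A large positive value means a steep drop followed by a plateau.
        bend = losses[i - 1] - 2.0 * losses[i] + losses[i + 1]
        if bend > best_bend:
            best_idx, best_bend = i, bend
    return grid[best_idx]


r_grid = [1.0, 1.1, 1.2, 1.3, 1.4]
toy_losses = [10.0, 6.0, 3.0, 2.8, 2.7]  # illustrative values only
print(pick_elbow(r_grid, toy_losses))
```

In practice each loss would come from fitting the model at that value of `r` on training folds and scoring held-out data.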

Power-modulated Dirichlet processes are now applied in Bayesian nonparametrics whenever explicit control over cluster-size heterogeneity is desirable. Typical domains include genetic population inference (favoring $r < 1$ for high-entropy structures), social block models ($r > 1$ for few large communities), and text or topic modeling under nonstandard topic distributions, particularly for short texts and temporally entangled corpora (Poux-Médard et al., 2021, Poux-Médard et al., 2022).

6. Diffusion Limits and Infinite-Dimensional Power-Modulated Extensions

In interacting particle systems, power modulation manifests in the scaling limits of inclusion processes with slow phases. The process described in "Modulated Poisson-Dirichlet diffusions arising from inclusion processes with a slow phase" generalizes the Poisson–Dirichlet (PD) diffusion by introducing a control (or slow) phase that modulates key parameters, such as the total fast-phase (condensate) mass and the PD "mutation" drift. This yields a joint diffusion with two components: one evolves in the ranked mass simplex, and the other is a vector of occupancy frequencies subject to a deterministic ODE (Gabriel, 18 Jul 2025).

The generator of the joint diffusion combines stochastic dynamics for the condensate with deterministic control for the slow phase. Instantaneous condensation ensures that mass exchanges between phases are effectively "power-modulated" by the control process, altering both transient and stationary distributions. This is an example of parameter modulation beyond the classic DP framework.

A plausible implication is that such modulated diffusions open new perspectives for nonparametric modeling of large systems with emergent mass-exchange boundaries and dynamic power-law structure.

7. Limitations and Open Questions

While providing direct and flexible control over the rich-get-richer effect, power-modulated Dirichlet processes are non-exchangeable, complicating analytic derivations and interpretations of partition priors. Exchangeable partition probability functions (EPPFs) lack simple closed forms outside the powered Dirichlet–multinomial case (Poux-Médard et al., 2021). Selecting, or placing a prior on, the power parameter $r$ remains nontrivial; current practice relies on grid search and cross-validation. Theoretical results for large-$N$ consistency and posterior contraction under power modulation are still underdeveloped.

Nonetheless, power-modulated Dirichlet processes represent a versatile extension of Bayesian nonparametric machinery, aligning theoretical properties with domain knowledge of cluster-size distributions and accommodating increasingly realistic dynamical generative models (Poux-Médard et al., 2022, Poux-Médard et al., 2021, Lu et al., 2018, Gabriel, 18 Jul 2025).


References

Poux-Médard et al. (2021). Powered Dirichlet Process for Controlling the Importance of "Rich-Get-Richer" Prior Assumptions in Bayesian Clustering.

Lu et al. (2018). Reducing over-clustering via the powered Chinese restaurant process.

Poux-Médard et al. (2022). Multivariate Powered Dirichlet Hawkes Process.

Gabriel (2025). Modulated Poisson-Dirichlet diffusions arising from inclusion processes with a slow phase.
