Ambiguity of empty components under symmetric Dirichlet-categorical priors

Clarify the interpretation and practical implications of empty components in mixture models that use symmetric Dirichlet-categorical priors when the number of components k is not known a priori. Specifically, determine whether a configuration with three components, one of them empty, is meaningfully distinct from a configuration with two non-empty components, and how such representations should be handled when estimating k.

Background

Under the conventional symmetric Dirichlet-categorical prior over component assignments, component sizes can be zero, yielding empty components. When k is variable, the same partition of observations can be represented with different values of k (e.g., three components with one empty versus two non-empty), introducing ambiguity in interpretation and complicating model selection.
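The ambiguity can be made concrete with a small numerical sketch. It evaluates the standard Dirichlet-multinomial marginal probability of a fixed label vector, first with k = 2 and then with k = 3, where the third component is empty. The function name `log_prior_assignments`, the toy label vector, and the concentration value `alpha = 1` are assumptions chosen for illustration; they are not taken from the paper.

```python
from collections import Counter
from math import exp, lgamma


def log_prior_assignments(labels, k, alpha=1.0):
    """Log probability of a label vector under a symmetric
    Dirichlet-categorical prior with k components.

    Uses the Dirichlet-multinomial marginal
        Gamma(k*alpha) / Gamma(n + k*alpha)
        * prod_r Gamma(n_r + alpha) / Gamma(alpha),
    where n_r is the size of component r. alpha=1 is an assumed
    default for this illustration.
    """
    n = len(labels)
    counts = Counter(labels)
    # Components that never appear in `labels` get size 0 (empty).
    sizes = [counts.get(r, 0) for r in range(k)]
    log_p = lgamma(k * alpha) - lgamma(n + k * alpha)
    for n_r in sizes:
        log_p += lgamma(n_r + alpha) - lgamma(alpha)
    return log_p, sizes


# The same partition of five observations, read with k=2 and with k=3.
labels = [0, 0, 0, 1, 1]
log_p2, sizes2 = log_prior_assignments(labels, k=2)
log_p3, sizes3 = log_prior_assignments(labels, k=3)

print("k=2 sizes:", sizes2, "prior:", exp(log_p2))
print("k=3 sizes:", sizes3, "prior:", exp(log_p3))
```

Under these assumptions the observations are grouped identically in both cases, yet the k = 3 reading carries an extra component of size zero and receives a different prior probability. Whether that difference corresponds to anything meaningful is exactly the question raised here.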

The authors highlight this conceptual uncertainty and address it pragmatically by adopting a modified prior that forbids empty components. The underlying interpretive question, namely what such representations mean and whether they are distinct, is nonetheless raised explicitly in the text.

References

There is nothing to stop $n_r$ from being zero, and indeed correct normalization requires that it must be zero some portion of the time, but it is unclear what this means. What does it mean to have three components, one of which is empty? How is that different from two non-empty components?

Fast sampling and model selection for Bayesian mixture models (2501.07668 - Newman, 13 Jan 2025) in Section 2 (Mixture models)