Dendrogram of mixing measures: Hierarchical clustering and model selection for finite mixture models (2403.01684v2)
Abstract: We present a new way to summarize and select mixture models via the hierarchical clustering tree (dendrogram) constructed from an overfitted latent mixing measure. Our proposed method bridges agglomerative hierarchical clustering and mixture modeling. The dendrogram's construction is derived from the theory of convergence of the mixing measures, and as a result, we can both consistently select the true number of mixing components and obtain the pointwise optimal convergence rate for parameter estimation from the tree, even when the model parameters are only weakly identifiable. In theory, it explicates the choice of the optimal number of clusters in hierarchical clustering. In practice, the dendrogram reveals more information on the hierarchy of subpopulations compared to traditional ways of summarizing mixture models. Several simulation studies are carried out to support our theory. We also illustrate the methodology with an application to single-cell RNA sequence analysis.
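The idea of building a dendrogram from an overfitted mixing measure can be illustrated with a minimal sketch. The abstract does not state the merging rule, so everything specific here is an assumption made for illustration: the function name `merge_dendrogram`, the Ward-style dissimilarity `p_i * p_j / (p_i + p_j) * ||theta_i - theta_j||^2` between weighted atoms, and the rule that a merged atom takes the summed weight and the weighted mean location. The input is a discrete mixing measure given as weights and atoms, such as one obtained from fitting a mixture with more components than needed.

```python
import numpy as np


def merge_dendrogram(weights, atoms):
    """Greedily merge atoms of a discrete mixing measure, one pair per level.

    Assumed (hypothetical) rules, not taken from the abstract:
      - dissimilarity: p_i * p_j / (p_i + p_j) * ||theta_i - theta_j||^2
      - merged atom: weight p_i + p_j, location = weighted mean of the two.
    Returns the list of levels (weights, atoms) and the merge heights.
    """
    w = [float(x) for x in weights]
    th = [np.asarray(a, dtype=float) for a in atoms]
    levels = [(list(w), [t.copy() for t in th])]
    heights = []
    while len(w) > 1:
        # Find the pair of atoms with the smallest dissimilarity.
        best = None
        for i in range(len(w)):
            for j in range(i + 1, len(w)):
                d = w[i] * w[j] / (w[i] + w[j]) * float(np.sum((th[i] - th[j]) ** 2))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Merge atom j into atom i (j > i, so deleting j keeps index i valid).
        new_w = w[i] + w[j]
        new_th = (w[i] * th[i] + w[j] * th[j]) / new_w
        del w[j]
        del th[j]
        w[i], th[i] = new_w, new_th
        heights.append(d)
        levels.append((list(w), [t.copy() for t in th]))
    return levels, heights


# Toy overfitted measure: four atoms that cluster around two locations.
levels, heights = merge_dendrogram(
    [0.25, 0.25, 0.3, 0.2],
    [[-0.1], [0.1], [4.9], [5.2]],
)
for k, (ws, ths) in enumerate(levels):
    print(len(ws), "atoms:", np.round(ws, 3), np.round(np.concatenate(ths), 3))
print("merge heights:", np.round(heights, 4))
```

In this toy example the first two merges happen at very small heights and the last at a much larger one, which is the kind of gap one might inspect when reading off a number of components from the tree. How the paper turns such heights into a consistent selection rule is not described in the abstract and is not reproduced here.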