Bayesian information criteria for clustering normally distributed data (2008.03974v6)

Published 10 Aug 2020 in math.ST, stat.AP, stat.ME, and stat.TH

Abstract: Maximum likelihood estimates (MLEs) are asymptotically normally distributed, and this property is used in meta-analyses to test the heterogeneity of estimates, either for a single cluster or for several sub-groups. More recently, MLEs for associations between risk factors and diseases have been hierarchically clustered to search for diseases with shared underlying causes, but an objective statistical criterion is needed to determine the number and composition of clusters. To tackle this problem, conventional statistical tests are briefly reviewed before considering the posterior distribution for a partition of the data into clusters. The posterior distribution is calculated by marginalising out the unknown cluster centres, and it differs from the likelihood associated with mixture models. The calculation is equivalent to that used to obtain the Bayesian Information Criterion (BIC), but it is exact, without a Laplace approximation. The result includes a sum-of-squares term together with terms that depend on the number and composition of the clusters and penalise the number of free parameters in the model. The usual BIC is shown to be unsuitable for clustering applications unless the number of items in each individual cluster is sufficiently large.
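The abstract does not give the exact formula, so what follows is only a sketch of the general idea under assumptions of our own. Each estimate x_i is treated as N(mu_k, s_i^2) with a known standard error s_i, and each cluster centre mu_k gets an independent N(m0, tau^2) prior. The prior, the hyperparameters m0 and tau, and all function names are hypothetical, and the paper's own prior may differ. Integrating out mu_k then gives an exact closed form for each cluster C: log p(x_C) = sum_i 0.5 log(w_i / 2 pi) - 0.5 SS_C - 0.5 log(1 + W_C tau^2) - 0.5 (xbar_C - m0)^2 / (1/W_C + tau^2). Here w_i = 1/s_i^2, W_C is the sum of w_i over the cluster, xbar_C is the precision-weighted mean, and SS_C = sum_i w_i (x_i - xbar_C)^2. The result has a sum-of-squares term plus a penalty that depends on each cluster's total precision W_C. For comparison, the sketch also computes the usual BIC, which charges 0.5 log n per cluster centre whatever the cluster's size.

```python
import math
from collections import defaultdict


def _weighted_summary(xs, ses):
    """Return precision weights w_i = 1/s_i^2, their total W, the weighted mean and weighted sum of squares."""
    w = [1.0 / se**2 for se in ses]
    total_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, xs)) / total_w
    ss = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, xs))
    return w, total_w, mean, ss


def _group(x, se, labels):
    """Split estimates and standard errors into clusters keyed by label."""
    groups = defaultdict(lambda: ([], []))
    for xi, si, k in zip(x, se, labels):
        groups[k][0].append(xi)
        groups[k][1].append(si)
    return groups


def cluster_log_evidence(xs, ses, m0=0.0, tau=1.0):
    """Exact log marginal likelihood of one cluster with its centre integrated out.

    Assumption (hypothetical, not taken from the paper): x_i ~ N(mu, s_i^2) with
    known s_i, and prior mu ~ N(m0, tau^2).
    """
    w, total_w, mean, ss = _weighted_summary(xs, ses)
    const = 0.5 * sum(math.log(wi / (2.0 * math.pi)) for wi in w)  # same for every partition
    penalty = 0.5 * math.log(1.0 + total_w * tau**2)  # cost of one free centre
    prior_fit = 0.5 * (mean - m0) ** 2 / (1.0 / total_w + tau**2)
    return const - 0.5 * ss - penalty - prior_fit


def partition_log_evidence(x, se, labels, m0=0.0, tau=1.0):
    """Log evidence of a partition: centres are independent, so cluster log evidences add."""
    return sum(
        cluster_log_evidence(xs, ses, m0, tau)
        for xs, ses in _group(x, se, labels).values()
    )


def bic_log_score(x, se, labels):
    """Usual BIC on the same scale: maximised log-likelihood minus (K/2) log n."""
    groups = _group(x, se, labels)
    loglik = 0.0
    for xs, ses in groups.values():
        w, _, _, ss = _weighted_summary(xs, ses)  # each centre set to its weighted mean
        loglik += 0.5 * sum(math.log(wi / (2.0 * math.pi)) for wi in w) - 0.5 * ss
    return loglik - 0.5 * len(groups) * math.log(len(x))


if __name__ == "__main__":
    # Hypothetical estimates and standard errors forming two well-separated groups.
    x = [0.0, 0.1, -0.1, 3.0, 3.1, 2.9]
    se = [0.1] * 6
    for labels in ([0] * 6, [0, 0, 0, 1, 1, 1]):
        print(labels,
              round(partition_log_evidence(x, se, labels, tau=2.0), 2),
              round(bic_log_score(x, se, labels), 2))
```

Running the script prints both scores for a one-cluster and a two-cluster labelling of six made-up estimates, where higher is better for both. The choice of tau matters in this sketch, because the exact penalty 0.5 log(1 + W_C tau^2) grows with each cluster's own precision rather than with the total number of items n.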