A Model-Based Clustering Approach for Bounded Data Using Transformation-Based Gaussian Mixture Models (2412.13572v2)
Abstract: The clustering of bounded data presents unique challenges in statistical analysis due to the constraints imposed on the data values. This paper introduces a novel method for model-based clustering specifically designed for bounded data. Building on the transformation-based approach to Gaussian mixture density estimation introduced by Scrucca (2019), we extend this framework to develop a probabilistic clustering algorithm for data with bounded support that allows for accurate clustering while respecting the natural bounds of the variables. In our proposal, a flexible range-power transformation is employed to map the data from its bounded domain to the unrestricted real space, hence enabling the estimation of Gaussian mixture models in the transformed space. Despite the close connection to density estimation, the behavior of this approach has not been previously investigated in the literature. Furthermore, we introduce a novel measure of clustering uncertainty, the Normalized Classification Entropy (NCE), which provides a general and interpretable measure of classification uncertainty. The performance of the proposed method is evaluated through real-world data applications involving both fully and partially bounded data, in both univariate and multivariate settings, showing improved cluster recovery and interpretability. Overall, the empirical results demonstrate the effectiveness and advantages of our approach over traditional and advanced model-based clustering techniques that rely on distributions with bounded support.
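As a rough illustration of the two ingredients named in the abstract, the sketch below shows one possible form of a range-power transformation and of a normalized entropy measure. It is written under stated assumptions and is not the authors' implementation: the function names are hypothetical, the transformation assumes a range step (shift for lower-bounded data, ratio (x - l)/(u - x) for data bounded on both sides) followed by a Box-Cox-style power step, and the NCE is assumed to be the classification entropy of the posterior membership probabilities divided by its maximum value n log K. The abstract itself does not give these formulas.

```python
import numpy as np

def range_power(x, lower, upper=None, lam=0.0):
    """Map bounded data to the real line (assumed form, not taken from the abstract)."""
    x = np.asarray(x, dtype=float)
    # Range step: shift lower-bounded data, or map (lower, upper) onto (0, inf).
    r = x - lower if upper is None else (x - lower) / (upper - x)
    # Power step in Box-Cox form; lam == 0 corresponds to the log.
    return np.log(r) if lam == 0 else (r**lam - 1.0) / lam

def range_power_inverse(y, lower, upper=None, lam=0.0):
    """Map transformed values back to the original bounded scale."""
    y = np.asarray(y, dtype=float)
    r = np.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)
    return r + lower if upper is None else (lower + upper * r) / (1.0 + r)

def normalized_classification_entropy(z, eps=1e-12):
    """Entropy of an (n, K) matrix of membership probabilities, scaled to [0, 1].

    Assumed definition: -sum_ik z_ik log z_ik / (n log K).
    """
    z = np.asarray(z, dtype=float)
    n, k = z.shape
    # Clip before the log so that hard assignments (z = 0) contribute exactly zero.
    entropy = -np.sum(z * np.log(np.clip(z, eps, 1.0)))
    return entropy / (n * np.log(k))
```

Under these assumptions, a Gaussian mixture would be fitted to `range_power(x, ...)` rather than to `x`, and the resulting posterior probabilities passed to `normalized_classification_entropy`; values near 0 would indicate confident assignments and values near 1 would indicate assignments close to uniform across clusters.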