Clustering Three-Way Data with Outliers (2310.05288v3)
Published 8 Oct 2023 in stat.ML and cs.LG
Abstract: Matrix-variate distributions are a recent addition to the model-based clustering field, making it possible to analyze data in matrix form with complex structure, such as images and time series. Because these distributions are so recent, the literature on matrix-variate data is limited, and even less of it addresses outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and iteratively detects and trims outliers.
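As a rough illustration of the building block behind the subset log-likelihoods mentioned in the abstract, the sketch below evaluates the log-density of a single n x p observation under a matrix-variate normal distribution. It is a minimal sketch and not the paper's implementation: the function name `matrix_normal_logpdf` and the argument names `M` (mean matrix), `U` (row covariance) and `V` (column covariance) are assumptions chosen for this example, and the model fitting and outlier trimming steps are not shown.

```python
import numpy as np


def matrix_normal_logpdf(X, M, U, V):
    """Log-density of an n x p matrix X under a matrix-variate normal.

    Hypothetical helper for illustration only.
    M: n x p mean, U: n x n row covariance, V: p x p column covariance.
    """
    n, p = X.shape
    R = X - M  # residual matrix
    # Log-determinants computed stably via slogdet
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    # Quadratic form tr(V^{-1} R^T U^{-1} R), using solves instead of inverses
    quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
    return -0.5 * (n * p * np.log(2 * np.pi) + p * logdet_U + n * logdet_V + quad)


# Usage: one 3 x 2 observation with identity covariances (toy values)
X = np.arange(6, dtype=float).reshape(3, 2)
M = np.zeros((3, 2))
print(matrix_normal_logpdf(X, M, np.eye(3), np.eye(2)))
```

Summing such log-densities (weighted by mixing proportions within a mixture) gives a log-likelihood for a data set or for any subset of it. How the subsets are formed and how their log-likelihoods are compared is specified in the paper itself.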