A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections
Abstract: Unimodality is a pivotal property in statistical analysis: it offers insight into the structure of a dataset and underpins more sophisticated analytical procedures. While testing for unimodality is straightforward for one-dimensional data, using methods such as Silverman's approach and Hartigans' dip statistic, generalizing these tests to higher dimensions remains challenging. We extend one-dimensional unimodality principles to multi-dimensional spaces through linear random projections and point-to-point distances. The resulting method, rooted in $\alpha$-unimodality assumptions, is a novel multivariate unimodality test named mud-pod. Both theoretical and empirical studies confirm the efficacy of our method in assessing the unimodality of multidimensional datasets as well as in estimating the number of clusters.
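The abstract names the ingredients (linear random projections, point-to-point distances, a univariate test) without spelling out the procedure. The following is a minimal sketch of how a one-dimensional sample might be built from those ingredients, not the authors' implementation. The function name `projected_mahalanobis_distances`, the Gaussian projection matrix, the single "observer" point, and the projection dimension are all assumptions made for illustration. The resulting sample would then be passed to a univariate unimodality test such as Hartigans' dip test, which is not implemented here.

```python
import numpy as np

def projected_mahalanobis_distances(X, observer_idx, proj_dim, rng):
    """Hypothetical helper: Mahalanobis distances from one observer point
    to all other points, measured after a random linear projection."""
    n, d = X.shape
    # Gaussian random projection matrix (an assumed choice of projection).
    R = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
    Z = X @ R
    # Covariance of the projected data; pinv guards against singular cases.
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    # Differences between every other point and the observer.
    diff = np.delete(Z, observer_idx, axis=0) - Z[observer_idx]
    # Squared Mahalanobis distance diff^T cov^{-1} diff, computed row by row.
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.clip(sq, 0.0, None))

# Usage on synthetic data: 200 points in 10 dimensions, projected to 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
dists = projected_mahalanobis_distances(X, observer_idx=0, proj_dim=3, rng=rng)
```

Here `dists` is a one-dimensional sample with one entry per non-observer point. Repeating the construction over several observers and projections, and aggregating the univariate test outcomes, is one plausible way to reach a multivariate decision, but the aggregation rule is left open in this sketch.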
- On low distortion embeddings of statistical distance measures into low dimensional spaces. In Bhowmick, S. S., Küng, J., and Wagner, R. (eds.), Database and Expert Systems Applications, pp. 164–172, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. ISBN 978-3-642-03573-9.
- Density-based clustering based on hierarchical density estimates. In Pei, J., Tseng, V. S., Cao, L., Motoda, H., and Xu, G. (eds.), Advances in Knowledge Discovery and Data Mining, pp. 160–172, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-37456-2.
- The uu-test for statistical modeling of unimodal data. Pattern Recognition, 122:108272, 2022. ISSN 0031-3203. doi: 10.1016/j.patcog.2021.108272. URL https://www.sciencedirect.com/science/article/pii/S0031320321004520.
- Dai, T. On multivariate unimodal distributions, 1989. URL https://open.library.ubc.ca/collections/ubctheses/831/items/1.0097413.
- Dasgupta, S. Learning mixtures of gaussians. 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039), pp. 634–644, 1999. URL https://api.semanticscholar.org/CorpusID:8338511.
- Testing k-Modal Distributions: Optimal Algorithms via Reductions, pp. 1833–1852. 2013. doi: 10.1137/1.9781611973105.131. URL https://epubs.siam.org/doi/abs/10.1137/1.9781611973105.131.
- Learning $k$-modal distributions via testing. Theory of Computing, 10(20):535–570, 2014. doi: 10.4086/toc.2014.v010a020. URL https://theoryofcomputing.org/articles/v010a020.
- Unimodality, convexity, and applications. Elsevier, 1988.
- Asymptotics of Graphical Projection Pursuit. The Annals of Statistics, 12(3):793–815, 1984. doi: 10.1214/aos/1176346703. URL https://doi.org/10.1214/aos/1176346703.
- On low-dimensional projections of high-dimensional distributions. In From Probability to Statistics and Back: High-Dimensional Models and Processes–A Festschrift in Honor of Jon A. Wellner, volume 9, pp. 91–105. Institute of Mathematical Statistics, 2013.
- Universal inference meets random projections: a scalable test for log-concavity. arXiv preprint arXiv:2111.09254, 2021.
- Pg-means: learning the number of clusters in data. In Schölkopf, B., Platt, J., and Hoffman, T. (eds.), Advances in Neural Information Processing Systems, volume 19. MIT Press, 2006. URL https://proceedings.neurips.cc/paper_files/paper/2006/file/a9986cb066812f440bc2bb6e3c13696c-Paper.pdf.
- Fernandez-Granda, C. Random projections and compressed sensing. Lecture Notes in Optimization-based Data Analysis, 2016. URL https://cims.nyu.edu/~cfgranda/pages/OBDA_spring16/material/random_projections.pdf. New York University.
- The Dip Test of Unimodality. The Annals of Statistics, 13(1):70–84, 1985. doi: 10.1214/aos/1176346577. URL https://doi.org/10.1214/aos/1176346577.
- Hotelling, H. The Generalization of Student’s Ratio. The Annals of Mathematical Statistics, 2(3):360–378, 1931. doi: 10.1214/aoms/1177732979. URL https://doi.org/10.1214/aoms/1177732979.
- Hull, J. J. Database for handwritten text recognition research. In IEEE Transactions on pattern analysis and machine intelligence, volume 16, pp. 550–554. IEEE, 1994.
- Extensions of lipschitz mappings into hilbert space. Contemporary mathematics, 26:189–206, 1984.
- Dip-means: an incremental clustering method for estimating the number of clusters. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/file/a8240cb8235e9c493a0c30607586166c-Paper.pdf.
- The uci machine learning repository, 2023. URL https://archive.ics.uci.edu/ml. Last accessed: October 31, 2023.
- Khintchine, A. Y. On unimodal distributions. Izv. Nauchno-Issled. Inst. Mat. Meh. Tomsk. Gos. Univ., 2:1–7, 1938. In Russian.
- Konstantellos, A. Unimodality conditions for gaussian sums. IEEE Transactions on Automatic Control, 25(4):838–839, 1980. doi: 10.1109/TAC.1980.1102410.
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Testing statistical hypotheses. Springer Texts in Statistics. Springer, New York, third edition, 2005. ISBN 0-387-98864-5.
- A more powerful two-sample test in high dimensions using random projection. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011. URL https://proceedings.neurips.cc/paper_files/paper/2011/file/5487315b1286f907165907aa8fc96619-Paper.pdf.
- hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, 2(11):205, 2017.
- Novikov, A. PyClustering: Data mining library. Journal of Open Source Software, 4(36):1230, apr 2019. doi: 10.21105/joss.01230. URL https://doi.org/10.21105/joss.01230.
- A generalized unimodality. Journal of Applied Probability, 7(1):21–34, 1970. ISSN 00219002. URL http://www.jstor.org/stable/3212145.
- X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the Seventeenth International Conference on Machine Learning, ICML ’00, pp. 727–734, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1558607072.
- Raptt: An exact two-sample test in high dimensions using random projections. Journal of Computational and Graphical Statistics, 25(3):954–970, 2016. doi: 10.1080/10618600.2015.1062771. URL https://doi.org/10.1080/10618600.2015.1062771.
- Schubert, E. Stop using the elbow criterion for k-means and how to choose the number of clusters instead. SIGKDD Explor. Newsl., 25(1):36–42, jul 2023. ISSN 1931-0145. doi: 10.1145/3606274.3606278. URL https://doi.org/10.1145/3606274.3606278.
- Sharipov, O. S. Glivenko-Cantelli Theorems, pp. 612–614. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011. ISBN 978-3-642-04898-2. doi: 10.1007/978-3-642-04898-2_280. URL https://doi.org/10.1007/978-3-642-04898-2_280.
- Are your data gathered? In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, pp. 2210–2218, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450355520. doi: 10.1145/3219819.3219994. URL https://doi.org/10.1145/3219819.3219994.
- Silverman, B. W. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B (Methodological), 43(1):97–99, 1981. ISSN 00359246. URL http://www.jstor.org/stable/2985156.
- Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.