Determine practical benefits of k-simplicial distances for real-world clustering

Determine whether the k-simplicial distance, a generalization of the Mahalanobis distance parameterized by k ∈ {1,…,d} (where d is the dimension of the data), provides practical benefits for cluster analysis on real-life datasets when used within the K-means algorithm, beyond the positive results reported on simulated data.
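To make the role of k concrete, here is a minimal Python sketch of one way a k-simplicial dissimilarity can be computed from squared simplex volumes. It assumes (our reading, not a transcription of the 2018 study) that the quantity of interest is the ratio of the mean squared volume of k-simplices formed by a point x and k sample points to the mean squared volume of k-simplices formed by k+1 sample points. The names `simplex_sq_volume` and `simplicial_potential` are hypothetical, and the normalization used in the study may differ. Squared volumes come from the Gram determinant, Vol² = det(EEᵀ)/(k!)², where the rows of E are the edge vectors from one vertex.

```python
import math
from itertools import combinations

import numpy as np


def simplex_sq_volume(vertices: np.ndarray) -> float:
    """Squared k-dimensional volume of the simplex spanned by k+1 vertices (rows)."""
    edges = vertices[1:] - vertices[0]  # k x d edge vectors from the first vertex
    k = edges.shape[0]
    gram = edges @ edges.T  # k x k Gram matrix of the edges
    return float(np.linalg.det(gram)) / math.factorial(k) ** 2


def simplicial_potential(x: np.ndarray, sample: np.ndarray, k: int) -> float:
    """Hypothetical empirical k-simplicial dissimilarity of x with respect to sample.

    Mean squared volume of simplices (x, k sample points) divided by the mean
    squared volume of simplices formed by k+1 sample points. Subsets are
    enumerated exhaustively, so this is only usable for very small samples.
    """
    n = len(sample)
    num = np.mean([
        simplex_sq_volume(np.vstack([x, sample[list(idx)]]))
        for idx in combinations(range(n), k)
    ])
    den = np.mean([
        simplex_sq_volume(sample[list(idx)])
        for idx in combinations(range(n), k + 1)
    ])
    return float(num / den)
```

With k = 1 this ratio is an increasing affine function of the squared Euclidean distance from x to the sample mean; with k = d it is unchanged by any invertible affine map of the data, the property it shares with the Mahalanobis distance. For example, for the four corners of the square [0, 2]² and x = (3, 1), the k = 1 value is approximately 1.125. A realistic implementation would have to subsample the subsets rather than enumerate them all.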

Background

The report reviews the 2018 study that introduced k-simplicial distances. That study compared Euclidean, Mahalanobis, and k-simplicial distances for clustering simulated data and reported that k-simplicial distances improved results under certain conditions, such as lower intrinsic dimensionality. However, it did not compare the distances on real-world datasets; its only application to empirical data used k-simplicial distances alone.

Because no comparative results on real-world data were presented, the report notes that no conclusion can be drawn about the practical effectiveness of k-simplicial distances in applied clustering tasks, which motivates empirical validation across real-life datasets.
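An empirical comparison of the kind called for here needs a K-means loop whose assignment step accepts any dissimilarity between a point and a cluster. The following Python sketch is one such harness under stated assumptions: a Lloyd-style loop, a deterministic farthest-point initialization, a small ridge term to keep cluster covariances invertible, and the hypothetical names `kmeans_with`, `euclidean_sq` and `mahalanobis_sq`. A k-simplicial dissimilarity computed against a cluster's members could be passed in the same slot. None of this reproduces the procedure used in the 2018 study.

```python
import numpy as np


def euclidean_sq(points: np.ndarray, members: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance from every point to the mean of `members`."""
    diff = points - members.mean(axis=0)
    return np.einsum("ij,ij->i", diff, diff)


def mahalanobis_sq(points: np.ndarray, members: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """Squared Mahalanobis distance to the mean of `members`, using their covariance.

    The ridge term is an assumption added to keep the covariance invertible.
    """
    diff = points - members.mean(axis=0)
    cov = np.cov(members, rowvar=False) + ridge * np.eye(points.shape[1])
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def kmeans_with(dissimilarity, points: np.ndarray, n_clusters: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd-style K-means whose assignment step uses a pluggable dissimilarity."""
    # Deterministic farthest-point initialization (an assumption).
    centers = [points[0]]
    for _ in range(1, n_clusters):
        nearest = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    labels = np.argmin([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)

    for _ in range(n_iter):
        columns = []
        for c in range(n_clusters):
            members = points[labels == c]
            if len(members) <= points.shape[1]:
                # Too few members to estimate a covariance: make this cluster unattractive.
                columns.append(np.full(len(points), np.inf))
            else:
                columns.append(dissimilarity(points, members))
        new_labels = np.column_stack(columns).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the loop has converged
        labels = new_labels
    return labels
```

Running each dissimilarity on the same labelled real-life dataset and scoring the resulting partitions against the known classes (for example with the adjusted Rand index) would give the kind of comparative evidence the report finds missing.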

References

However, the paper does not compare between the different distance measures for an application to a real-life dataset, choosing instead to only use the k-simplicial distances for this purpose. This means that we cannot conclude from the paper whether this k-simplicial distance is beneficial in practice.

An Investigation into Distance Measures in Cluster Analysis (2404.13664 - Shapcott, 2024) in Section 3.1 (Discussion from Papers)