Clustering Tails in High Dimension

Published 24 Jun 2025 in stat.ME | (2506.19414v1)

Abstract: One potential remedy for the scarcity of tail observations in extreme value analysis is to integrate information from multiple datasets that share similar tail properties, for instance, a common extreme value index. For a multivariate dataset, this means first grouping the dimensions into clusters and only then applying pooling techniques within each cluster. This paper addresses the clustering problem for a high-dimensional dataset, grouping its dimensions according to their extreme value indices. We propose an iterative clustering procedure that sequentially partitions the variables into groups, ordered from the heaviest-tailed to the lightest-tailed distributions. At each step, our method identifies and extracts the group of variables that share the highest extreme value index among those remaining. This approach differs fundamentally from conventional clustering methods, such as two-step procedures that first estimate each extreme value index and then cluster the pre-estimated values. We establish the consistency of the proposed algorithm and demonstrate its finite-sample performance through a simulation study and a real data application.
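The abstract does not spell out the authors' algorithm, so the Python sketch below illustrates only the heaviest-first extraction order it describes. It is deliberately the simple kind of baseline the abstract contrasts with, not the proposed method. Each index is pre-estimated with the Hill estimator (a standard estimator for positive extreme value indices), and then a hypothetical tolerance rule repeatedly peels off the variables whose estimate lies within `tol` of the largest remaining one. The names `hill` and `extract_heaviest_first` and the parameters `k` and `tol` are illustrative assumptions.

```python
import math
import random


def hill(sample, k):
    """Hill estimator of a positive extreme value index from the k largest values."""
    xs = sorted(sample, reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("need 1 <= k < len(sample)")
    if xs[k] <= 0:
        raise ValueError("the k+1 largest values must be positive")
    # Mean log-excess of the top k order statistics over the (k+1)-th largest.
    return sum(math.log(x) for x in xs[:k]) / k - math.log(xs[k])


def extract_heaviest_first(estimates, tol):
    """Greedy heaviest-first grouping of pre-estimated indices (hypothetical rule).

    At each step, the variables whose estimate lies within `tol` of the largest
    remaining estimate form the next cluster and are removed from consideration.
    """
    if tol < 0:
        raise ValueError("tol must be non-negative")
    remaining = set(range(len(estimates)))
    clusters = []
    while remaining:
        top = max(estimates[j] for j in remaining)
        # The argmax always qualifies, so every step removes at least one variable.
        group = sorted(j for j in remaining if estimates[j] >= top - tol)
        clusters.append(group)
        remaining.difference_update(group)
    return clusters


if __name__ == "__main__":
    rng = random.Random(0)
    # Pareto(alpha) has extreme value index 1/alpha: three columns with 1.0, two with 0.25.
    alphas = [1.0, 1.0, 1.0, 4.0, 4.0]
    columns = [[rng.paretovariate(a) for _ in range(5000)] for a in alphas]
    estimates = [hill(col, k=500) for col in columns]
    print([round(g, 3) for g in estimates])
    print(extract_heaviest_first(estimates, tol=0.3))
```

The output depends on the choices of `k` and `tol`; the toy data uses Pareto columns because their extreme value index, 1/α, is known exactly.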