Ties, Tails and Spectra: On Rank-Based Dependency Measures in High Dimensions
(2508.14992v1)
Published 20 Aug 2025 in math.ST, math.PR, and stat.TH
Abstract: This work is concerned with the limiting spectral distribution of rank-based dependency measures in high dimensions. We provide distribution-free results for multivariate empirical versions of Kendall's $\tau$ and Spearman's $\rho$ in a setting where the dimension $p$ grows at most proportionally to the sample size $n$. Although rank-based measures are known to be well suited for discrete and heavy-tailed data, previous works in the field focused mostly on the continuous and light-tailed case. We close this gap by imposing mild assumptions and allowing for general types of distributions. Interestingly, our analysis reveals that a non-trivial adjustment of classical Kendall's $\tau$ is needed to obtain a pivotal limiting distribution in the presence of tied data. The proof for Spearman's $\rho$ is facilitated by a result regarding the limiting eigenvalue distribution of a general class of random matrices with rows on the Euclidean unit sphere, which is of independent interest. For instance, this finding can be used to derive the limiting spectral distribution of sample correlation matrices, which, in contrast to most existing works, accommodates heavy-tailed data.
The paper derives limiting spectral distributions for rank-based measures (Kendall's τ and Spearman's ρ) under varying high-dimensional asymptotics.
It characterizes eigenvalue convergence to the semicircle law when p/n→0 and to the Marčenko-Pastur law when p/n→γ>0.
An adjusted version of Kendall's τ handles tied data, and the results also cover heavy-tailed distributions, which broadens the scope of non-parametric analysis in high dimensions.
Ties, Tails and Spectra: On Rank-Based Dependency Measures in High Dimensions
Introduction
The paper "Ties, Tails and Spectra: On Rank-Based Dependency Measures in High Dimensions" (2508.14992) investigates the spectral properties of two rank-based dependency measures, Kendall's τ and Spearman's ρ, in high-dimensional settings. Prior work focused mostly on continuous, light-tailed data; this paper closes that gap by allowing tied and heavy-tailed data under mild assumptions. The authors characterize the limiting eigenvalue distributions of both measures under these conditions.
Figure 1: Normalized histograms of the simulated eigenvalues of n/p⋅ (left panel) and n/p⋅offdiag(τ) (right panel), showcasing limiting behaviors.
Eigenvalue Distributions of Dependency Measures
Preliminaries
The analysis assumes a setup where the dimension p grows at most proportionally with the sample size n. The main goal is to identify the limiting spectral distributions (LSDs) of the rank-based measures as p and n tend toward infinity. Two standard asymptotic frameworks considered are:
p/n→0, leading to a semicircle distribution.
p/n→γ>0, resulting in the Marčenko-Pastur distribution.
The paper thoroughly derives the LSDs for rank-based measures under these regimes, providing a unifying approach for both discrete and continuous data.
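As a reference point for the two regimes above, the following minimal sketch writes out the two limiting densities in their textbook normalization (semicircle on [-2, 2], Marčenko-Pastur with unit variance and ratio γ ≤ 1). The function names and the normalization are assumptions made for illustration; the paper's own centering and scaling may differ.

```python
import math


def semicircle_density(x, radius=2.0):
    """Wigner semicircle density supported on [-radius, radius]."""
    if abs(x) >= radius:
        return 0.0
    return 2.0 * math.sqrt(radius ** 2 - x ** 2) / (math.pi * radius ** 2)


def marchenko_pastur_density(x, gamma):
    """Marchenko-Pastur density with unit variance and ratio 0 < gamma <= 1.

    For gamma > 1 the law also carries a point mass at zero, which this
    sketch does not model.
    """
    lower = (1.0 - math.sqrt(gamma)) ** 2
    upper = (1.0 + math.sqrt(gamma)) ** 2
    if x <= lower or x >= upper:
        return 0.0
    return math.sqrt((upper - x) * (x - lower)) / (2.0 * math.pi * gamma * x)


def integrate(f, a, b, steps=20000):
    """Midpoint rule, used only to sanity-check that a density has mass one."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


gamma = 0.5
lower = (1.0 - math.sqrt(gamma)) ** 2
upper = (1.0 + math.sqrt(gamma)) ** 2
print(integrate(semicircle_density, -2.0, 2.0))
print(integrate(lambda x: marchenko_pastur_density(x, gamma), lower, upper))
```

Both printed values should be close to one, which is a quick check that the densities are normalized as intended.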
Spearman's ρ
Spearman's ρ is written in terms of a random matrix whose row vectors lie on the Euclidean unit sphere, so that a general result on the limiting eigenvalue distribution of such matrices can be applied. The paper proves that the LSD follows the semicircle law when p/n→0 and the Marčenko-Pastur law when p/n→γ, extending results that were previously available only for continuous data.
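The unit-sphere structure can be illustrated with a minimal sketch. It assumes the common definition of Spearman's ρ as the Pearson correlation of midranks, which may not match the paper's exact estimator; the helper names `midranks` and `spearman_matrix`, the sample sizes, and the rounded Cauchy data (heavy tails plus ties) are all choices made for illustration.

```python
import numpy as np


def midranks(values):
    """Ranks 1..n, with tied observations sharing the average of their ranks."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)  # largest rank inside each tie group
    average_rank = last_rank - (counts - 1) / 2.0
    return average_rank[inverse]


def spearman_matrix(data):
    """Spearman's rho for an (n samples) x (p variables) array.

    Assumes no column is constant, otherwise the normalization divides by zero.
    """
    ranks = np.column_stack([midranks(column) for column in data.T])
    centered = ranks - ranks.mean(axis=0)
    # Each variable's centered rank vector is scaled to unit Euclidean norm,
    # giving a p x n matrix whose rows lie on the unit sphere.
    rows = (centered / np.linalg.norm(centered, axis=0)).T
    return rows @ rows.T


rng = np.random.default_rng(0)
n, p = 400, 200
data = np.round(rng.standard_cauchy(size=(n, p)))  # heavy-tailed and tied
rho = spearman_matrix(data)
eigenvalues = np.linalg.eigvalsh(rho)

gamma = p / n
mp_support = ((1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2)
print(eigenvalues.min(), eigenvalues.max(), mp_support)
```

With independent columns, the extreme eigenvalues can be compared by eye against the Marčenko-Pastur support edges printed alongside them; at finite n some deviation is expected.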
Figure 2: Distribution shifts of Spearman's ρ across various dimensional growth scenarios.
Kendall's τ
For Kendall's τ, the classical estimator does not yield a pivotal limiting distribution when the data contain ties, so the paper introduces an adjusted version whose LSD is universal. Up to scaling, the LSDs of the adjusted Kendall's τ under the same asymptotic regimes again follow the semicircle and Marčenko-Pastur laws.
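To make the effect of ties concrete, here is a minimal sketch of the classical (tau-a) Kendall matrix built from pairwise signs. It is not the paper's adjusted estimator, whose form is not given in this summary; the function name `kendall_tau_a_matrix` and the simulated data are assumptions for illustration. One visible symptom of ties is that each diagonal entry equals the fraction of untied pairs in that variable, so it depends on the data distribution.

```python
import numpy as np


def kendall_tau_a_matrix(data):
    """Classical Kendall tau-a matrix for an (n samples) x (p variables) array.

    Entry (j, k) averages sign(x_ij - x_lj) * sign(x_ik - x_lk) over all
    pairs i < l. Tied pairs contribute zero.
    """
    n = data.shape[0]
    first, second = np.triu_indices(n, k=1)  # all index pairs with i < l
    signs = np.sign(data[first] - data[second])  # (n(n-1)/2) x p
    return signs.T @ signs / first.size


rng = np.random.default_rng(1)
continuous = rng.standard_normal(size=(60, 4))
tied = rng.integers(0, 3, size=(60, 4)).astype(float)

tau_continuous = kendall_tau_a_matrix(continuous)
tau_tied = kendall_tau_a_matrix(tied)

# Without ties every diagonal entry is one; with ties it drops to the
# fraction of untied pairs, which varies with the tie pattern.
print(np.diag(tau_continuous))
print(np.diag(tau_tied))
```

The pairwise construction costs O(n²p) memory, which is acceptable for a small illustration but not for large n.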
Figure 3: Eigenvalue distributions of adjusted Kendall's τ demonstrating convergence in probability to theoretical limits.
Statistical and Theoretical Implications
The results have notable implications for high-dimensional statistical analysis, particularly in fields such as finance, genomics, and network analysis, where non-parametric methods are increasingly prevalent due to robustness against outliers and ties.
Moreover, the theoretical findings establish a connection between random matrix theory and rank-based statistics, emphasizing the utility and limits of these dependency measures beyond traditional Gaussian assumptions. The universal nature of limiting distributions irrespective of underlying data distributions underscores this relationship.
Conclusion
The research extends the understanding of rank-based measures in high dimensions, particularly under non-standard data conditions. By providing universal results across different data types, this work contributes both theoretical insight and practical methodology for modern data analysis, and it points toward further study of how discrete and heavy-tailed data affect statistical modeling and inference.
Figure 4: Histogram of diagonal entries of scaling matrix demonstrating robustness of the proposed scaling adjustments across sample sizes.