Epsilon-Delta Analysis of Chatterjee's Rank Correlation
- The paper introduces an epsilon-delta framework that quantifies the stability and sensitivity of Chatterjee’s rank correlation under perturbations, providing tight contamination bounds (e.g., a 0.012 shift for 1% contamination).
- The method utilizes asymptotic expected sensitivity functions and local L1 residuals to unify rank-based and moment-based dependence measures, thereby enhancing nonparametric inference.
- The analysis establishes explicit ε–δ equivalences at independence and perfect dependence, ensuring robustness and continuity even under weak convergence and local distributional changes.
Chatterjee’s rank correlation, denoted $\xi$, is a nonparametric functional measuring the degree of association between random variables $X$ and $Y$. It attains 1 for perfect functional dependence and 0 under independence, and interpolates smoothly in between. The ε–δ interpretation of this coefficient offers a rigorous quantification of its stability, sensitivity, and continuity under perturbations of the joint distribution (whether by gross contamination, weak convergence, or local dependence), providing tight analytical bounds for inference and robustness.
1. Fundamental Definitions and Forms
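Chatterjee’s sample rank correlation, for i.i.d. data $(X_i, Y_i)_{i=1}^n$ with ties in $X$ resolved appropriately, is defined as

$$\xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1},$$

where $r_i$ are the ranks of $Y_{(i)}$ after sorting the data by $X$ (Chatterjee, 2019); this form assumes no ties among the $Y_i$. The population version, critical for asymptotic and contamination analysis, is

$$\xi(X, Y) = \frac{\int \operatorname{Var}\big(G_X(t)\big)\, d\mu(t)}{\int G(t)\,(1 - G(t))\, d\mu(t)},$$

where $G(t) = P(Y \ge t)$ and $G_X(t) = P(Y \ge t \mid X)$, with $\mu$ the law of $Y$.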
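The sample coefficient can be computed in a few lines. The following is a minimal sketch, assuming the rank-difference form $\xi_n = 1 - 3\sum_i |r_{i+1} - r_i| / (n^2 - 1)$ with no ties among the $Y_i$ and ties in $X$ broken uniformly at random; the function name `chatterjee_xi` and its `rng` parameter are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def chatterjee_xi(x, y, rng=None):
    """Sample Chatterjee rank correlation (no-ties-in-y formula)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 2 or y.size != n:
        raise ValueError("x and y must have the same length, at least 2")
    rng = np.random.default_rng(0) if rng is None else rng
    # Sort by x; the random secondary key breaks ties in x uniformly at random.
    # (np.lexsort treats the LAST key as the primary one.)
    order = np.lexsort((rng.random(n), x))
    y_sorted = y[order]
    # r[i] = rank (1..n) of the i-th y value after sorting the pairs by x.
    r = np.empty(n, dtype=np.int64)
    r[np.argsort(y_sorted, kind="stable")] = np.arange(1, n + 1)
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

# Endpoint checks: a noiseless function of x versus an independent sample.
x_demo = np.arange(100, dtype=float)
xi_functional = chatterjee_xi(x_demo, x_demo**2)
gen = np.random.default_rng(42)
xi_independent = chatterjee_xi(gen.standard_normal(5000), gen.standard_normal(5000))
```

For a strictly monotone relation the ranks are consecutive, so the finite-sample value is $1 - 3/(n+1)$ rather than exactly 1; the independent sample should give a value close to 0.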
Chatterjee’s coefficient admits formulations via local residuals, Markov-product copulas, and conditional variances, which underlie the various ε–δ analyses (Sato, 13 Dec 2025; Ansari et al., 14 Mar 2025).
2. Sensitivity and Robustness via Asymptotic Expected Sensitivity Function
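The primary device for ε–δ robustness is the Asymptotic Expected Sensitivity Function (AESF), defined for a functional $T$ at a distribution $F$ as

$$\mathrm{AESF}(z; T, F) = \lim_{n \to \infty} (n+1)\, E_F\big[\, T_{n+1}(Z_1, \dots, Z_n, z) - T_n(Z_1, \dots, Z_n) \,\big],$$

where $T_n$ is the empirical plug-in estimator. For Chatterjee’s $\xi$, if the supremum $\sup_z |\mathrm{AESF}(z; \xi, F)| < \infty$, then under $\varepsilon$-contamination

$$F_\varepsilon = (1 - \varepsilon)\, F + \varepsilon\, H,$$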
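one obtains the first-order contamination bound

$$|\xi(F_\varepsilon) - \xi(F)| \le \varepsilon \sup_z |\mathrm{AESF}(z; \xi, F)| + o(\varepsilon),$$

and more conservatively for all $\varepsilon \in [0, 1]$,

$$|\xi(F_\varepsilon) - \xi(F)| \le L\, \varepsilon,$$

when the functional is Lipschitz in total variation with constant $L$ (Zhang, 2024). This bound is tight: the worst case occurs when the contaminating distribution concentrates mass at the point where $|\mathrm{AESF}|$ is maximized.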
For example, in a linear-Gaussian case, the numerically evaluated supremum of $|\mathrm{AESF}|$ yields, for $\varepsilon = 0.01$, a maximal shift of about 0.012 under 1% contamination.
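A Monte Carlo sketch can make the sensitivity idea concrete. It assumes the AESF is read as the large-$n$ limit of the expected sensitivity curve, $(n+1)\,E[\xi_{n+1}(\text{sample} \cup \{z\}) - \xi_n(\text{sample})]$, which is an interpretation rather than a definition confirmed by the source. The correlation `rho=0.5`, the contamination point, and the supremum value 1.2 (back-solved from 0.012 / 0.01) are hypothetical illustration values.

```python
import numpy as np

def xi_n(x, y):
    """Rank-difference form of Chatterjee's coefficient (assumes no ties in y)."""
    n = len(x)
    order = np.argsort(x, kind="stable")  # ties in x kept in input order
    r = np.empty(n, dtype=np.int64)
    r[np.argsort(y[order], kind="stable")] = np.arange(1, n + 1)
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

def expected_sensitivity(z, n=500, reps=200, rho=0.5, seed=0):
    """Monte Carlo mean of (n + 1) * (xi_{n+1}(sample + z) - xi_n(sample)).

    The sample is linear-Gaussian: y = rho * x + sqrt(1 - rho^2) * noise.
    """
    rng = np.random.default_rng(seed)
    zx, zy = z
    vals = np.empty(reps)
    for k in range(reps):
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        base = xi_n(x, y)
        perturbed = xi_n(np.append(x, zx), np.append(y, zy))
        vals[k] = (n + 1) * (perturbed - base)
    return vals.mean()

def first_order_bound(eps, sup_aesf):
    """First-order contamination bound: eps times the supremum of |AESF|."""
    return eps * sup_aesf

# Effect of one gross outlier placed at x = 0, y = 5 (hypothetical point).
sensitivity_at_outlier = expected_sensitivity((0.0, 5.0), n=200, reps=50)
# With a hypothetical supremum of 1.2, 1% contamination shifts xi by at most ~0.012.
shift_bound = first_order_bound(0.01, 1.2)
```

Scanning `expected_sensitivity` over a grid of points `z` and taking the largest absolute value would give a rough numerical estimate of the supremum entering the bound, subject to Monte Carlo error at finite `n`.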
3. ε–δ Structure for Functional Dependence and Independence
Chatterjee’s coefficient exhibits explicit ε–δ equivalences at the endpoints:
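- Functional dependence: If $Y = f(X)$ a.s. for some measurable $f$, then $\xi = 1$. Conversely, if $\xi = 1$, then $Y$ is almost surely a measurable function of $X$. A finite deviation of size $\varepsilon$ from noiseless dependence yields a bound of the form $1 - \xi \le \delta(\varepsilon)$, where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.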
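- Independence: $\xi = 0$ if and only if $X$ and $Y$ are independent. For continuous $Y$ and uniform distance from independence $\sup_{x,t} |P(Y \ge t \mid X = x) - P(Y \ge t)| \le \varepsilon$, one gets $\xi \le 6\varepsilon^2$. Conversely, if $\xi \le \delta$, then $\int \operatorname{Var}\big(P(Y \ge t \mid X)\big)\, d\mu(t) \le \delta/6$, with $\mu$ the law of $Y$.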
These bounds justify the interpretation of $\xi$ as a calibrated, Lipschitz-quantified “distance” from both perfect dependence and independence, with explicit ε–δ parameters.
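The independence-side bound can be checked directly from the conditional-variance form of $\xi$. The derivation below is a sketch assuming $Y$ is continuous, so that the normalizing integral equals $1/6$, and writing $G(t) = P(Y \ge t)$, $G_X(t) = P(Y \ge t \mid X)$, with $\mu$ the law of $Y$. The constant 6 is specific to that continuity assumption.

```latex
\begin{align*}
\int G(t)\,(1 - G(t))\, d\mu(t)
  &= E\big[U(1 - U)\big] = \tfrac{1}{2} - \tfrac{1}{3} = \tfrac{1}{6},
  \qquad U = G(Y) \sim \mathrm{Unif}(0, 1), \\
\operatorname{Var}\big(G_X(t)\big)
  &= E\big[(G_X(t) - G(t))^2\big] \le \varepsilon^2
  \quad \text{if } \sup_{x,t} \big|P(Y \ge t \mid X = x) - G(t)\big| \le \varepsilon, \\
\xi(X, Y)
  &= 6 \int \operatorname{Var}\big(G_X(t)\big)\, d\mu(t) \le 6\,\varepsilon^2 .
\end{align*}
```

For instance, a uniform deviation of $\varepsilon = 0.05$ from independence caps $\xi$ at $6 \times 0.0025 = 0.015$.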
4. Continuity: Markov Products and Weak Convergence
Continuity properties of $\xi$ in the weak topology deviate from those of classical rank correlations. Chatterjee’s $\xi$ is not continuous with respect to weak convergence of joint laws; it is instead continuous with respect to the law of the Markov product $(Y, Y')$, where $Y'$ is conditionally independent of $Y$ given $X$ and $(X, Y') \overset{d}{=} (X, Y)$ (Ansari et al., 14 Mar 2025).
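Theorem (ε–δ continuity of $\xi$): For continuous $F_Y$ and suitable range convergence, for every $\varepsilon > 0$ there exists $\delta > 0$ such that

$$d_P\big(\mathcal{L}(Y_n, Y_n'),\, \mathcal{L}(Y, Y')\big) < \delta \;\Longrightarrow\; |\xi(X_n, Y_n) - \xi(X, Y)| < \varepsilon,$$

where $d_P$ is the Prokhorov distance. Copula-based representations yield bounds such as $|\xi(X_n, Y_n) - \xi(X, Y)| \le 6\delta$ when the uniform norm $\|C^{*}_n - C^{*}\|_\infty \le \delta$, with $C^{*}$ the copula of the Markov product (Ansari et al., 14 Mar 2025).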
This ensures that small perturbations in the conditional law of $Y$ given $X$, as measured in the appropriate metric (not simply a metric on the joint law), produce arbitrarily small effects on $\xi$, with explicit ε–δ quantification.
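The failure of continuity under plain weak convergence can be illustrated numerically. The sketch below uses a standard oscillation example that is not taken from the cited papers: with $X$ uniform on $[0, 1]$ and $Y = kX \bmod 1$, the joint law of $(X, Y)$ approaches the independent uniform law as $k$ grows, yet $Y$ stays an exact function of $X$, so $\xi$ stays at 1. The sample size and the values of `k` are arbitrary illustration choices.

```python
import numpy as np

def xi_n(x, y):
    """Rank-difference form of Chatterjee's coefficient (assumes no ties in y)."""
    n = len(x)
    order = np.argsort(x, kind="stable")
    r = np.empty(n, dtype=np.int64)
    r[np.argsort(y[order], kind="stable")] = np.arange(1, n + 1)
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(1)
n = 20_000
x = rng.random(n)

# For each oscillation frequency k, record (Pearson correlation, xi_n).
results = {}
for k in (1, 5, 20):
    y = (k * x) % 1.0  # y is a deterministic function of x for every k
    pearson = np.corrcoef(x, y)[0, 1]
    results[k] = (pearson, xi_n(x, y))

for k, (pearson, xi) in results.items():
    print(f"k={k:2d}  pearson={pearson:+.3f}  xi_n={xi:.3f}")
```

The Pearson correlation shrinks toward 0 as `k` increases, as expected for a sequence approaching independence in the weak sense, while the sample coefficient stays near 1 as long as `n` is large relative to `k`.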
5. Primitive Local ε–δ Construction and Empirical Structure
A local ε–δ perspective frames Chatterjee’s $\xi$ as the limiting residual of a local averaging scheme:
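- For $(X_i, Y_i)_{i=1}^n$ with $U_i = F_X(X_i)$, $V_i = F_Y(Y_i)$ (probability-integral transforms), define for $\varepsilon > 0$ the empirical $\varepsilon$-neighborhood of $U_i$ as $N_\varepsilon(i) = \{ j : |U_j - U_i| \le \varepsilon \}$.
- The local average of $V$ near $U_i$ is $\bar{V}_\varepsilon(i) = |N_\varepsilon(i)|^{-1} \sum_{j \in N_\varepsilon(i)} V_j$.
- The mean local residual is $\delta_n(\varepsilon) = n^{-1} \sum_{i=1}^{n} |V_i - \bar{V}_\varepsilon(i)|$.
- In the limit, $\delta_n(\varepsilon)$ converges to a population residual; Chatterjee’s correlation emerges from a suitably normalized version of this limit, matching the original rank-difference formula (Sato, 13 Dec 2025).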
All ε–δ operations (local sets, residuals) are invariant under monotone transformations; the probability-integral transform serves only to achieve distribution-freeness.
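The local averaging scheme can be sketched in code. This is an illustration under stated assumptions rather than the construction of the cited work: the probability-integral transforms are replaced by normalized ranks, the neighborhood is taken as $\{ j : |U_j - U_i| \le \varepsilon \}$, and the residual is an $L^1$ mean. The function name `mean_local_residual` is hypothetical.

```python
import numpy as np

def mean_local_residual(x, y, eps):
    """Mean L1 gap between V_i and the average of V over the eps-neighborhood of U_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Empirical probability-integral transforms: normalized ranks in (0, 1].
    u = (np.argsort(np.argsort(x)) + 1) / n
    v = (np.argsort(np.argsort(y)) + 1) / n
    # neighbors[i, j] == 1.0 when U_j lies within eps of U_i (i itself included).
    neighbors = (np.abs(u[:, None] - u[None, :]) <= eps).astype(float)
    local_avg = neighbors @ v / neighbors.sum(axis=1)
    return np.abs(v - local_avg).mean()

rng = np.random.default_rng(7)
x = rng.standard_normal(400)
# Noiseless monotone dependence: local averages track V closely.
residual_functional = mean_local_residual(x, x**3, eps=0.05)
# Independence: local averages hover near 1/2, so the residual is roughly E|V - 1/2| = 1/4.
residual_independent = mean_local_residual(x, rng.standard_normal(400), eps=0.05)
```

Because the function works only with ranks, applying any strictly increasing map to `x` or `y` leaves its output unchanged, which mirrors the invariance property stated above. The pairwise distance matrix costs $O(n^2)$ memory, so this form is suited to small samples only.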
6. Moment-Based Analogues and Unified Framework
Replacing the local $L^1$ residual with $L^2$ analogues links Chatterjee’s $\xi$ to familiar moment-based indices:

$$\eta^2 = 1 - \frac{E\big[(Y - E[Y \mid X])^2\big]}{\operatorname{Var}(Y)}.$$

For jointly Gaussian $(X, Y)$, one recovers Pearson’s $\rho^2$ through this construction, showing that the ε–δ approach unifies rank-based and moment-based dependence measures under a single limiting framework (Sato, 13 Dec 2025).
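The Gaussian case reduces to a short computation. The sketch below assumes the $L^2$ analogue is the correlation ratio $\eta^2 = 1 - E[(Y - E[Y \mid X])^2] / \operatorname{Var}(Y)$ and uses the standard conditional-mean formula for a bivariate normal with correlation $\rho$, means $\mu_X$, $\mu_Y$, and standard deviations $\sigma_X$, $\sigma_Y$.

```latex
\begin{align*}
E[Y \mid X] &= \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\,(X - \mu_X), \\
E\big[(Y - E[Y \mid X])^2\big] &= \sigma_Y^2\,(1 - \rho^2), \\
\eta^2 &= 1 - \frac{\sigma_Y^2\,(1 - \rho^2)}{\sigma_Y^2} = \rho^2 .
\end{align*}
```

Under this reading the moment-based index coincides with the squared Pearson correlation, so a Gaussian pair with $\rho = 0.6$ gives $\eta^2 = 0.36$.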
7. Assumptions, Limitations, and Practical Implications
Rigorous ε–δ control relies on continuity of the distribution of $Y$, regularity of the conditional law of $Y$ given $X$, and Hadamard differentiability of $\xi$. The main theoretical limits (tightness of the contamination bound and sharpness of the independence and functional-dependence bounds) are achieved under these hypotheses (Zhang, 2024; Chatterjee, 2019). In finite samples, contamination and sampling errors are additive, with the former scaling as $O(\varepsilon)$ and the latter as $O(n^{-1/2})$.
A plausible implication is that, for statistical inference and robust estimation, Chatterjee’s $\xi$ offers explicit, interpretable robustness margins, with an ε–δ quantification that earlier rank-based coefficients either lack or provide only asymptotically. The local ε–δ interpretation remains central for applications in dependence quantification, goodness-of-fit, and model diagnostics.