
Epsilon-Delta Analysis of Chatterjee's Rank Correlation

Updated 16 December 2025
  • The paper introduces an epsilon-delta framework that quantifies the stability and sensitivity of Chatterjee’s rank correlation under perturbations, providing tight contamination bounds (e.g., a 0.012 shift for 1% contamination).
  • The method utilizes asymptotic expected sensitivity functions and local L1 residuals to unify rank-based and moment-based dependence measures, thereby enhancing nonparametric inference.
  • The analysis establishes explicit ε–δ equivalences at independence and perfect dependence, ensuring robustness and continuity even under weak convergence and local distributional changes.

Chatterjee’s rank correlation, denoted $\xi(X, Y)$, is a nonparametric functional measuring the degree of association between random variables $X$ and $Y$ that attains 1 for perfect functional dependence, 0 under independence, and interpolates smoothly in between. The $\varepsilon$–$\delta$ interpretation of this coefficient offers a rigorous quantification of its stability, sensitivity, and continuity under perturbations of the joint distribution (whether by gross contamination, weak convergence, or local dependence), providing tight analytical bounds for inference and robustness.

1. Fundamental Definitions and Forms

Chatterjee’s sample rank correlation, for i.i.d. data $(X_i, Y_i)$ with ties in $X$ broken uniformly at random and no ties in $Y$, is defined as

$$\xi_n(X, Y) = 1 - \frac{3}{n^2-1}\sum_{i=1}^{n-1}|r_{i+1} - r_i|,$$

where $r_i$ is the rank of $Y_{(i)}$ after sorting the data by $X$ (Chatterjee, 2019). The population version, critical for asymptotic and contamination analysis, is

$$\xi(X, Y) = \frac{\int_{\mathbb{R}}\mathrm{Var}(G_X(t))\,d\mu(t)}{\int_{\mathbb{R}}G(t)(1-G(t))\,d\mu(t)},$$

where $G_X(t) = \mathbb{P}(Y \geq t \mid X)$, $G(t) = \mathbb{P}(Y \geq t)$, and $\mu$ is the law of $Y$.

Chatterjee’s coefficient admits formulations via local $L^1$ residuals, Markov-product copulas, and conditional variances, which underlie the various $\varepsilon$–$\delta$ analyses (Sato, 13 Dec 2025, Ansari et al., 14 Mar 2025).
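The sample formula can be sketched in a few lines of NumPy. This is a minimal illustration, assuming no ties in $Y$; the function name `xi_n` and the stable sort used in place of random tie-breaking in $X$ are choices made here, not part of the cited papers.

```python
import numpy as np

def xi_n(x, y):
    """Chatterjee's sample coefficient, assuming no ties in y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sort the pairs by x (a stable sort stands in for random tie-breaking).
    order = np.argsort(x, kind="stable")
    y_sorted = y[order]
    # r[i] is the rank (1..n) of the i-th y value in the x-sorted order.
    r = np.argsort(np.argsort(y_sorted)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
xi_func = xi_n(x, np.sin(3 * x))            # noiseless, non-monotone dependence
xi_indep = xi_n(x, rng.normal(size=2000))   # independent noise
```

For a strictly monotone relationship every rank difference equals 1, so the formula reduces to $1 - 3/(n+1)$, which approaches 1 as $n$ grows.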

2. Sensitivity and Robustness via the Asymptotic Expected Sensitivity Function

The primary device for $\varepsilon$–$\delta$ robustness is the Asymptotic Expected Sensitivity Function (AESF), defined for a functional $R$ as

$$\mathrm{AESF}(z; R, F) = \lim_{n\to\infty} (n+1)\,\mathbb{E}_{F}\left[R_{n+1}(X_1,\dots,X_n, z) - R_n(X_1,\dots,X_n)\right],$$

where $R_n$ is the empirical plug-in estimator. For Chatterjee’s $\xi$, if the supremum $M = \sup_{(x, y)\in\mathbb{R}^2} |\mathrm{AESF}((x,y); \xi, F)| < \infty$, then under $\varepsilon$-contamination

$$F_\varepsilon = (1-\varepsilon)F + \varepsilon H,$$

one obtains the first-order contamination bound

$$|\xi(F_\varepsilon) - \xi(F)| \leq \varepsilon M + o(\varepsilon),$$

and more conservatively for all $\varepsilon \in [0,1)$,

$$|\xi(F_\varepsilon) - \xi(F)| \leq \frac{\varepsilon M}{1-\varepsilon}$$

when the functional is Lipschitz in total variation (Zhang, 2024). This bound is tight: the worst case occurs when $H$ concentrates mass at the point where $|\mathrm{AESF}|$ is maximized.
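The limit defining the AESF can be probed at finite $n$ by Monte Carlo. The sketch below is an illustration only: it averages $(n+1)\,[\xi_{n+1}(\text{sample} \cup \{z\}) - \xi_n(\text{sample})]$ over simulated linear-Gaussian samples. The helper names (`xi_n`, `linear_gaussian`, `finite_n_sensitivity`) and the defaults for `n`, `reps`, and `seed` are assumptions made here, and the output is a noisy finite-sample proxy, not the limiting value used in the cited analysis.

```python
import numpy as np

def xi_n(x, y):
    """Chatterjee's sample coefficient, assuming no ties in y."""
    n = len(x)
    y_sorted = np.asarray(y, dtype=float)[np.argsort(x, kind="stable")]
    r = np.argsort(np.argsort(y_sorted)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

def linear_gaussian(n, rho, rng):
    """Draw n pairs with Y = rho * X + sqrt(1 - rho^2) * noise."""
    x = rng.normal(size=n)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    return x, y

def finite_n_sensitivity(z, n=500, rho=0.7, reps=200, seed=0):
    """Monte Carlo proxy for the AESF at the point z = (x0, y0)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(reps)
    for k in range(reps):
        x, y = linear_gaussian(n, rho, rng)
        # Change in the estimator when the single point z is appended.
        diffs[k] = xi_n(np.append(x, z[0]), np.append(y, z[1])) - xi_n(x, y)
    return (n + 1) * diffs.mean()

s_outlier = finite_n_sensitivity((3.0, -3.0))  # point far from the bulk
```

Scanning `z` over a grid and taking the largest absolute value would give a rough finite-sample stand-in for $M$, subject to Monte Carlo error.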

For example, in a linear-Gaussian case with $\rho = 0.7$, a numerical value $M_{0.7} \approx 1.15$ gives, for $\varepsilon = 0.01$ (1% contamination), a first-order bound $\varepsilon M_{0.7} \approx 0.0115$, that is, a maximal shift $|\xi(F_\varepsilon)-\xi(F)| \lesssim 0.012$.
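The arithmetic behind the two contamination bounds is simple enough to write out. The function names below are hypothetical; the value $M = 1.15$ is the one quoted in the text for $\rho = 0.7$.

```python
def first_order_bound(eps, M):
    """Leading term eps * M of the contamination bound (o(eps) term dropped)."""
    return eps * M

def conservative_bound(eps, M):
    """Bound eps * M / (1 - eps), stated for eps in [0, 1)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return eps * M / (1.0 - eps)

leading = first_order_bound(0.01, 1.15)    # 0.0115
uniform = conservative_bound(0.01, 1.15)   # 0.0115 / 0.99, about 0.0116
```

At 1% contamination both values round to 0.012; the gap between them widens as $\varepsilon$ grows, since the conservative bound carries the factor $1/(1-\varepsilon)$.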

3. $\varepsilon$–$\delta$ Structure for Functional Dependence and Independence

Chatterjee’s coefficient exhibits explicit $\varepsilon$–$\delta$ equivalences at the endpoints:

  • Functional dependence: If $Y = f(X)$ a.s., then $\xi = 1$. Conversely, if $\xi = 1$, $Y$ is almost surely a function of $X$. Finite deviation from noiseless dependence, $\Delta = \mathbb{P}(Y \neq f(X)) < \delta$, yields $1 - \xi < \epsilon$ for $\delta = \epsilon D$, where

$$D = \int_{\mathbb{R}} G(t)(1-G(t))\,d\mu(t)$$

(Chatterjee, 2019).

  • Independence: $\xi = 0$ if and only if $X$ and $Y$ are independent. For uniform distance from independence $\alpha < \delta$, one gets $\xi < \delta^2/D$. Conversely, if $\xi < \epsilon$, then $\alpha < \sqrt{\epsilon D}$.

These bounds justify the interpretation of $\xi$ as a calibrated, Lipschitz-quantified “distance” from both perfect dependence and independence, with explicit $\varepsilon$–$\delta$ parameters.
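As a worked instance of the constant $D$, suppose (as an assumption for this example) that $\mu$ is the law of $Y$ and that $Y$ is continuous. Then $G(Y)$ is uniform on $(0,1)$ and $D$ has a closed form, which makes the endpoint thresholds explicit:

```latex
% Assumption: \mu is the law of Y and Y is continuous, so u = G(Y) ~ Uniform(0,1).
D = \int_{\mathbb{R}} G(t)\,\bigl(1 - G(t)\bigr)\,d\mu(t)
  = \int_0^1 u(1-u)\,du
  = \frac{1}{2} - \frac{1}{3}
  = \frac{1}{6}.
% Functional-dependence endpoint: \delta = \epsilon D = \epsilon / 6.
% Independence endpoint: \xi < \delta^2 / D = 6\,\delta^2,
% and \xi < \epsilon implies \alpha < \sqrt{\epsilon D} = \sqrt{\epsilon / 6}.
```

Under this assumption, a target of $1 - \xi < 0.06$ corresponds to $\delta = 0.01$ in the functional-dependence statement.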

4. Continuity: Markov Products and Weak Convergence

Continuity properties of $\xi$ in the weak topology deviate from classical rank correlations. Chatterjee’s $\xi$ is not continuous with respect to weak convergence of joint laws, but instead with respect to the law of Markov products $(Y, Y')$, where $Y'$ is conditionally independent of $Y$ given $X$ and $Y' \mid X \sim Y \mid X$ (Ansari et al., 14 Mar 2025).

Theorem ($\varepsilon$–$\delta$ continuity of $\xi$): For continuous $F_{Y_n}$ and under suitable range convergence, for every $\epsilon > 0$ there exists $\delta > 0$ such that

$$d_P(\mathrm{Law}(Y_n,Y_n'), \mathrm{Law}(Y,Y')) < \delta \implies |\xi(Y_n, X_n) - \xi(Y, X)| < \epsilon,$$

where $d_P$ is the Prokhorov distance. Copula-based representations yield bounds such as $|\xi(Y_n, X_n) - \xi(Y, X)| \le 6\delta$ when the uniform norm satisfies $\|C_{Y_n,Y_n'}-C_{Y,Y'}\|_\infty < \delta$ (Ansari et al., 14 Mar 2025).

This ensures that small perturbations in the conditional law of $Y \mid X$, as measured in the appropriate metric (not simply the joint law), produce arbitrarily small effects on $\xi$, with explicit $\varepsilon$–$\delta$ quantification.

5. Primitive Local $\varepsilon$–$\delta$ Construction and Empirical Structure

A local $\varepsilon$–$\delta$ perspective frames Chatterjee’s $\xi$ as the limiting residual of a local averaging scheme:

  • For $(U, V)$ with $U = F_X(X)$, $V = F_Y(Y)$ (probability-integral transforms), define for $\varepsilon > 0$ the empirical $\varepsilon$-neighborhood of $U_i$ as $\mathcal{N}_\varepsilon(i) = \{j: |U_j-U_i|\leq \varepsilon\}$.
  • The local average of $V$ near $U_i$ is $\bar{V}_i(\varepsilon) = \frac{1}{|\mathcal{N}_\varepsilon(i)|}\sum_{j\in\mathcal{N}_\varepsilon(i)} V_j$.
  • The mean local $L^1$ residual is $\zeta_n(\varepsilon) = \frac{1}{n}\sum_{i=1}^n |\bar{V}_i(\varepsilon) - V_i|$.
  • In the $\varepsilon \to 0$ limit, $\zeta_n(\varepsilon)$ converges to $\mathbb{E}|V - \mathbb{E}[V \mid U]|$; Chatterjee’s correlation emerges as

$$\xi_n = 1 - \frac{\zeta_n}{\mathbb{E}|V - \mathbb{E} V|} = 1-4\zeta_n,$$

matching the original rank-difference formula (Sato, 13 Dec 2025).

All $\varepsilon$–$\delta$ operations (local sets, residuals) are invariant under monotone transformations; the probability-integral transform serves only to achieve distribution-freeness.
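The local construction can be sketched directly from its four steps. This is an illustrative implementation, not code from the cited paper: the empirical probability-integral transform is taken as rank divided by $n$, the names `pit_ranks`, `local_l1_residual`, and `xi_local` are hypothetical, and the neighborhood matrix costs $O(n^2)$ memory, so it suits only moderate $n$. The factor 4 comes from $\mathbb{E}|V - \mathbb{E}V| = 1/4$ for uniform $V$.

```python
import numpy as np

def pit_ranks(a):
    """Empirical probability-integral transform: rank / n (no ties assumed)."""
    a = np.asarray(a, dtype=float)
    return (np.argsort(np.argsort(a)) + 1) / a.size

def local_l1_residual(x, y, eps):
    """Mean local L1 residual zeta_n(eps) on the rank-transformed data."""
    u, v = pit_ranks(x), pit_ranks(y)
    # neighbors[i, j] is True when |U_j - U_i| <= eps (always includes j = i).
    neighbors = np.abs(u[:, None] - u[None, :]) <= eps
    # Local average of V over each neighborhood.
    v_bar = (neighbors.astype(float) @ v) / neighbors.sum(axis=1)
    return np.abs(v_bar - v).mean()

def xi_local(x, y, eps):
    """Local-residual form 1 - 4 * zeta_n(eps)."""
    return 1.0 - 4.0 * local_l1_residual(x, y, eps)

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
near_one = xi_local(x, x**3, eps=0.01)                   # noiseless dependence
near_zero = xi_local(x, rng.normal(size=1000), eps=0.05)  # independence
```

The choice of `eps` trades bias against noise: too large and the local average smooths away real structure, too small and each neighborhood contains little more than the point itself.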

6. Moment-Based Analogues and Unified Framework

Replacement of the local $L^1$ residual with $L^2$ analogues links Chatterjee’s $\xi$ to familiar moment-based indices:

  • $\zeta^{(2)} = \mathbb{E}[(V - \mathbb{E}[V \mid U])^2] = \mathbb{E}[\mathrm{Var}(V \mid U)]$
  • $\eta^{(2)} = 1 - \frac{\mathbb{E}[\mathrm{Var}(V \mid U)]}{\mathrm{Var}(V)} = \frac{\mathrm{Var}(\mathbb{E}[V \mid U])}{\mathrm{Var}(V)}$

For jointly Gaussian $(U, V)$, one recovers Pearson’s $R^2$ through this construction, showing the $\varepsilon$–$\delta$ approach unifies rank-based and moment-based dependence measures under a single limiting framework (Sato, 13 Dec 2025).
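The Gaussian case can be checked in one line of algebra. In the sketch below, $\rho$, $\sigma_U$, and $\sigma_V$ denote the correlation and standard deviations of $(U, V)$; this notation is introduced here for the example and does not appear in the cited paper.

```latex
% Assumption: (U, V) jointly Gaussian with correlation \rho and
% standard deviations \sigma_U, \sigma_V, so the regression is linear.
\mathbb{E}[V \mid U] = \mathbb{E}[V] + \rho\,\frac{\sigma_V}{\sigma_U}\,\bigl(U - \mathbb{E}[U]\bigr),
\qquad
\mathrm{Var}\bigl(\mathbb{E}[V \mid U]\bigr) = \rho^2 \sigma_V^2,
\qquad
\eta^{(2)} = \frac{\rho^2 \sigma_V^2}{\sigma_V^2} = \rho^2 = R^2.
```

Equivalently, $\zeta^{(2)} = \mathbb{E}[\mathrm{Var}(V \mid U)] = (1 - \rho^2)\,\sigma_V^2$, so the $L^2$ residual is the unexplained variance of the linear regression.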

7. Assumptions, Limitations, and Practical Implications

Rigorous $\varepsilon$–$\delta$ control relies on continuity in $Y$, regularity of $F$, and Hadamard differentiability of $\xi$. The main theoretical limits (tightness of the contamination bound and sharpness of the independence/functionality bounds) are achieved under these hypotheses (Zhang, 2024, Chatterjee, 2019). In finite samples, contamination and sampling errors are additive, with the former scaling as $\varepsilon M$ and the latter as $O_p(1/\sqrt{n})$.

A plausible implication is that for statistical inference and robust estimation, Chatterjee’s $\xi$ offers explicit, interpretable robustness margins, with $\varepsilon$–$\delta$ quantification superior to earlier rank-based coefficients where such fine-grained control is unavailable or only asymptotically valid. The local $\varepsilon$–$\delta$ interpretation remains central for applications in dependence quantification, goodness-of-fit, and model diagnostics.
