Azadkia–Chatterjee Coefficient
- The Azadkia–Chatterjee coefficient is a nonparametric, graph-based statistic that measures the strength of dependence between random variables through variance decompositions.
- It is defined using nearest-neighbor graphs, ensuring consistency, invariance under transformations, and adaptation to the intrinsic manifold of the data.
- Extensions include multivariate and conditional versions with computational efficiency and asymptotic guarantees such as normality and rate-adaptivity.
The Azadkia–Chatterjee coefficient is a fully nonparametric, graph-based statistic for measuring the strength of statistical dependence between random variables or vectors, admitting formal properties that align with Rényi’s axiomatic framework for dependence measures. Defined using nearest-neighbor graphs, the coefficient is consistent, adapts to the intrinsic dimension of the underlying data manifold, possesses key invariances, and allows explicit computations for its asymptotic properties. Originally developed to address the quantification of independence and functional dependence, it occupies a distinctive position among modern dependence and conditional independence measures, with active extensions to multivariate, conditional, and scale-invariant contexts.
1. Population Definition and Sample Statistic
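Let $(X, Y)$ be a pair of random variables where $X \in \mathbb{R}^d$ and $Y \in \mathbb{R}$, both with continuous marginals. The Azadkia–Chatterjee coefficient, denoted $\xi = \xi(X, Y)$, is defined at the population level by
$$\xi(X, Y) = \frac{\int \operatorname{Var}\big(\mathbb{E}[\mathbf{1}\{Y \ge t\} \mid X]\big)\, dF_Y(t)}{\int \operatorname{Var}\big(\mathbf{1}\{Y \ge t\}\big)\, dF_Y(t)},$$
where $F_Y$ is the distribution function of $Y$ (Han et al., 2022).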
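Given i.i.d. samples $(X_1, Y_1), \ldots, (X_n, Y_n)$, the empirical statistic is
$$\xi_n = \frac{6}{n^2 - 1} \sum_{i=1}^{n} \min\{R_i, R_{N(i)}\} - \frac{2n + 1}{n - 1},$$
with $R_i$ the rank of $Y_i$ among $Y_1, \ldots, Y_n$ and $N(i)$ the index of the nearest neighbor of $X_i$ among $\{X_j\}_{j \ne i}$.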
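Alternative forms utilize the empirical CDF $F_n$ of the responses. Since $F_n(Y_i) = R_i / n$ for the rank $R_i$ of $Y_i$ in the absence of ties,
$$\xi_n = \frac{6n}{n^2 - 1} \sum_{i=1}^{n} \min\{F_n(Y_i), F_n(Y_{N(i)})\} - \frac{2n + 1}{n - 1},$$
where $F_n$ normalizes the ranks to the interval $(0, 1]$ (Han et al., 2022, Lin et al., 2022).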
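The rank form of the sample statistic, $\frac{6}{n^2-1}\sum_i \min\{R_i, R_{N(i)}\} - \frac{2n+1}{n-1}$, can be sketched in pure Python. This is a minimal illustration, not a reference implementation: it assumes distinct response values, uses a brute-force $O(n^2)$ Euclidean nearest-neighbor search, and the function names (`ranks`, `nearest_neighbor_indices`, `xi_n`) are hypothetical.

```python
import math
import random


def ranks(y):
    """Return R_i = #{j : y_j <= y_i}; assumes the y values are distinct."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    r = [0] * len(y)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def nearest_neighbor_indices(points):
    """Brute-force Euclidean nearest neighbor N(i) of each point, excluding i itself."""
    nn = []
    for i, p in enumerate(points):
        best_j, best_d2 = -1, math.inf
        for j, q in enumerate(points):
            if j == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if d2 < best_d2:  # strict inequality: ties go to the smallest index
                best_j, best_d2 = j, d2
        nn.append(best_j)
    return nn


def xi_n(x, y):
    """Rank form: 6/(n^2 - 1) * sum_i min(R_i, R_N(i)) - (2n + 1)/(n - 1)."""
    n = len(y)
    r = ranks(y)
    nn = nearest_neighbor_indices(x)
    total = sum(min(r[i], r[nn[i]]) for i in range(n))
    return 6.0 * total / (n * n - 1) - (2.0 * n + 1.0) / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [(rng.random(), rng.random()) for _ in range(300)]
    y_dep = [a * a + math.sin(3 * b) for a, b in x]  # Y is a function of X
    y_ind = [rng.random() for _ in x]                # Y drawn independently of X
    print("functional dependence:", xi_n(x, y_dep))
    print("independence:", xi_n(x, y_ind))
```

Each predictor is passed as a tuple of coordinates, so the same function covers scalar and vector-valued $X$.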
2. Key Properties and Interpretation
- Range: $\xi \in [0, 1]$, with $\xi = 0$ if and only if $X$ and $Y$ are independent, and $\xi = 1$ if and only if $Y$ is almost surely a measurable function of $X$.
- Axiomatic Consistency: Satisfies Rényi’s criteria for dependence measures (Han et al., 2022).
- Invariance: Invariant under strictly increasing transformations of $Y$ and any measure-preserving transformations of $X$ (e.g., bi-Lipschitz maps).
- Model-Free: No parametric assumptions on the joint law of $(X, Y)$ (Han et al., 2022, Deb et al., 2020).
- Strong Consistency: As $n \to \infty$, $\xi_n \to \xi$ almost surely under continuity assumptions.
- Extremal Behavior: Interpolates precisely between independence and perfect functional dependence.
- Functional Target: The statistic operationalizes the Dette–Siburg–Stoimenov measure (Shi et al., 2020).
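The two extreme values in the Range property can be checked directly from the population definition, assuming the ratio-of-variances form $\xi = \int \operatorname{Var}(\mathbb{E}[\mathbf{1}\{Y \ge t\} \mid X])\, dF_Y(t) \big/ \int \operatorname{Var}(\mathbf{1}\{Y \ge t\})\, dF_Y(t)$. The derivation below is a sketch added for illustration; $G_t$ is shorthand introduced here, and only the "if" directions are shown.

```latex
\begin{align*}
G_t(X) &:= \mathbb{E}\big[\mathbf{1}\{Y \ge t\} \mid X\big] = \mathbb{P}(Y \ge t \mid X). \\[4pt]
\text{Independence:}\quad
  & G_t(X) = \mathbb{P}(Y \ge t) \ \text{a.s.}
  \;\Rightarrow\; \operatorname{Var}\big(G_t(X)\big) = 0 \ \text{for all } t
  \;\Rightarrow\; \xi = 0. \\[4pt]
\text{Functional dependence, } Y = f(X):\quad
  & G_t(X) = \mathbf{1}\{f(X) \ge t\} = \mathbf{1}\{Y \ge t\}
  \;\Rightarrow\; \operatorname{Var}\big(G_t(X)\big) = \operatorname{Var}\big(\mathbf{1}\{Y \ge t\}\big)
  \;\Rightarrow\; \xi = 1. \\[4pt]
\text{In general (law of total variance):}\quad
  & \operatorname{Var}\big(\mathbf{1}\{Y \ge t\}\big)
    = \operatorname{Var}\big(G_t(X)\big)
    + \mathbb{E}\big[\operatorname{Var}\big(\mathbf{1}\{Y \ge t\} \mid X\big)\big]
  \;\Rightarrow\; 0 \le \xi \le 1. \\[4pt]
\text{Denominator for continuous } Y:\quad
  & \int \operatorname{Var}\big(\mathbf{1}\{Y \ge t\}\big)\, dF_Y(t)
    = \int_0^1 u(1 - u)\, du = \tfrac{1}{6}.
\end{align*}
```

The constant $1/6$ in the last line is the source of the factor $6$ in the rank form of the sample statistic.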
3. Asymptotic Theory and Adaptivity
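- Central Limit Theorem: Under independence of $X$ and $Y$, $\sqrt{n}\,\xi_n \xrightarrow{d} N(0, \sigma_m^2)$, with $\sigma_m^2$ dependent only on the intrinsic dimension $m$ of the support manifold of $X$ (Han et al., 2022). For $X$ supported on an $m$-dimensional submanifold $\mathcal{M} \subset \mathbb{R}^d$,
$$\sigma_m^2 = \frac{2}{5} + \frac{2}{5}\, q_m + \frac{4}{5}\, o_m,$$
where $q_m$ and $o_m$ are defined in terms of geometric probabilities related to mutual and shared nearest-neighbor structures.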
- Adaptivity: Under suitable regularity, the convergence rate of $\xi_n$ to $\xi$ is dictated by the intrinsic manifold dimension $m$, not the ambient dimension $d$.
- Limiting Variance: Under independence, $n \operatorname{Var}(\xi_n) \to \sigma_m^2$, where $\sigma_m^2$ is a function only of $m$.
- Proof Techniques: CLT and variance expressions are established using Hájek projection and local normal approximation for statistics indexed by nearest-neighbor graphs, controlling graph-dependent covariance terms (Lin et al., 2022, Han et al., 2022).
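The claim that the null variance depends on the intrinsic rather than the ambient dimension can be probed with a small Monte Carlo experiment. The sketch below is an illustrative sanity check, not a reproduction of the cited results: the samplers `interval` and `helix`, the function `scaled_null_variance`, and all sample sizes are assumptions chosen here, and the statistic is the rank form with brute-force nearest neighbors and distinct responses.

```python
import math
import random
import statistics


def xi_n(x, y):
    """Rank form of the coefficient, brute-force nearest neighbors, distinct y assumed."""
    n = len(y)
    rank = [0] * n
    for position, i in enumerate(sorted(range(n), key=lambda k: y[k]), start=1):
        rank[i] = position
    total = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: math.dist(x[i], x[j]))
        total += min(rank[i], rank[nearest])
    return 6.0 * total / (n * n - 1) - (2.0 * n + 1.0) / (n - 1)


def interval(rng):
    """Hypothetical sampler: a point on the unit interval, as a 1-tuple."""
    return (rng.random(),)


def helix(rng):
    """Hypothetical sampler: a point on a one-dimensional curve embedded in R^3."""
    t = rng.random()
    return (math.cos(4 * t), math.sin(4 * t), t)


def scaled_null_variance(sampler, n=80, reps=150, seed=0):
    """Monte Carlo estimate of n * Var(xi_n) with Y drawn independently of X."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        x = [sampler(rng) for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        values.append(xi_n(x, y))
    return n * statistics.variance(values)


if __name__ == "__main__":
    print("interval in R^1:", scaled_null_variance(interval))
    print("helix in R^3:   ", scaled_null_variance(helix))
```

Both supports have intrinsic dimension one, so the two estimates would be expected to be of similar size up to simulation noise and finite-sample effects.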
4. Relationships to Other Measures of Dependence
- Comparison to Classical Rank Correlations: The Azadkia–Chatterjee coefficient is not a measure of concordance (unlike Kendall’s $\tau$ or Spearman’s $\rho$) but rather quantifies functional dependency. For stochastically increasing (or decreasing) copulas, $\xi$ satisfies an inequality relative to these concordance measures, and for certain copulas, the gap is maximized at $0.4$ (Ansari et al., 18 Jun 2025).
- Statistical Efficiency: While appealing due to asymptotic normality and computational tractability, the coefficient's local power is rate suboptimal compared to Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$, or Bergsma-Dassios-Yanagimoto's $\tau^*$ for detecting subtle local departures from independence (Shi et al., 2020).
- Representation in RKHS and Optimal Transport Frameworks: The coefficient is a limit case of general kernel-based measures of association and coincides with a special case (distance kernel or minimum kernel) in the RKHS–optimal transport framework (Deb et al., 2020, Deb et al., 2024).
- Continuity Properties: Unlike classical concordance measures, $\xi$ lacks plain weak continuity, but it is weakly continuous in the Markov-product topology, that is, under convergence in conditional distributions, with sufficient copula- and equicontinuity-based criteria (Ansari et al., 14 Mar 2025).
5. Extensions and Algorithmic Variants
- Multivariate and Conditional Versions: Extensions exist for measuring dependence between vector-valued ($Y \in \mathbb{R}^q$) responses and predictors based on appropriate product measures and nearest-neighbor graphs. Conditional dependence versions target $\xi(Y, Z \mid X)$, quantifying the residual association between $Y$ and $Z$ given $X$ (Ansari et al., 2022, Huang et al., 8 Dec 2025).
- Rank-based Scale-Invariant Versions: To address lack of scale invariance of Euclidean nearest-neighbor graphs in higher dimensions, rank-based nearest-neighbor graphs (Rosenbaum graphs) are proposed, yielding a variant of the coefficient invariant under strictly increasing transformations in each feature, with comparable limit theory (Tran et al., 2024).
- Computational Complexity: The original and rank-based versions can be evaluated in $O(n \log n)$ time for fixed dimensions, typically using efficient nearest-neighbor or ranking algorithms (Lin et al., 2022, Tran et al., 2024).
- Variance Estimation: Explicit data-driven and plug-in estimators are available for asymptotic variance, facilitating inference and hypothesis testing (Lin et al., 2022).
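The near-linear running time is easiest to see for a scalar predictor, where sorting replaces a general nearest-neighbor search: after sorting by $x$, the nearest neighbor of each point is one of its two adjacent points. The sketch below illustrates this special case only, with a hypothetical function name and distinct values assumed. For higher fixed dimensions a spatial index such as a k-d tree (for example `scipy.spatial.cKDTree`) would typically play the role of the sort.

```python
import random


def xi_n_univariate(x, y):
    """O(n log n) evaluation of the rank form for scalar x (needs n >= 2, distinct values)."""
    n = len(x)
    # Ranks of y: one sort.
    rank = [0] * n
    for position, i in enumerate(sorted(range(n), key=y.__getitem__), start=1):
        rank[i] = position
    # Sort by x: the nearest neighbor of a point is its left or right neighbor in this order.
    order = sorted(range(n), key=x.__getitem__)
    total = 0
    for pos, i in enumerate(order):
        candidates = []
        if pos > 0:
            candidates.append(order[pos - 1])
        if pos < n - 1:
            candidates.append(order[pos + 1])
        nearest = min(candidates, key=lambda j: abs(x[j] - x[i]))  # ties go to the left
        total += min(rank[i], rank[nearest])
    return 6.0 * total / (n * n - 1) - (2.0 * n + 1.0) / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
    y = [abs(v) for v in x]  # non-monotonic function of x
    print(xi_n_univariate(x, y))
```

Here `x` and `y` are plain lists of floats; the two sorts dominate the cost, and the remaining pass is linear.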
6. Limitations, Power, and Practical Guidance
- Sensitivity to Functional Dependence: The coefficient is highly sensitive to functional forms of dependency ($\xi = 1$ whenever $Y$ is a measurable function of $X$) but does not have optimal power against local alternatives (of order $n^{-1/2}$) in independence testing (Shi et al., 2020).
- Finite-Sample and Distribution-Free Properties: When based on empirical (optimal transport) ranks, tests based on the coefficient can be made exactly distribution-free under the null hypothesis (Deb et al., 2020, Deb et al., 2024).
- Guidance for Use: The coefficient is recommended for applications focused on detecting strong, possibly nonlinear or non-monotonic, functional dependency, or for manifold-structured data. For tests seeking optimal power against subtle or local alternatives, classical U-statistics may be preferred.
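For a quick independence check in practice, one option is a generic permutation test built on the rank form of the statistic. This is a swapped-in, standard technique and not the asymptotic or optimal-transport-rank tests of the cited papers; the function names and the number of permutations are assumptions of this sketch. Because the nearest-neighbor graph depends on the predictors only, it is computed once and reused for every permutation of the response ranks.

```python
import math
import random


def ranks(y):
    """R_i = #{j : y_j <= y_i}; distinct y values assumed."""
    r = [0] * len(y)
    for position, i in enumerate(sorted(range(len(y)), key=lambda k: y[k]), start=1):
        r[i] = position
    return r


def nearest_neighbors(x):
    """Brute-force Euclidean nearest-neighbor index of each point."""
    n = len(x)
    return [min((j for j in range(n) if j != i), key=lambda j: math.dist(x[i], x[j]))
            for i in range(n)]


def xi_from(rank, nn):
    """Rank form of the coefficient from precomputed ranks and neighbor indices."""
    n = len(rank)
    total = sum(min(rank[i], rank[nn[i]]) for i in range(n))
    return 6.0 * total / (n * n - 1) - (2.0 * n + 1.0) / (n - 1)


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Return (observed statistic, one-sided permutation p-value)."""
    rng = random.Random(seed)
    nn = nearest_neighbors(x)   # depends on x only, so it is computed once
    rank = ranks(y)
    observed = xi_from(rank, nn)
    shuffled = rank[:]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # permuting ranks mimics independence of X and Y
        if xi_from(shuffled, nn) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [(rng.uniform(-1.0, 1.0),) for _ in range(150)]
    y = [p[0] ** 2 + 0.05 * rng.random() for p in x]  # non-monotonic signal plus noise
    print(permutation_pvalue(x, y))
```

The smallest attainable p-value is $1/(1 + \texttt{n\_perm})$, so the number of permutations bounds the resolution of the test.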
7. Connections to Conditional Independence Testing and Graphical Models
- Conditional Independence: The conditional variant measures the reduction in variance of a conditional probability upon introducing additional conditioning variables, and fits into modern frameworks for conditional independence testing (e.g., CRT-based methods) (Shi et al., 2021).
- Graphical Model Construction: The coefficient has been used as the edge-weight criterion in nonparametric graphical model structure learning, leveraging its characterization of conditional independence for pairwise conditional relationships (Furmańczyk, 2023).
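The conditional variant can be sketched in the same rank-and-neighbor style. The code below assumes the estimator form $T_n(Y, Z \mid X) = \sum_i \big(\min\{R_i, R_{M(i)}\} - \min\{R_i, R_{N(i)}\}\big) \big/ \sum_i \big(R_i - \min\{R_i, R_{N(i)}\}\big)$, where $N(i)$ is the nearest neighbor of $X_i$ and $M(i)$ the nearest neighbor of $(X_i, Z_i)$. Function names are hypothetical, ties are not handled, and the nearest-neighbor search is brute force.

```python
import math
import random


def ranks(y):
    """R_i = #{j : y_j <= y_i}; distinct y values assumed."""
    r = [0] * len(y)
    for position, i in enumerate(sorted(range(len(y)), key=lambda k: y[k]), start=1):
        r[i] = position
    return r


def nearest_neighbors(points):
    """Brute-force Euclidean nearest-neighbor index of each point."""
    n = len(points)
    return [min((j for j in range(n) if j != i),
                key=lambda j: math.dist(points[i], points[j]))
            for i in range(n)]


def conditional_coefficient(y, z, x):
    """Estimate the dependence of Y on Z given X; x and z are lists of coordinate tuples."""
    n = len(y)
    r = ranks(y)
    nn_x = nearest_neighbors(x)                                   # N(i): neighbor in X
    nn_xz = nearest_neighbors([xi + zi for xi, zi in zip(x, z)])  # M(i): neighbor in (X, Z)
    num = sum(min(r[i], r[nn_xz[i]]) - min(r[i], r[nn_x[i]]) for i in range(n))
    den = sum(r[i] - min(r[i], r[nn_x[i]]) for i in range(n))
    return num / den  # den is near zero when Y is already (almost) a function of X


if __name__ == "__main__":
    rng = random.Random(0)
    n = 400
    x = [(rng.random(),) for _ in range(n)]
    z = [(rng.random(),) for _ in range(n)]
    y_uses_z = [a[0] + b[0] for a, b in zip(x, z)]   # Z adds information beyond X
    y_ignores_z = [a[0] + rng.random() for a in x]   # Z adds nothing beyond X
    print("Y = X + Z:    ", conditional_coefficient(y_uses_z, z, x))
    print("Y = X + noise:", conditional_coefficient(y_ignores_z, z, x))
```

A value near zero suggests that $Z$ carries little information about $Y$ once $X$ is known, which is the property that edge-selection procedures in graphical model learning would rely on.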
The Azadkia–Chatterjee coefficient and its algorithmic and theoretical extensions constitute a robust and interpretable framework for quantifying the strength of dependency, functional and otherwise, between random variables. Its population properties are grounded in variance decompositions, and its estimators are practical, scalable, and manifold-adaptive, which suits modern statistical and machine learning applications (Han et al., 2022, Lin et al., 2022, Tran et al., 2024, Ansari et al., 2022, Huang et al., 8 Dec 2025).