Joint Distance Covariance (JdCov)

Updated 11 September 2025
  • Joint Distance Covariance (JdCov) is a nonparametric measure that quantifies mutual dependence among d ≥ 2 random vectors by comparing their joint and marginal characteristic functions.
  • It extends bivariate distance covariance to assess full joint independence in multivariate and high-dimensional data using centered distance matrices and U-statistics.
  • JdCov underpins advances in causal inference and fairness-aware machine learning, inspiring scalable algorithms and kernel-based extensions for complex data settings.

Joint Distance Covariance (JdCov) is a nonparametric, universal measure of mutual dependence among $d \geq 2$ random vectors, generalizing distance covariance from the classical bivariate setting to arbitrary higher-order settings. Its defining feature is the ability to nonparametrically test for mutual independence—returning zero if and only if the vectors are independent—extending the distance covariance paradigm beyond pairwise interactions to full joint structure. JdCov and related multivariate extensions have rapidly gained traction in mathematical statistics, high-dimensional analysis, multivariate causal inference, and algorithmic fairness, due to their strong theoretical properties and practical performance in a wide range of inferential tasks.

1. Definition and Theoretical Foundations

Let $X^1, \ldots, X^d$ be $d$ random vectors, with each $X^k \in \mathbb{R}^{p_k}$. The JdCov is fundamentally defined via the $L^2$-distance between the joint characteristic function and the product of marginal characteristic functions:
$$\operatorname{JdCov}^2(X^1, \ldots, X^d) = \int \left| f_{X^1, \ldots, X^d}(t_1, \ldots, t_d) - \prod_{k=1}^d f_{X^k}(t_k) \right|^2 \rho(dt_1, \ldots, dt_d)$$
where $f_{X^1, \ldots, X^d}$ is the joint characteristic function, $f_{X^k}$ are the marginals, and $\rho$ is a product measure (often involving powers of the Euclidean norm with dimension-matched normalizations as in the original theory).
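For concreteness, one natural choice of $\rho$ (a product form extending the bivariate Székely–Rizzo weight; shown here as an illustrative assumption rather than the only admissible measure) is
$$\rho(dt_1, \ldots, dt_d) = \prod_{k=1}^{d} \frac{dt_k}{c_{p_k}\, \|t_k\|^{1+p_k}}, \qquad c_{p} = \frac{\pi^{(1+p)/2}}{\Gamma\!\left(\tfrac{1+p}{2}\right)},$$
which keeps the integral finite under mild moment conditions and recovers the familiar Euclidean-distance representation in the bivariate case.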

JdCov equals zero if and only if $X^1, \ldots, X^d$ are mutually independent under appropriate moment and metric conditions. Generalizations using metrics or semimetrics of strong negative type (in the sense of Lyons (Lyons, 2011)) allow definition and independence characterization in general metric and separable Hilbert spaces.

Equivalent representations are possible in terms of expectations of appropriately centered multivariate distances or via U-statistics aggregating products of pairwise or $d$-wise distances.
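As a point of reference for such expectation forms, the bivariate building block ($d = 2$, Euclidean metric) admits the classical representation
$$\operatorname{dCov}^2(X, Y) = \mathbb{E}\|X - X'\|\,\|Y - Y'\| + \mathbb{E}\|X - X'\|\;\mathbb{E}\|Y - Y'\| - 2\,\mathbb{E}\|X - X'\|\,\|Y - Y''\|,$$
where $(X', Y')$ and $(X'', Y'')$ are i.i.d. copies of $(X, Y)$; the $d$-wise representations in the cited works aggregate analogous products of centered distances across all blocks.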

2. Multivariate and Metric Space Extensions

JdCov's theoretical guarantees and construction extend naturally to random objects taking values in arbitrary metric spaces or (possibly infinite-dimensional) separable Hilbert spaces. Lyons (Lyons, 2011) proved that for a test to be consistent against all alternatives, the underlying metric spaces must be of strong negative type. This property is satisfied by separable Hilbert spaces, vastly broadening the applicability of JdCov to functional data, time series, and high-dimensional structured data.

Mathematically, JdCov is often implemented via centered distance matrices:
$$d_\mu(x, x') = d(x, x') - \alpha_\mu(x) - \alpha_\mu(x') + D(\mu),$$
with centering removing location effects, and $D(\mu)$ and $\alpha_\mu(x)$ being expected distances with respect to the measure $\mu$. For the joint measure, the distance covariance formulation is recast as an expectation of products of these adjusted distances over independent copies of the random vectors.
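A minimal empirical sketch of $d_\mu$ in NumPy: double-center a sample's pairwise distance matrix by removing row and column means (empirical $\alpha_\mu$) and adding back the grand mean (empirical $D(\mu)$). The function name and the V-type centering are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def double_centered_distances(x):
    """Empirical analogue of d_mu: pairwise Euclidean distances with row
    and column means subtracted and the grand mean added back
    (V-type centering).  Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # d(x_i, x_j)
    return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
```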

3. Algorithms and Practical Computation

Empirical estimators of JdCov typically involve quadratic forms of double-centered distance matrices. For $n$ samples, pairwise distance matrices for each variable are centered column- and row-wise, then combined via traces or sums. For moderate $d$ and $n$, explicit computation is tractable (e.g., $O(n^2)$ for standard implementations).
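As a reference point for this quadratic-form computation, the bivariate V-statistic can be written directly from two double-centered matrices; the joint statistic combines all $d$ centered matrices analogously, following the aggregation rules of the cited papers (not reproduced here). A minimal NumPy sketch:

```python
import numpy as np

def dcov_sq_stat(x, y):
    """Naive O(n^2) bivariate distance-covariance V-statistic:
    (1/n^2) * sum_{i,j} A_ij * B_ij for double-centered distance
    matrices A, B.  Bivariate building block only, not the full JdCov."""
    def centered(z):
        z = np.asarray(z, dtype=float)
        if z.ndim == 1:
            z = z[:, None]
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
    A, B = centered(x), centered(y)
    return float((A * B).mean())
```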

Recent algorithmic advances enable more scalable computation. For standard distance covariance, exact $O(n \log n)$ univariate algorithms have been proposed (Chaudhuri et al., 2018). For JdCov, while no $O(n \log n)$ general algorithm yet exists for the fully multivariate case, strategies involving random projections, subset aggregation, or specialized handling of high-dimensional marginals (e.g., via k-d trees or projection-aggregation (Chakraborty et al., 2017, Chaudhuri et al., 2018)) are active areas of development.
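The random-projection idea can be sketched as follows: project each high-dimensional block onto random one-dimensional directions, compute cheap univariate statistics, and average. This is a hedged illustration of the strategy named above, not the algorithm of the cited papers; the function names, the number of projections, and the plain averaging are assumptions.

```python
import numpy as np

def dcov_sq_1d(x, y):
    """Naive O(n^2) bivariate dCov^2 V-statistic for 1-D samples."""
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    return (A * B).mean()

def projection_aggregated_dcov(X, Y, n_proj=100, seed=None):
    """Average dCov^2 over random 1-D projections of two data blocks."""
    rng = np.random.default_rng(seed)
    _, px = X.shape
    _, py = Y.shape
    total = 0.0
    for _ in range(n_proj):
        u = rng.standard_normal(px); u /= np.linalg.norm(u)
        v = rng.standard_normal(py); v /= np.linalg.norm(v)
        total += dcov_sq_1d(X @ u, Y @ v)
    return total / n_proj
```

Each projected univariate statistic could in principle be computed with the exact $O(n \log n)$ algorithm cited above, which is where the practical speedup comes from.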

Empirical JdCov accommodates mixed data types by employing alternative metric choices, such as the Minkowski metric for $p \in [1, 2]$, and is compatible with regularization and sub-sampling for computationally intensive settings.

4. Asymptotic Theory, Estimation, and Applications

Under suitable regularity and moment conditions, asymptotic properties of the empirical JdCov statistics—including consistency and the distribution of test statistics under the null and alternative—have been established (Chakraborty et al., 2017, Matteson et al., 2013). Bootstrap and permutation procedures are typically used for inference, especially for tests of joint independence where limiting distributions may depend on unknown dependencies among components.
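A generic permutation scheme for a joint-independence test can be sketched as follows; here stat_fn stands for any empirical joint-dependence statistic (e.g., an empirical JdCov), and independently permuting the rows of all but one block is one standard way to emulate the null of mutual independence. This is an illustrative sketch rather than the calibrated procedures of the cited works.

```python
import numpy as np

def permutation_pvalue(stat_fn, samples, n_perm=500, seed=None):
    """Permutation p-value for a joint-independence statistic.

    samples: list of arrays, each with the same number of rows n.
    stat_fn: callable mapping such a list to a scalar statistic."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(samples)
    n = samples[0].shape[0]
    exceed = 0
    for _ in range(n_perm):
        permuted = [samples[0]] + [s[rng.permutation(n)] for s in samples[1:]]
        if stat_fn(permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction
```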

In independent component analysis (ICA), JdCov was shown to enable extraction and validation of mutually independent components in nonparametric latent variable models (Matteson et al., 2013). Empirical JdCov-based estimators provided both consistency and improved robustness compared to sequential or correlation-based ICA approaches.

JdCov has also been adopted in causal inference for model selection, specifically for testing independence of residuals in structural equation models (Chakraborty et al., 2017), as well as in fairness-aware machine learning, where JdCov regularization minimizes statistical dependence between model predictions and vectors of protected attributes, effectively addressing "fairness gerrymandering" across intersectional subgroups (Lee et al., 9 Sep 2025).
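A hedged sketch of how a dependence penalty of this kind can enter a training objective, using a plain differentiable bivariate distance-covariance term in PyTorch as a stand-in for the full joint statistic of the cited work; the function names, the squared-error task loss, and the weight lam are assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_dist(x, eps=1e-12):
    """Pairwise Euclidean distances with a small eps so the gradient is
    well defined at zero distances (the diagonal)."""
    diff = x.unsqueeze(1) - x.unsqueeze(0)
    return torch.sqrt((diff * diff).sum(dim=-1) + eps)

def dcov_sq(a, b):
    """Differentiable bivariate dCov^2 V-statistic for 2-D tensors whose
    rows are samples; usable as a dependence penalty."""
    A = pairwise_dist(a)
    B = pairwise_dist(b)
    A = A - A.mean(dim=0, keepdim=True) - A.mean(dim=1, keepdim=True) + A.mean()
    B = B - B.mean(dim=0, keepdim=True) - B.mean(dim=1, keepdim=True) + B.mean()
    return (A * B).mean()

def fairness_regularized_loss(preds, targets, protected, lam=1.0):
    """Task loss plus a dependence penalty between predictions and a
    (possibly multi-column) protected-attribute matrix."""
    return F.mse_loss(preds, targets) + lam * dcov_sq(preds, protected.float())
```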

5. High-dimensional, Kernel, and Visualization Aspects

In high-dimensional settings, the classical "joint" JdCov can exhibit power loss—degenerating to sensitivity predominantly to linear (second-order) associations—when each variable is high-dimensional and the sample size small relative to dimension (Zhu et al., 2019). Remedies include aggregating marginal or low-rank componentwise dependence measures to recover sensitivity to nonlinear relationships. Kernel-based variants (such as the Hilbert-Schmidt Independence Criterion) admit similar representations via distances in reproducing kernel Hilbert spaces and have analogous high-dimensional limitations.
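A minimal sketch of the marginal-aggregation remedy: sum componentwise bivariate statistics over coordinate pairs instead of computing one joint statistic on the full high-dimensional vectors. The plain unweighted sum is an assumption; the cited works consider more refined aggregation and calibration schemes.

```python
import numpy as np

def marginal_aggregated_dcov(X, Y):
    """Sum of componentwise bivariate dCov^2 statistics over all
    coordinate pairs of two high-dimensional blocks (naive O(p*q*n^2))."""
    def dcov_sq_1d(x, y):
        a = np.abs(x[:, None] - x[None, :])
        b = np.abs(y[:, None] - y[None, :])
        A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
        B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
        return (A * B).mean()
    return float(sum(dcov_sq_1d(X[:, i], Y[:, j])
                     for i in range(X.shape[1]) for j in range(Y.shape[1])))
```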

Recent work (Wang et al., 2023) introduced the Additive Decomposition of Correlations (ADC) formula, showing that JdCov can be expressed as a weighted sum of squared correlations between kernel-induced latent features:
$$\operatorname{JdCov}^2 = \sum_{i_1, \ldots, i_d} \lambda_{i_1} \cdots \lambda_{i_d} \operatorname{corr}^2\left( \phi_{i_1}^1(X^1), \ldots, \phi_{i_d}^d(X^d) \right)$$
where $\lambda_{i_k}$ are kernel eigenvalues and $\phi_{i_k}^k$ are eigenfunctions/features for the $k$-th variable. This decomposition facilitates visualization and interpretability, allowing identification of the dominant features driving joint dependence.

6. Extensions and Limitations

The JdCov framework is extensible to complex data domains, including time series (via lagged empirical characteristic functions and Hilbert space embeddings (Betken et al., 2021)), manifold-valued data, and situations with only partial or conditional independence hypotheses (conditional JdCov).

A limitation is that in very high-dimensional settings, joint estimators may lack sensitivity to higher-order nonlinear effects. Marginal aggregation, selective regularization, or kernel adaptation can partially address this issue (Zhu et al., 2019, Xie et al., 2022). Another computational challenge is algorithmic scalability for massive data or large $d$, motivating further development of fast approximate algorithms (Chakraborty et al., 2017).

The theoretical foundation remains well established. For standard implementations, statistical power, universality (i.e., the "if and only if" independence characterization), and moment/invariance properties all mirror the univariate and bivariate versions, provided the underlying metric is of strong negative type.

7. Comparative Analysis and Impact

JdCov fundamentally differs from traditional dependence metrics by being nonparametric, sensitive to arbitrary forms of dependence, and universally consistent for joint independence, assuming suitable metrics (Lyons, 2011, Janson, 2019). It resolves limitations of classical covariance/correlation, such as equating zero with independence only in specific parametric settings, and directly generalizes the bivariate Székely-Rizzo-Bakirov distance covariance (Edelmann et al., 2022).

Comparisons with alternative joint testing frameworks (e.g., kernel-based, copula-based, or latent variable-based approaches) show that JdCov yields competitive or higher power—especially when joint nonlinear or higher-order dependencies are present among the random vectors involved. It provides a rigorous inferential foundation for nonparametric joint independence testing, independent component analysis, causal structure discovery, and equitable machine learning, with flexible metrics that support broad application domains.


In summary, Joint Distance Covariance (JdCov) provides a rigorously formulated, metric-driven, nonparametric measure of mutual dependence for more than two random vectors. It is grounded in the mathematics of characteristic functions, general metric or Hilbert space theory, and U-statistics, and is manifestly suited for use in modern multivariate analysis, high-dimensional statistics, data science, and algorithmic fairness. JdCov retains the universality and sensitivity of its bivariate antecedent and supports extensions, computation, and interpretation for a new generation of independence testing and dependence modeling problems in mathematical statistics and applied sciences.
