
Distance-Weighted Correlation Metrics

Updated 10 January 2026
  • Distance-weighted correlation metrics are dependence measures that integrate explicit metric weighting to capture nonlinear and heterogeneous relationships beyond traditional Pearson correlation.
  • They employ pairwise distance matrices, kernel-based transforms, and double-centering techniques to extend independence testing and clustering in non-Euclidean and graph-based data.
  • Applications include astrophysical classification, network analysis, and topological data analysis, offering robust tools for comparing complex structures in various domains.

Distance-weighted correlation metrics are a class of dependence measures that incorporate explicit weighting by the underlying metric structure of the data, allowing the computation of correlations that reflect non-Euclidean geometry, non-uniform importance, and complex data domains. Such metrics include distance correlation, earth mover's correlation, distance-weighted Pearson correlations, and weighted rank-based metrics. These methods generalize classical correlation measures, such as Pearson's $\rho$, to more flexible, rigorous frameworks that are sensitive to nonlinear, nonmonotonic, or heterogeneous dependencies. They find applications in independence testing, clustering, graphical model comparison, topological data analysis, and network science, among other areas.

1. Foundational Constructions: Distance Correlation and Covariance

The canonical distance correlation, due to Székely, Rizzo, and Bakirov, is formulated as a weighted $L_2$-distance between the joint characteristic function and the product of the marginals, with a singular kernel as the weight. For random vectors $X \in \mathbb{R}^p$, $Y \in \mathbb{R}^q$, define

$$\phi_{X,Y}(t,s) = \mathbb{E}\left[ e^{i \langle t, X \rangle + i \langle s, Y \rangle} \right],$$

$$V^2(X,Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^p \times \mathbb{R}^q} \left| \phi_{X,Y}(t,s) - \phi_X(t)\,\phi_Y(s) \right|^2 \, |t|^{-(p+1)}\, |s|^{-(q+1)} \, dt\, ds,$$

with normalizing constants $c_p, c_q$ and marginal characteristic functions $\phi_X, \phi_Y$. The distance correlation is then

$$R(X,Y) = \frac{V(X,Y)}{\sqrt{V(X,X)\, V(Y,Y)}}, \qquad 0 \leq R(X,Y) \leq 1.$$

A key property is that $R(X,Y) = 0$ if and only if $X$ and $Y$ are independent, in stark contrast to the Pearson coefficient, which can vanish even for dependent variables when the relationship is nonlinear (Richards, 2017, Castro-Prado et al., 2020, Lyons, 2011).

An equivalent representation using pairwise distances is

$$V^2(X,Y) = \mathbb{E}\big[ \|X - X'\| \, \|Y - Y'\| \big] + \mathbb{E}\|X - X'\| \, \mathbb{E}\|Y - Y'\| - \mathbb{E}\big[ \|X - X'\| \, \|Y - Y''\| \big] - \mathbb{E}\big[ \|X - X''\| \, \|Y - Y'\| \big],$$

where $(X,Y), (X',Y'), (X'',Y'')$ are independent and identically distributed copies.

These constructions admit population and sample analogues in both Euclidean and metric-space domains, providing a broad framework for dependence quantification with metric weights.
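To make the pairwise representation concrete, here is a minimal NumPy sketch (an illustration, not any paper's reference implementation) that estimates $V^2$ by replacing the expectations with empirical means. The two cross terms coincide, so they are folded into a single factor of two, and the triple sum factors through row sums, keeping the cost at $O(n^2)$:

```python
import numpy as np

def dist_matrix(z):
    """Euclidean pairwise-distance matrix for an (n, d) array."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dcov_sq_pairwise(x, y):
    """Plug-in estimate of V^2(X, Y) from the pairwise-expectation form.
    sum_{i,j,k} a_ij * b_ik factors through row sums, keeping this O(n^2)."""
    a, b = dist_matrix(x), dist_matrix(y)
    n = a.shape[0]
    term1 = (a * b).mean()                           # E[|X-X'| |Y-Y'|]
    term2 = a.mean() * b.mean()                      # E|X-X'| E|Y-Y'|
    term3 = (a.sum(axis=1) @ b.sum(axis=1)) / n**3   # both (equal) cross terms
    return term1 + term2 - 2.0 * term3

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
y = x ** 2 + 0.1 * rng.normal(size=(500, 1))  # dependent but nearly uncorrelated
print(dcov_sq_pairwise(x, y))                 # clearly positive
```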

2. Extension to Metric Spaces and Negative Type

To accommodate data in general metric spaces, Lyons (Lyons, 2011) and subsequent works (Castro-Prado et al., 2020) define the double-centered kernel

$$d_\mu(x, x') = d(x, x') - a_\mu(x) - a_\mu(x') + D(\mu),$$

where $a_\mu(x) = \int d(x, z)\, \mu(dz)$ and $D(\mu) = \iint d(u, v)\, \mu(du)\, \mu(dv)$ for a probability measure $\mu$.

The distance covariance in metric spaces is then

$$\mathrm{dCov}^2(X, Y) = \mathbb{E}\big[ d_\mu(X, X')\, d_\nu(Y, Y') \big],$$

with sample versions obtained via doubly-centered distance matrices.

A crucial requirement is that the metric spaces be of strong negative type. A space $(\mathcal{M}, d)$ has negative type if, for any finite signed measure $\alpha$ with $\alpha(\mathcal{M}) = 0$,

$$\iint d(x, y)\, \alpha(dx)\, \alpha(dy) \leq 0.$$

Strong negative type further demands that $D(\mu_1 - \mu_2) = 0 \implies \mu_1 = \mu_2$. This property ensures that distance covariance vanishes only under independence (Lyons, 2011, Castro-Prado et al., 2020). Euclidean spaces, separable Hilbert spaces, and $\ell_p$ spaces with $1 < p \le 2$ all possess strong negative type.
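On a finite metric space the negative-type condition can be checked directly: with the centering matrix $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, negative type is equivalent to $-JDJ$ being positive semi-definite, since $J\alpha = \alpha$ exactly for sum-zero vectors. A small NumPy sketch (illustrative only) follows; the shortest-path metric of the complete bipartite graph $K_{2,3}$ is a standard example that fails the condition:

```python
import numpy as np

def has_negative_type(D, tol=1e-9):
    """Test alpha^T D alpha <= 0 for all alpha with sum(alpha) = 0,
    via the spectrum of the doubly centered matrix -J D J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return bool(np.linalg.eigvalsh(-J @ D @ J).min() >= -tol)

# Euclidean distance matrices always pass ...
pts = np.random.default_rng(1).normal(size=(6, 2))
D_euc = np.sqrt(((pts[:, None] - pts[None, :]) ** 2).sum(-1))
print(has_negative_type(D_euc))    # True

# ... while the path metric of K_{2,3} (parts {0,1} and {2,3,4}) fails:
# within-part distance 2, across-part distance 1.
D_k23 = np.full((5, 5), 2.0)
np.fill_diagonal(D_k23, 0.0)
D_k23[:2, 2:] = D_k23[2:, :2] = 1.0
print(has_negative_type(D_k23))    # False
```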

3. Weighted, Graph-Based, and Non-Euclidean Correlation Metrics

Beyond canonical distance correlation, recent research addresses weighted and graph-based variants tailored to specialized data structures:

Distance-weighted Pearson correlation on networks (Coscia et al., 2024): Let $x, y \in \mathbb{R}^n$ be node attributes on a graph $G = (V, E)$, with edge-dependent distances $d(i,j)$ and a kernel $f: \mathbb{R}_+ \rightarrow \mathbb{R}_+$ (typically $f(d) = e^{-kd}$). Form weights $W_{ij} = f(d(i,j))$. The distance-weighted Pearson correlation is

$$\rho_{x,y;W} = \frac{ \hat{x}^T W \hat{y} }{ \sqrt{ \hat{x}^T W \hat{x} }\, \sqrt{ \hat{y}^T W \hat{y} } },$$

with centered vectors $\hat{x}, \hat{y}$. Negative type of $d$ is necessary for a well-defined correlation: only then does $W$ yield positive-definite quadratic forms and correlations bounded in $[-1,1]$.
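A minimal NumPy sketch of this estimator, under stated assumptions: the node distance matrix D is taken as given (e.g., shortest-path or resistance distances of negative type), and the vectors are centered by their W-weighted mean, which is one reasonable convention rather than necessarily the exact choice of Coscia et al. (2024):

```python
import numpy as np

def weighted_network_pearson(x, y, D, k=1.0):
    """Distance-weighted Pearson correlation of node attributes x, y,
    with kernel weights W = exp(-k * D) downweighting distant node pairs."""
    W = np.exp(-k * D)
    x_hat = x - (W @ x).sum() / W.sum()   # center by the W-weighted mean
    y_hat = y - (W @ y).sum() / W.sum()
    return (x_hat @ W @ y_hat) / np.sqrt((x_hat @ W @ x_hat) * (y_hat @ W @ y_hat))

# toy example: shortest-path distances on a 4-node path graph (a tree
# metric, hence of negative type)
D = np.array([[0., 1., 2., 3.],
              [1., 0., 1., 2.],
              [2., 1., 0., 1.],
              [3., 2., 1., 0.]])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
print(weighted_network_pearson(x, y, D, k=0.5))
```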

Earth Mover's Correlation (EMC) (Móri et al., 2020): For random variables $X, Y$ taking values in metric spaces $(\mathcal{M}, \delta)$, let $e(\mu, \nu)$ denote the first-order Wasserstein (earth mover's) distance between distributions. EMC defines a nonparametric correlation via

$$\mathrm{eCov}(X,Y) = \inf_{(X',Y')} \mathbb{E}\big[ \delta(X,X') + \delta(Y,Y') \big],$$

where the infimum is over couplings in which $(X', Y')$ follows the product of the marginal distributions of $X$ and $Y$, so that $\mathrm{eCov}$ is the earth mover's distance $e$ between the joint law and the product law,

and

$$\mathrm{eCor}(X,Y) = \frac{ \mathrm{eCov}(X,Y) }{ \min\{ \mathrm{eVar}(X),\, \mathrm{eVar}(Y) \} },$$

with $\mathrm{eVar}(X) = \mathbb{E}[ \delta(X,X') ]$ for $X'$ an independent copy of $X$. EMC is applicable to arbitrary metric spaces, requiring only finiteness of first moments. Under independence, $\mathrm{eCov}(X,Y) = 0$; under perfect dependence, $\mathrm{eCor}(X,Y) = 1$ (axiomatically, in Banach spaces).
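For finite samples, eCov can be estimated by an explicit optimal-transport problem between the empirical joint law and the product of the empirical marginals. The sketch below assumes scalar data, the additive ground metric $\delta(x,x') + \delta(y,y')$, and the third-party POT package (pip install pot) for the transport solve; it is illustrative, not the authors' implementation:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def ecov_empirical(x, y):
    """Wasserstein-1 distance between the empirical joint law of (X, Y)
    (mass 1/n on each observed pair) and the product of the empirical
    marginals (mass 1/n^2 on each grid point (x_j, y_k))."""
    n = len(x)
    # cost[i, (j, k)] = |x_i - x_j| + |y_i - y_k|
    cost = (np.abs(x[:, None] - x[None, :])[:, :, None]
            + np.abs(y[:, None] - y[None, :])[:, None, :]).reshape(n, n * n)
    joint = np.full(n, 1.0 / n)
    product = np.full(n * n, 1.0 / n ** 2)
    return ot.emd2(joint, product, cost)

def evar_empirical(x):
    """Plug-in eVar: mean distance between two independent copies."""
    return np.abs(x[:, None] - x[None, :]).mean()

rng = np.random.default_rng(2)
x = rng.normal(size=60)
y = x + 0.5 * rng.normal(size=60)
print(ecov_empirical(x, y) / min(evar_empirical(x), evar_empirical(y)))
```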

Weighted Kendall's Tau and Rank Distances (Piek et al., 2024): Weighted generalizations of Kendall's tau handle positional importance in rankings. For permutations $\pi, \phi \in S_n$, with weights $w_{i,j} \ge 0$, the weighted tau distance is

$$d_W(\pi, \phi) = \frac{\sum_{i < j} w_{i,j} \cdot \mathbf{1}\big[(\pi_j - \pi_i)(\phi_j - \phi_i) < 0\big]}{\sum_{i < j} w_{i,j}},$$

forming a genuine metric under positive weights. These metrics are relevant for correlation analysis in rank aggregation and preference modeling.
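A direct $O(n^2)$ sketch of this distance in Python; the particular weight matrix below, which emphasizes pairs involving top-ranked items, is a hypothetical choice for illustration:

```python
import numpy as np

def weighted_kendall_distance(pi, phi, w):
    """Normalized weighted Kendall tau distance between rank vectors
    pi and phi, with a symmetric nonnegative weight w[i, j] per item pair."""
    n, num, den = len(pi), 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            den += w[i, j]
            if (pi[j] - pi[i]) * (phi[j] - phi[i]) < 0:  # discordant pair
                num += w[i, j]
    return num / den

pi = np.array([1, 2, 3, 4, 5])
phi = np.array([2, 1, 3, 5, 4])
w = 1.0 / np.minimum.outer(pi, pi)  # hypothetical: weight top ranks more
print(weighted_kendall_distance(pi, phi, w))  # swaps at the top cost more
```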

4. Metric-Preserving Transformations of Similarity Measures

Metric distances derived from similarities (e.g., cosine, Pearson, Spearman) are synthesized through metric-preserving functions (Dongen et al., 2012, Solo, 2019):

  • Let $s(x, y) \in [-1, 1]$ be a similarity. Choose $f: [0, D] \to \mathbb{R}_{\ge 0}$ increasing and concave such that $f(0) = 0$; standard examples include the following (see the sketch after this list):
    • $d_1(x, y) = \arccos s(x, y)$ (angular distance),
    • $d_2(x, y) = \sqrt{2 - 2 s(x, y)}$,
    • $d_3(x, y) = \sqrt{1 - s(x, y)}$,
    • $d_4(x, y) = \sqrt{1 - s(x, y)^2}$ (absolute-correlation distance).
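The transforms are one-liners in practice; this NumPy sketch (illustrative) applies all four to a correlation matrix:

```python
import numpy as np

def similarity_to_distances(s):
    """Map a similarity array with values in [-1, 1] (e.g., a Pearson
    correlation matrix) to the four derived distances d1..d4."""
    s = np.clip(s, -1.0, 1.0)  # guard against round-off outside [-1, 1]
    return {
        "d1_angular": np.arccos(s),
        "d2_chord": np.sqrt(2.0 - 2.0 * s),
        "d3_sqrt": np.sqrt(1.0 - s),
        "d4_abs_corr": np.sqrt(1.0 - s ** 2),
    }

X = np.random.default_rng(3).normal(size=(100, 4))
R = np.corrcoef(X, rowvar=False)
for name, D in similarity_to_distances(R).items():
    print(name, D.round(2))
```

Note that $d_4$ identifies perfectly positively and perfectly negatively correlated variables, which is precisely the intent of an absolute-correlation distance.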

Negative-type metrics, such as those arising on trees, resistance distances in graphs, or distances derived from suitably constructed kernels, ensure metric validity: the triangle inequality and identity of indiscernibles are preserved (Coscia et al., 2024).

5. Computational Techniques and Complexity

Empirical evaluation of distance-weighted correlation metrics is generally quadratic in sample size:

  • Compute pairwise distances to form matrices $A$ (data) and $B$ (associated metric/weight).
  • Double-center both matrices: $A_{ij} = a_{ij} - \bar{a}_{i\cdot} - \bar{a}_{\cdot j} + \bar{a}_{\cdot\cdot}$.
  • Distance covariance: $V_n^2 = \frac{1}{n^2} \sum_{i,j} A_{ij} B_{ij}$.
  • Distance correlation: $R_n = V_n(X,Y) / [ V_n(X,X)^{1/2} V_n(Y,Y)^{1/2} ]$ (Richards, 2017, Chaudhuri et al., 2018); a NumPy sketch putting these steps together follows this list.
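The following compact sketch of the quadratic-time estimator is illustrative; production implementations exist in, e.g., the energy R package and the Python dcor package:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation R_n via doubly centered distance matrices."""
    def centered(z):
        z = z.reshape(len(z), -1)  # reshape 1-D input to (n, 1)
        d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
        return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()
    A, B = centered(np.asarray(x, float)), centered(np.asarray(y, float))
    v_xy = (A * B).mean()                            # V_n^2(X, Y)
    v_xx, v_yy = (A * A).mean(), (B * B).mean()      # V_n^2(X, X), V_n^2(Y, Y)
    if v_xx * v_yy == 0:
        return 0.0
    return np.sqrt(max(v_xy, 0.0)) / (v_xx * v_yy) ** 0.25

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=200)
y = np.cos(3 * x) + 0.1 * rng.normal(size=200)  # nonlinear, near-zero Pearson
print(distance_correlation(x, y))               # substantially positive
print(np.corrcoef(x, y)[0, 1])                  # close to zero
```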

Fast algorithms exist for univariate cases, achieving $O(n \log n)$ complexity via sorting and cumulative sums, making the methods feasible for large-scale applications (Chaudhuri et al., 2018).
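In practice one rarely hand-rolls these estimators; for example, the open-source dcor Python package exposes distance correlation directly and, per its documentation, can use the fast univariate algorithms. A minimal usage sketch, assuming the package is installed:

```python
import numpy as np
import dcor  # pip install dcor

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)
y = x ** 2 + rng.normal(size=100_000)

# For large univariate samples the package's default method is expected
# to select a fast O(n log n) estimator rather than the O(n^2) one.
print(dcor.distance_correlation(x, y))
```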

6. Practical Applications and Domain-Specific Metrics

Distance-weighted correlation metrics are pivotal in domains where classical linear correlations are insufficient:

  • Astrophysical classification: Nonlinear associations in high-dimensional galaxy surveys are revealed by distance correlation where Pearson's $\rho$ misses them, improving discrimination between object types (Richards, 2017).
  • Network analysis: Distance-weighted Pearson and resistance-based metrics provide well-behaved correlation measures on graphs, critical in gene expression, brain connectivity, and cyber-security clustering (Coscia et al., 2024, Solo, 2019).
  • Topological data analysis: Distance correlation enables direct comparison of topological summaries (e.g., persistence diagrams, landscapes) residing in distinct metric spaces, supporting independence testing and parameter association (Turner et al., 2019).
  • Graphical model comparison: Distance-weighted metrics, e.g., uncertainty-normalized Hellinger affinity (Wang et al., 2017), quantify similarity across learned graphical models with proper uncertainty adjustment.

7. Theoretical Guarantees, Limitations, and Extensions

Distance-weighted correlation metrics possess well-established properties:

  • Characterize independence exactly in strong negative-type spaces (Lyons, 2011, Castro-Prado et al., 2020).
  • Are scale- and location-invariant and sensitive to non-linear, non-monotonic dependence (Richards, 2017).
  • Require only first-moment finiteness (Brownian variants: second-moment).
  • Allow permutation-based null inference and bootstrap resampling for nonparametric testing; a sketch of such a permutation test follows this list.
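A generic permutation test, sketched below: re-pairing $y$ against $x$ simulates the independence null while preserving both marginals, and any of the statistics above can be plugged in (the Pearson-based statistic here is only a runnable placeholder):

```python
import numpy as np

def permutation_pvalue(x, y, statistic, n_perm=999, seed=0):
    """One-sided permutation p-value for an independence statistic that
    is large under dependence (e.g., a distance correlation)."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    exceed = sum(statistic(x, y[rng.permutation(len(y))]) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)   # add-one correction

# usage with a placeholder statistic; swap in a distance correlation,
# e.g., the Section 5 sketch
stat = lambda x, y: abs(np.corrcoef(x, y)[0, 1])
rng = np.random.default_rng(6)
x, y = rng.normal(size=50), rng.normal(size=50)
print(permutation_pvalue(x, y, stat))  # roughly uniform under the null
```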

Limitations include the need for negative-type metrics, the $O(n^2)$ computational burden (alleviated by fast algorithms in special cases), and the careful selection of metric-preserving transforms required to avoid loss of discriminatory power.

Extensions include kernelized independence criteria (HSIC), fractional/fractionalized covariances for heavy-tailed settings, and multiway generalizations for higher-order association structures.


References:

  • "Distance Correlation: A New Tool for Detecting Association and Measuring Correlation Between Data Sets" (Richards, 2017)
  • "Distance covariance in metric spaces" (Lyons, 2011)
  • "Nonparametric independence tests in metric spaces: What is known and what is not" (Castro-Prado et al., 2020)
  • "Pearson Distance is not a Distance" (Solo, 2019)
  • "Metric distances derived from cosine similarity and Pearson and Spearman correlations" (Dongen et al., 2012)
  • "Pearson Correlations on Networks: Corrigendum" (Coscia et al., 2024)
  • "The Earth Mover's Correlation" (Móri et al., 2020)
  • "On a weighted generalization of Kendall's tau distance" (Piek et al., 2024)
  • "Correlation between Multivariate Datasets, from Inter-Graph Distance computed using Graphical Models Learnt With Uncertainties" (Wang et al., 2017)
  • "A fast algorithm for computing distance correlation" (Chaudhuri et al., 2018)
  • "Same But Different: Distance Correlations Between Topological Summaries" (Turner et al., 2019)
  • "Detection of Periodicity Based on Independence Tests - III. Phase Distance Correlation Periodogram" (Zucker, 2017)
