Brownian Distance Covariance
- Brownian Distance Covariance is a nonparametric measure that detects both linear and nonlinear associations between random vectors across various dimensions.
- It computes dependence via pairwise Euclidean distances and characteristic functions weighted by Brownian motion, ensuring a value of zero only under independence.
- Its practical applications span independence testing, model diagnostics, and integration in deep neural networks, backed by rigorous statistical properties and extensions.
Brownian distance covariance (BdCov, also called distance covariance or dCov) is a dependence measure for random vectors that generalizes classical covariance to quantify all types of dependence, including nonlinear and nonmonotone associations. Introduced by Székely and Rizzo, BdCov is defined via characteristic functions with a special weighting derived from Brownian motion, and is zero if and only if the random variables are independent. Its construction, based on pairwise Euclidean distances, is fundamentally nonparametric and applies to multivariate data of arbitrary dimension, with rigorous statistical properties and numerous extensions to metric, Hilbert, and functional spaces (Székely et al., 2010, Lyons, 2011).
1. Formal Definition and Equivalent Forms
Let $X \in \mathbb{R}^p$, $Y \in \mathbb{R}^q$ be random vectors with joint characteristic function $f_{X,Y}$ and marginals $f_X$, $f_Y$. The squared population Brownian distance covariance is (Székely et al., 2010, Xie et al., 2022):
$$\mathcal{V}^2(X,Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{\left| f_{X,Y}(t,s) - f_X(t)\, f_Y(s) \right|^2}{|t|_p^{1+p}\, |s|_q^{1+q}} \, dt \, ds,$$
where $c_d = \pi^{(1+d)/2} / \Gamma\big((1+d)/2\big)$. This construction reduces to the weighted $L_2$-distance between the joint and product characteristic functions, with the weight $\big(c_p c_q\, |t|_p^{1+p} |s|_q^{1+q}\big)^{-1}$ corresponding to Brownian motion increments.
An equivalent form in terms of Euclidean distances is (Székely et al., 2010, Lyons, 2011):
$$\mathcal{V}^2(X,Y) = E|X-X'|\,|Y-Y'| + E|X-X'|\; E|Y-Y'| - 2\, E|X-X'|\,|Y-Y''|,$$
where $(X',Y')$ and $(X'',Y'')$ are independent copies of $(X,Y)$.
Sample (empirical) Brownian distance covariance for i.i.d. pairs $(X_k, Y_k)$, $k = 1, \dots, n$, is computed by forming distance matrices $a_{kl} = |X_k - X_l|_p$ and $b_{kl} = |Y_k - Y_l|_q$, then double-centering each:
$$A_{kl} = a_{kl} - \bar a_{k\cdot} - \bar a_{\cdot l} + \bar a_{\cdot\cdot},$$
and analogously for $B_{kl}$. The empirical squared distance covariance is then
$$\mathcal{V}_n^2(X,Y) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl}.$$
The corresponding sample distance correlation is defined as
$$\mathcal{R}_n^2(X,Y) = \frac{\mathcal{V}_n^2(X,Y)}{\sqrt{\mathcal{V}_n^2(X)\,\mathcal{V}_n^2(Y)}},$$
with $\mathcal{R}_n = 0$ whenever the denominator vanishes (Székely et al., 2010).
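As a concrete illustration, the sample statistic above can be implemented in a few lines of NumPy (a minimal sketch; the function names are illustrative, not a published API):

```python
import numpy as np

def pairwise_dist(x):
    """Pairwise Euclidean distance matrix for an (n,) or (n, d) sample."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dcov2(x, y):
    """Empirical squared distance covariance V_n^2(x, y)."""
    a, b = pairwise_dist(x), pairwise_dist(y)
    # Double-center: A_kl = a_kl - row mean - column mean + grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def dcor(x, y):
    """Sample distance correlation R_n(x, y); 0 if the denominator vanishes."""
    denom = np.sqrt(dcov2(x, x) * dcov2(y, y))
    return 0.0 if denom == 0 else np.sqrt(dcov2(x, y) / denom)
```

On data with a purely nonmonotone dependence such as `y = x**2` for symmetric `x`, the Pearson correlation is near zero while `dcor(x, y)` is clearly positive, illustrating the key property of the measure.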
2. Theoretical Properties
Characterization of independence: $\mathcal{V}(X,Y) = 0$ if and only if $X$ and $Y$ are independent, under mild moment conditions (finite first moments) (Székely et al., 2010, Lyons, 2011). This property holds in general metric spaces of strong negative type.
Scale and orthogonal invariance: for vectors $a_1, a_2$, scalars $b_1, b_2$, and orthonormal matrices $C_1, C_2$,
$$\mathcal{V}(a_1 + b_1 C_1 X,\; a_2 + b_2 C_2 Y) = \sqrt{|b_1 b_2|}\;\mathcal{V}(X,Y);$$
the distance correlation $\mathcal{R}$ is fully invariant under these transformations (Székely et al., 2010, Székely et al., 2010).
Non-negativity: $\mathcal{V}^2(X,Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent.
Asymptotics: Under independence, $n\mathcal{V}_n^2$ converges in distribution to a non-degenerate quadratic form $Q = \sum_j \lambda_j Z_j^2$, where the $Z_j$ are i.i.d. standard normal and the weights $\lambda_j$ depend on the underlying distributions (Székely et al., 2010, Lyons, 2011). Under dependence alternatives, $\mathcal{V}_n^2 \to \mathcal{V}^2(X,Y) > 0$ almost surely, so $n\mathcal{V}_n^2 \to \infty$ and the associated test is consistent.
Bias and Unbiased Estimation: The standard estimator $\mathcal{V}_n^2$ is biased upward in small samples. Székely and Rizzo provided an unbiased estimator based on U-centering (Székely et al., 2010): for $k \ne l$,
$$\tilde A_{kl} = a_{kl} - \frac{1}{n-2}\sum_{i=1}^{n} a_{il} - \frac{1}{n-2}\sum_{j=1}^{n} a_{kj} + \frac{1}{(n-1)(n-2)}\sum_{i,j=1}^{n} a_{ij},$$
with $\tilde A_{kk} = 0$, where the last term estimates the product of marginal distance means; analogously for $\tilde B_{kl}$. The unbiased statistic is
$$\mathcal{V}_n^{*2}(X,Y) = \frac{1}{n(n-3)} \sum_{k \ne l} \tilde A_{kl} \tilde B_{kl}.$$
The bias-corrected correlation $\mathcal{R}_n^{*}$ uses $\mathcal{V}_n^{*2}$ in the same ratio as $\mathcal{R}_n^2$.
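The U-centered estimator can be sketched in NumPy as follows (a minimal illustration under the assumptions above; function names are illustrative). Unlike the V-statistic, this estimator can take negative values, and it averages to zero under independence:

```python
import numpy as np

def pairwise_dist(x):
    """Pairwise Euclidean distance matrix for an (n,) or (n, d) sample."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def u_center(d):
    """U-centering of a symmetric distance matrix; diagonal set to zero."""
    n = d.shape[0]
    row = d.sum(axis=1)          # row sums equal column sums by symmetry
    U = (d
         - row[None, :] / (n - 2)
         - row[:, None] / (n - 2)
         + d.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(U, 0.0)
    return U

def dcov2_unbiased(x, y):
    """Unbiased estimator V_n^{*2} of squared distance covariance."""
    A, B = u_center(pairwise_dist(x)), u_center(pairwise_dist(y))
    n = A.shape[0]
    return (A * B).sum() / (n * (n - 3))
```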
3. Relation to Brownian Motion
The “Brownian” in Brownian distance covariance refers to a stochastic-process interpretation: $\mathcal{V}^2(X,Y)$ equals the Brownian covariance $\mathcal{W}^2(X,Y)$, the squared covariance of the centered processes $W(X)$ and $W'(Y)$, where $W$ and $W'$ are independent Brownian motions (Székely et al., 2010). This viewpoint establishes that BdCov “sees” all deviations from independence, including nonmonotone nonlinearities, since Brownian motion has a full-rank expansion in function space (Székely et al., 2010).
4. Extensions Beyond Euclidean Data
Brownian distance covariance generalizes to any pair of metric spaces of strong negative type, such as separable Hilbert spaces, allowing its application to high-dimensional, functional, and even non-Euclidean data (Lyons, 2011, Székely et al., 2010). For functional data, the method applies to projections or truncated expansions, and with categorical variables, the distance matrices become indicator matrices on the simplex, reducing the method to analogues of squared-deviation statistics for contingency tables.
The BdCov machinery extends naturally to alternative weighting schemes and to other distances, such as powers $|x - x'|^{\alpha}$ with exponent $\alpha \in (0, 2)$, in the distance calculations, allowing emphasis on “signal” directions or downweighting noise, and leads to unbiasedness and consistency even in non-standard spaces (Székely et al., 2010).
5. Computational Aspects and Practical Considerations
Computation of the empirical statistic is $O(n^2)$ in time and memory due to the pairwise distances, which can be limiting for large $n$. For univariate data, $O(n \log n)$ algorithms exist, and for high-dimensional settings, dimensionality reduction (e.g., via PCA or random projections) is recommended (Khoshgnauz, 2012). The bias at finite $n$ especially affects small-sample, high-dimensional applications such as genomics and motivates use of the unbiased estimator (Székely et al., 2010, Cope, 2010).
Permutation tests are recommended for independence hypotheses, leveraging the exchangeability of labels under the null. Principal components or clustering using the distance correlation matrix may exhibit artifacts from small-sample bias; application of regularization or thresholding is advised (Cope, 2010).
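A permutation test along these lines can be sketched as follows (a minimal NumPy illustration; note that permuting the rows and columns of the centered distance matrix of $Y$ simultaneously is equivalent to relabeling the sample, so the distance matrices need to be computed only once):

```python
import numpy as np

def pairwise_dist(x):
    """Pairwise Euclidean distance matrix for an (n,) or (n, d) sample."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(d):
    """A_kl = a_kl - row mean - column mean + grand mean."""
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def dcov_permutation_test(x, y, n_perm=499, seed=0):
    """Permutation test of independence based on V_n^2."""
    A = double_center(pairwise_dist(x))
    B = double_center(pairwise_dist(y))
    stat = (A * B).mean()
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # relabel y by permuting rows and columns of its centered matrix
        if (A * B[np.ix_(p, p)]).mean() >= stat:
            hits += 1
    # add-one correction gives a valid p-value
    return stat, (hits + 1) / (n_perm + 1)
```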
6. Connections with Kernel, Energy, and Other Independence Measures
BdCov is closely related to energy distances and kernel-based dependence statistics. Its weighting kernel is related via Bochner’s theorem to reproducing kernel Hilbert space (RKHS) embeddings, and distance covariance coincides with the Hilbert–Schmidt independence criterion (HSIC) for a suitable distance-induced kernel (Gretton et al., 2010). The form of BdCov enables extension to arbitrary domains (strings, graphs, groups) where a metric is available.
HSIC has some computational and power advantages, particularly at small sample sizes with well-chosen characteristic kernels. Both measures are consistent against all alternatives and have V-statistic-type estimators (Gretton et al., 2010).
7. Applications and Recent Developments
Brownian distance covariance has been used in testing independence, model diagnostics, structure learning in Markov networks, and as a pooling layer in deep neural networks for few-shot classification (Xie et al., 2022, Khoshgnauz, 2012). For instance, DeepBDC constructs a layer implementing the empirical BdCov matrix in high-dimensional embedding spaces, enabling plug-and-play nonparametric dependency measures in deep models (Xie et al., 2022).
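The idea behind such a pooling layer can be sketched in NumPy (an illustrative sketch only, not the authors’ implementation, which is in PyTorch with additional normalization): each embedding is mapped to the double-centered distance matrix of its channels, and the similarity between two images is the inner product of their BDC matrices, which has the form of an empirical distance covariance.

```python
import numpy as np

def bdc_matrix(feat):
    """BDC matrix of a (k, d) feature map: k spatial positions, d channels.
    Channels are treated as the variables; we form their pairwise Euclidean
    distance matrix over positions and double-center it."""
    f = np.asarray(feat, dtype=float).T           # (d, k): one row per channel
    diff = f[:, None, :] - f[None, :, :]
    a = np.sqrt((diff ** 2).sum(axis=-1))         # (d, d) channel distances
    return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

def bdc_similarity(f1, f2):
    """Inner product of two BDC matrices: a distance-covariance-style
    similarity between two embeddings with the same channel dimension."""
    return float((bdc_matrix(f1) * bdc_matrix(f2)).sum())
```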
Applied examples include detection of nonmonotone and nonlinear associations in genomics, ecological, and socio-economic data, with empirical demonstrations showing sensitivity to dependencies missed by linear correlation (Székely et al., 2010, Székely et al., 2010).
Extensions under active research include adaptations to mutual independence among more than two variables, high-dimensional consistency, fast approximations, and relaxation of metric and moment conditions (Lyons, 2011, Székely et al., 2010).
References:
- (Székely et al., 2010) G. J. Székely and M. L. Rizzo, "Brownian distance covariance," Ann. Appl. Statist. 3(4), 1236–1265 (2009).
- (Székely et al., 2010) G. J. Székely and M. L. Rizzo, "Rejoinder: Brownian distance covariance."
- (Gretton et al., 2010) Gretton et al., "Discussion of: Brownian distance covariance."
- (Cope, 2010) Leslie Cope, "Discussion of: Brownian distance covariance."
- (Lyons, 2011) R. Lyons, "Distance covariance in metric spaces."
- (Khoshgnauz, 2012) E. Khoshgnauz, "Learning Markov Network Structure using Brownian Distance Covariance."
- (Xie et al., 2022) J. Xie et al., "Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification," CVPR 2022.