Cross-MMD: A Kernel-Based Distribution Test
- Cross-MMD statistic is a kernel-based measure that quantifies differences between probability distributions using multiple kernels and conditional grouping.
- It aggregates multi-kernel MMD estimates with advanced variance estimation to enhance sensitivity in hypothesis testing and domain adaptation.
- Applications span hypothesis testing, feature alignment, and conditional matching, offering efficient computational strategies for large-scale problems.
The Cross-MMD statistic refers to a class of estimators and test statistics designed to quantify the difference between two or more probability distributions through kernel-based methods that go beyond simple global comparisons: multi-kernel aggregation, relative similarity tests, category-conditional analysis, and variance estimation for two- and three-sample problems. The framework has evolved to address challenges in kernel two-sample testing, feature alignment, and distributional hypothesis testing, with key developments in variance estimation, aggregation methodology, domain adaptation, and conditional distribution matching.
1. Foundations of the Cross-MMD Statistic
The Cross-MMD paradigm generalizes the well-known Maximum Mean Discrepancy (MMD) by considering contrasts or combinations of MMD estimators evaluated under different kernels, across conditional groupings, or as differences between correlated MMD statistics. At its core, MMD is an Integral Probability Metric (IPM) defined in a Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$ with kernel $k$:

$$\mathrm{MMD}(P, Q) = \sup_{\|f\|_{\mathcal{H}} \leq 1} \big( \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \big) = \| \mu_P - \mu_Q \|_{\mathcal{H}},$$

where $\mu_P = \mathbb{E}_{X \sim P}[k(X, \cdot)]$ and $\mu_Q = \mathbb{E}_{Y \sim Q}[k(Y, \cdot)]$ are the kernel mean embeddings of $P$ and $Q$.
Cross-MMD statistics extend this framework by considering linear or quadratic functions of multiple MMD statistics, typically constructed using either different kernels (as in Mahalanobis aggregation for adaptivity) or across different groupings/classes (as in class-conditional distribution alignment for domain adaptation). A canonical example is the variance of the difference between two correlated MMD statistics (i.e., MMD for $P$ vs. $R$ and for $Q$ vs. $R$, where $R$ is a shared reference distribution), which is central to relative similarity testing (Sutherland et al., 2019).
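Before turning to the composite constructions, the building block itself can be made concrete. The following sketch (NumPy, a Gaussian kernel, and synthetic samples are illustrative assumptions, not taken from the cited works) computes the standard unbiased quadratic-time estimator of squared MMD:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2_unbiased(X, Y, gamma=1.0):
    """U-statistic (unbiased) estimator of squared MMD between samples X and Y."""
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    # Drop diagonal terms so the within-sample expectations are unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))  # sample from P
Y = rng.normal(1.0, 1.0, size=(200, 2))  # sample from a mean-shifted Q
Z = rng.normal(0.0, 1.0, size=(200, 2))  # independent second sample from P
print(mmd2_unbiased(X, Y))  # clearly positive: the distributions differ
print(mmd2_unbiased(X, Z))  # close to zero: same distribution
```

This quadratic-time estimator is what the aggregated and conditional constructions below combine.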
2. Construction and Mathematical Formalism
A generic Cross-MMD statistic is formed as a combined functional of multiple MMD estimates. One concrete case is the Mahalanobis aggregated MMD (MMMD) statistic for kernels $k_1, \dots, k_r$:

$$T_{\mathrm{MMMD}} = \hat{d}^{\top} \hat{\Sigma}^{-1} \hat{d},$$

where $\hat{d} = \big( \widehat{\mathrm{MMD}}^2_{k_1}, \dots, \widehat{\mathrm{MMD}}^2_{k_r} \big)^{\top}$ is the vector of MMD estimates for each kernel, and $\hat{\Sigma}$ is the (empirical) covariance matrix of these statistics under the null hypothesis (Chatterjee et al., 2023).
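A minimal sketch of this aggregation, assuming NumPy, a grid of Gaussian-kernel bandwidths, and a permutation-based estimate of the null covariance (a simple stand-in for the paper's calibration procedures); the helper functions are redefined so the snippet stands alone:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2_unbiased(X, Y, gamma):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf_kernel(X, X, gamma), rbf_kernel(Y, Y, gamma), rbf_kernel(X, Y, gamma)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())

def mmmd(X, Y, gammas, n_perm=100, seed=0):
    """Mahalanobis-aggregated MMD: d^T Sigma^{-1} d, where d stacks per-kernel
    MMD^2 estimates and Sigma is estimated from label permutations that mimic
    the null distribution of d."""
    rng = np.random.default_rng(seed)
    d = np.array([mmd2_unbiased(X, Y, g) for g in gammas])
    pooled = np.vstack([X, Y])
    draws = np.empty((n_perm, len(gammas)))
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))
        Xb, Yb = pooled[idx[:len(X)]], pooled[idx[len(X):]]
        draws[b] = [mmd2_unbiased(Xb, Yb, g) for g in gammas]
    # Small ridge guards against a near-singular covariance across bandwidths.
    Sigma = np.cov(draws, rowvar=False) + 1e-12 * np.eye(len(gammas))
    return float(d @ np.linalg.solve(Sigma, d))
```

Whitening by the null covariance is what lets no single bandwidth dominate: each kernel's evidence is weighed against its own null variability.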
In conditional settings, Cross-MMD can take the form of mean squared distances between source and target feature distributions within classes:

$$\mathcal{L}_{\mathrm{cond}} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{MMD}^2\big( P_s^{(c)}, P_t^{(c)} \big),$$

where $P_s^{(c)}$ and $P_t^{(c)}$ are the class-$c$ sub-distributions in source and target (Jambigi et al., 2021, Zhao et al., 6 Dec 2024).
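The class-conditional form can be sketched as follows; the helper name `class_conditional_mmd2`, the kernel choice, and the label handling are illustrative assumptions rather than the cited papers' implementations:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2_unbiased(X, Y, gamma=1.0):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf_kernel(X, X, gamma), rbf_kernel(Y, Y, gamma), rbf_kernel(X, Y, gamma)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())

def class_conditional_mmd2(Xs, ys, Xt, yt, gamma=1.0):
    """Average squared MMD between source and target features, computed
    separately within each class label shared by both domains."""
    classes = np.intersect1d(np.unique(ys), np.unique(yt))
    per_class = [mmd2_unbiased(Xs[ys == c], Xt[yt == c], gamma) for c in classes]
    return float(np.mean(per_class))
```

Averaging per-class terms keeps the alignment pressure at the semantic level, rather than matching the global marginals.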
For relative similarity, the difference of squared MMDs is of primary interest:

$$\widehat{\mathrm{MMD}}^2(X, Z) - \widehat{\mathrm{MMD}}^2(Y, Z),$$

for $X \sim P$ and $Y \sim Q$, sharing the reference sample $Z \sim R$ (Sutherland et al., 2019).
3. Theoretical Properties and Variance Estimation
A significant property of Cross-MMD statistics is their non-trivial dependency structure due to shared samples or overlapping kernel features, necessitating joint asymptotic analysis. For multiple kernels, the joint limiting law under the null hypothesis is a vector of second-order Wiener-Itô stochastic integrals:

$$n \big( \widehat{\mathrm{MMD}}^2_{k_1}, \dots, \widehat{\mathrm{MMD}}^2_{k_r} \big) \xrightarrow{d} \big( I_2(\bar{k}_1), \dots, I_2(\bar{k}_r) \big),$$

with the aggregated statistic converging to the corresponding quadratic (Mahalanobis) functional of this limiting vector; here $I_2(\bar{k}_j)$ denotes the second-order Wiener-Itô integral associated with the centered kernel $\bar{k}_j$, and the notation follows (Chatterjee et al., 2023).
For the difference of two correlated MMD statistics, an unbiased estimator for the variance is essential (Sutherland et al., 2019). The target quantity decomposes as

$$\mathrm{Var}\big[ \widehat{\mathrm{MMD}}^2_{XZ} - \widehat{\mathrm{MMD}}^2_{YZ} \big] = \mathrm{Var}\big[ \widehat{\mathrm{MMD}}^2_{XZ} \big] + \mathrm{Var}\big[ \widehat{\mathrm{MMD}}^2_{YZ} \big] - 2\, \mathrm{Cov}\big[ \widehat{\mathrm{MMD}}^2_{XZ}, \widehat{\mathrm{MMD}}^2_{YZ} \big],$$

where the covariance term arises from the shared sample $Z$. Expressed in terms of sums over kernel matrices, each term admits an unbiased estimate, enabling correct error quantification for relative similarity tests.
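As an illustration of the relative similarity statistic, the sketch below computes the difference of squared MMDs sharing $Z$ and standardizes it with a crude block-resampling variance estimate, used here as a simple stand-in for the closed-form unbiased variance estimator of Sutherland et al.; helpers are redefined so the snippet stands alone:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2_unbiased(X, Y, gamma=1.0):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf_kernel(X, X, gamma), rbf_kernel(Y, Y, gamma), rbf_kernel(X, Y, gamma)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())

def relative_mmd_zscore(X, Y, Z, gamma=1.0, n_blocks=8, seed=0):
    """Difference MMD^2(X, Z) - MMD^2(Y, Z) sharing the reference Z,
    standardized by a crude block-resampling variance estimate (a simple
    substitute for the closed-form unbiased variance estimator)."""
    diff = mmd2_unbiased(X, Z, gamma) - mmd2_unbiased(Y, Z, gamma)
    rng = np.random.default_rng(seed)
    px, py, pz = (rng.permutation(len(A)) for A in (X, Y, Z))
    block_diffs = [
        mmd2_unbiased(X[bx], Z[bz], gamma) - mmd2_unbiased(Y[by], Z[bz], gamma)
        for bx, by, bz in zip(np.array_split(px, n_blocks),
                              np.array_split(py, n_blocks),
                              np.array_split(pz, n_blocks))
    ]
    # Blocks are (roughly) independent copies at 1/n_blocks the sample size,
    # so dividing their variance by n_blocks rescales it to the full sample.
    var_hat = np.var(block_diffs, ddof=1) / n_blocks
    return float(diff / np.sqrt(var_hat))
```

A large positive score suggests $Y$ is closer to the reference than $X$; note that resampling the three samples jointly is what captures the covariance induced by the shared $Z$.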
4. Computational Strategies
Cross-MMD statistics involving kernel aggregation or class-conditional terms generally require $O(r n^2)$ computation for $r$ kernels and total sample size $n$ (Chatterjee et al., 2023). However, modern implementations exploit efficient Gram-matrix manipulations and, in some cases, random Fourier feature approximations to scale further.
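The random Fourier feature route mentioned above can be sketched as follows (bandwidth and feature count are illustrative assumptions); squared MMD becomes the squared distance between empirical feature means, at linear cost in the sample sizes:

```python
import numpy as np

def rff_mmd2(X, Y, gamma=1.0, n_features=1000, seed=0):
    """Linear-time approximation of squared MMD for the Gaussian kernel
    exp(-gamma * ||x - y||^2) via random Fourier features: cost is
    O((m + n) * n_features) rather than quadratic in the sample sizes."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral measure of this Gaussian kernel: frequencies W ~ N(0, 2*gamma*I).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    phi = lambda A: np.sqrt(2.0 / n_features) * np.cos(A @ W + b)
    # Squared MMD = squared distance between empirical mean embeddings.
    delta = phi(X).mean(axis=0) - phi(Y).mean(axis=0)
    return float(delta @ delta)
```

The approximation error decays like $O(1/\sqrt{D})$ in the number of features $D$, so the feature count trades accuracy for speed.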
In class-conditional Cross-MMD (e.g., category-aware domain alignment), the loss is computed per class across source and pseudo-labeled target "help sets" (Zhao et al., 6 Dec 2024). For Mahalanobis aggregation, the main computational cost is the Gram matrix computation, which can be parallelized, and the inversion of a low-dimensional covariance matrix.
Bootstrapping and permutation tests for setting rejection regions in aggregated Cross-MMD also leverage wild multiplier bootstrap procedures exploiting the joint structure, which can be substantially faster than standard permutation-based calibration (Chatterjee et al., 2023).
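A minimal sketch of multiplier-bootstrap calibration for a single unbiased MMD$^2$ statistic, assuming equal sample sizes and Rademacher multipliers (a simplified version of the wild multiplier bootstrap idea referenced above, not the cited paper's joint procedure):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def wild_bootstrap_pvalue(X, Y, gamma=1.0, n_boot=500, seed=0):
    """Multiplier (wild) bootstrap calibration of the unbiased MMD^2
    statistic: resample sum_{i != j} e_i e_j H_ij with Rademacher
    multipliers e, which mimics the statistic's null distribution."""
    n = len(X)
    assert len(Y) == n, "this sketch assumes equal sample sizes"
    # H_ij = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    Kxy = rbf_kernel(X, Y, gamma)
    H = rbf_kernel(X, X, gamma) + rbf_kernel(Y, Y, gamma) - Kxy - Kxy.T
    np.fill_diagonal(H, 0.0)  # the U-statistic excludes i == j terms
    stat = H.sum() / (n * (n - 1))
    rng = np.random.default_rng(seed)
    null = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice([-1.0, 1.0], size=n)
        null[b] = eps @ H @ eps / (n * (n - 1))
    return float((1 + np.sum(null >= stat)) / (1 + n_boot))
```

Because each bootstrap draw reuses the precomputed matrix $H$, recalibration costs only $O(n^2)$ per draw with no new kernel evaluations, which is the source of the speedup over naive permutation.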
5. Applications and Empirical Impact
Cross-MMD statistics are deployed in several key contexts:
- Hypothesis Testing: Mahalanobis aggregation of multi-kernel MMD statistics (MMMD) yields tests with universal consistency, non-trivial Pitman efficiency, and adaptivity to a wide range of alternatives; this approach avoids kernel parameter selection pitfalls and enables simultaneous sensitivity to local and global distributional differences (Chatterjee et al., 2023).
- Relative Similarity Testing: Cross-MMD variance estimators enable rigorous calibration of whether a reference distribution is closer to or (relative similarity), crucial in model selection and generative model validation (Sutherland et al., 2019).
- Domain Adaptation and Conditional Alignment: Class-conditional Cross-MMD losses enable explicit alignment of source and target domain features at the semantic (category) level, overcoming issues of global alignment that can collapse class structure (Jambigi et al., 2021, Zhao et al., 6 Dec 2024).
- Witness Function and Test Power Optimization: In cross-validation/data-splitting two-sample test designs, combined Cross-MMD approaches enable direct learning of test statistics (e.g., via witness function optimization) with SNR-maximizing properties (Kübler et al., 2021).
- Missing Data Testing: Cross-MMD structures underlie bounding strategies for MMD in the presence of missing data, including data missing not at random (MNAR); such statistics guarantee valid Type I error control (Zeng et al., 24 May 2024).
- Adversarial Learning and Feature Matching: Category-wise or margin-based Cross-MMD aligns feature distributions while maintaining discriminability, as in visible-thermal person Re-ID and cross-domain sensing (Jambigi et al., 2021, Zhao et al., 6 Dec 2024).
6. Comparative Summary Table
| Statistic / Setting | Cross-MMD Construction | Primary Application |
|---|---|---|
| Mahalanobis Aggregation (MMMD) | Vector of multi-kernel MMDs & covariance | Adaptive two-sample testing |
| Category-wise Alignment | Average over class-conditional MMDs | Domain adaptation, cross-modal learning |
| Relative Similarity | Variance of difference of correlated MMDs | Model comparison, relative similarity testing |
| Missing Data Inference | Bounds on MMD with cross-group kernel means | Valid testing under MNAR constraints |
7. Limitations and Ongoing Challenges
Cross-MMD statistics introduce new challenges:
- Joint asymptotic behavior can be mathematically intricate, especially for large kernel collections or complex grouping.
- Covariance estimation in high dimensions or small samples may impact test calibration.
- Choice of conditional groupings/partitions or aggregation weights can influence sensitivity, necessitating principled selection or cross-validation.
- In real-data applications (e.g., high-dimensional modalities, small-sample EEG transfer), robustness to label noise and pseudo-label construction in category-wise alignment is an active area of research.
8. References and Key Contributions
- Fully unbiased estimators for Cross-MMD variance: (Sutherland et al., 2019)
- Mahalanobis aggregation and joint stochastic integral limit theory: (Chatterjee et al., 2023)
- Margin-based class-conditional Cross-MMD for distribution alignment: (Jambigi et al., 2021)
- Local/class-aware Cross-MMD for domain adaptation and sensing: (Zhao et al., 6 Dec 2024)
- Data-splitting and SNR-optimized witness functions: (Kübler et al., 2021)
- Permutation and normality-based bounds under missing data: (Zeng et al., 24 May 2024)
- Efficient computational strategies through random projections and Fourier features: (Zhao et al., 2014, Hertrich et al., 2023)
These works collectively formalize, analyze, and apply Cross-MMD statistics in diverse, rigorous settings, and have significantly advanced kernel-based hypothesis testing, adaptation, and distributional alignment.