Permutation-Free MMD Test
- Permutation-free MMD tests are kernel-based methods that calibrate statistical tests without relying on computationally expensive permutation or resampling procedures.
- Key methodologies include martingale reparameterization, sample splitting, Laplace-transform-based null simulation, and bootstrap calibrations, which yield tractable and, in many cases, asymptotically normal test statistics.
- These tests achieve finite-sample or asymptotic error control and demonstrate scalability and robustness for high-dimensional and structured data applications.
A permutation-free Maximum Mean Discrepancy (MMD) test refers to a class of kernel-based two-sample or goodness-of-fit procedures wherein the calibration of the test statistic, or the determination of its significance threshold, is achieved without reliance on computationally intensive permutation or resampling procedures. Such methods address critical limitations of classical MMD-based testing—specifically, the need for eigenvalue decompositions, asymptotic invalidity in high dimensions, or the impracticality of repeated quadratic-time computation—by developing strategies that yield tractable, theoretically justified, and easily calibrated test statistics. This article details the foundational principles, methodologies, theoretical guarantees, algorithmic advances, and practical implications of permutation-free MMD testing in various high-dimensional and structured-data settings.
1. Formal Relationship with Classical MMD-Based Testing
Let $P$ and $Q$ be Borel probability measures on a measurable domain $\mathcal{X}$, and let $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a positive-definite kernel inducing an RKHS $\mathcal{H}_k$. The classical Maximum Mean Discrepancy (MMD) between $P$ and $Q$ is defined as

$$\mathrm{MMD}(P, Q) = \big\| \mu_P - \mu_Q \big\|_{\mathcal{H}_k},$$

where $\mu_P$ is the kernel mean embedding $\mu_P = \mathbb{E}_{X \sim P}[k(X, \cdot)]$ (and $\mu_Q$ is defined analogously). The canonical unbiased estimator of $\mathrm{MMD}^2$ is a U-statistic, quadratic in the data, whose null distribution (under $H_0 : P = Q$) is a degenerate, infinite sum of weighted chi-squared variables.
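For reference, with equal sample sizes $X_1, \dots, X_n \sim P$ and $Y_1, \dots, Y_n \sim Q$, one standard form of this unbiased U-statistic estimator is

$$\widehat{\mathrm{MMD}}_u^2 = \frac{1}{n(n-1)} \sum_{i \neq j} \Big[ k(X_i, X_j) + k(Y_i, Y_j) - k(X_i, Y_j) - k(X_j, Y_i) \Big],$$

which requires $O(n^2)$ kernel evaluations to compute.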
Classically, to obtain a level-$\alpha$ test, the $(1-\alpha)$-quantile of the empirical null distribution of the test statistic, estimated via permutation/randomization or bootstrap calibration, is used as the rejection threshold. This procedure is computationally expensive (each permutation costs $O(n^2)$ kernel evaluations), and the intractable null distribution makes analytic or asymptotic thresholds impractical in nontrivial settings.
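As a point of reference for what the permutation-free constructions below replace, here is a minimal Python sketch (not tied to any particular paper) of the classical pipeline: the quadratic-time unbiased MMD² U-statistic calibrated by permuting the pooled sample. The Gaussian kernel, bandwidth, and permutation count `B` are illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd2_ustat(X, Y, bandwidth=1.0):
    """Unbiased quadratic-time estimator of MMD^2 (assumes equal sample sizes)."""
    n = len(X)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    np.fill_diagonal(Kxx, 0.0)
    np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() + Kyy.sum()) / (n * (n - 1)) - 2 * Kxy.mean()

def permutation_mmd_test(X, Y, alpha=0.05, B=200, bandwidth=1.0, seed=0):
    """Classical calibration: recompute the statistic on B permuted splits of the pooled sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    Z = np.vstack([X, Y])
    observed = mmd2_ustat(X, Y, bandwidth)
    null_stats = []
    for _ in range(B):                      # each draw costs O(n^2)
        perm = rng.permutation(2 * n)
        null_stats.append(mmd2_ustat(Z[perm[:n]], Z[perm[n:]], bandwidth))
    threshold = np.quantile(null_stats, 1 - alpha)
    return observed > threshold, observed, threshold
```

Every permutation repeats the full $O(n^2)$ computation; this repeated cost is precisely what the calibration strategies in Section 2 eliminate.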
Permutation-free MMD tests retain the favorable theoretical and robustness properties of MMD-based procedures while replacing permutation-based calibration with alternate mechanisms—such as analytic normal approximations, innovative direct simulation strategies, sample splitting, or statistic regularization—that open the methodology to scalable and interpretable applications in high-dimensional and structured data regimes.
2. Core Methodologies for Permutation-Free Calibration
Permutation-free MMD tests are grounded in diverse algorithmic and theoretical developments. The most prominent approaches are as follows:
- Martingale/Test Statistic Reparameterization: The mMMD test (Chatterjee et al., 13 Oct 2025) modifies the quadratic MMD statistic to exploit a sequential martingale structure. For paired samples $(X_i, Y_i)_{i=1}^n$, a martingale difference sequence is obtained by evaluating a recursively updated "empirical witness function", built from the first $i-1$ pairs, at the $i$-th pair; the test statistic is the cumulative sum of these martingale differences.
After normalization by a consistent estimator of its variance, central limit theory for martingales yields a standardized test statistic that is asymptotically standard normal under $H_0$. No permutations or bootstrap calibration are required; the test rejects by directly comparing the standardized statistic with the Gaussian quantile $z_{1-\alpha}$.
- Sample Splitting and Studentization: The cross-MMD test (Shekhar et al., 2022) uses an explicit data-splitting construction: the samples are partitioned into two halves, and the empirical witness function estimated from one half is evaluated on the other half. Studentization via a direct variance estimate yields a cross-MMD statistic that, under mild assumptions, is asymptotically standard normal under the null, which likewise eliminates the need for permutation calibration (see the sketch following this list).
- Laplace-MMD and Null Simulations: The L-MMD (Laplace-MMD) test (Kellner et al., 2014) leverages the Laplace transform as a distributional feature and measures the discrepancy between the corresponding embeddings in the induced RKHS. Under the assumption that the null distribution is fully specified, the rejection quantile is obtained by simulating the distribution of the test statistic under $H_0$ via Monte Carlo, avoiding permutations entirely; calibration then requires only Monte Carlo replications under the known null, a substantial computational improvement over permutation-based calibration in classical settings.
- Wild Bootstrap, Multiplier Bootstrap, and Direct Asymptotics: Tests such as MMDAgg (Schrab et al., 2021) and Mahalanobis/aggregated MMD (Chatterjee et al., 2023) use wild or multiplier bootstrap calibration, typically applying Rademacher or Gaussian weights to kernel matrices and deriving quantiles or p-values from the resulting pseudo-statistics. These bootstraps are fully valid (often nonasymptotically so), much faster than permutations, and in many cases enable joint kernel aggregation or adaptation over bandwidth collections without cross-validation or held-out data.
- Function Space and U-Statistic Decomposition: For functional data (Wynne et al., 2020) or high-dimensional vector inputs, tests rely on kernels constructed on function spaces, spectral expansions, and infinite-dimensional analogues of Hermite polynomials, yielding explicit CLT guarantees for the (centered and normalized) U-statistic MMD estimator. Analytic quantiles or Gaussian approximations replace permutations in large-sample regimes.
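To make the sample-splitting idea concrete, the following is a minimal Python sketch in the spirit of the cross-MMD construction: the empirical witness function is estimated on the first halves of the two samples, evaluated on the held-out second halves, and the studentized mean difference is compared with a standard normal quantile. The Gaussian kernel, the even split, and the one-sided rejection rule are illustrative assumptions; see Shekhar et al. (2022) for the exact statistic and its regularity conditions.

```python
import numpy as np
from scipy.stats import norm

def gaussian_kernel(A, B, bandwidth=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def cross_mmd_test(X, Y, alpha=0.05, bandwidth=1.0):
    """Permutation-free two-sample test via sample splitting and studentization."""
    n, m = len(X), len(Y)
    X1, X2 = X[: n // 2], X[n // 2:]
    Y1, Y2 = Y[: m // 2], Y[m // 2:]

    # Empirical witness f(z) = mean_j k(X1_j, z) - mean_j k(Y1_j, z),
    # estimated on the first halves and evaluated on the held-out second halves.
    fX2 = gaussian_kernel(X2, X1, bandwidth).mean(1) - gaussian_kernel(X2, Y1, bandwidth).mean(1)
    fY2 = gaussian_kernel(Y2, X1, bandwidth).mean(1) - gaussian_kernel(Y2, Y1, bandwidth).mean(1)

    # Studentized difference of the witness means; asymptotically N(0, 1) under H0.
    diff = fX2.mean() - fY2.mean()
    var = fX2.var(ddof=1) / len(fX2) + fY2.var(ddof=1) / len(fY2)
    stat = diff / np.sqrt(var)

    # Direct comparison with the standard normal quantile; no permutations needed.
    return stat > norm.ppf(1 - alpha), stat
```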
3. Theoretical Guarantees: Type I/II Error, Consistency, and Minimaxity
Permutation-free MMD tests often attain exact or nearly exact type I error control and exhibit strong guarantees on test power (type II error), even in high dimension or under misspecification.
- Finite-Sample Level Control:
L-MMD explicitly bounds the probability of exceeding the null quantile: when the quantile is estimated by a Monte Carlo percentile (an appropriate order statistic of the simulated null values), the type I error is controlled at the nominal level $\alpha$ up to Monte Carlo error (Kellner et al., 2014); a generic calibration sketch follows this list. MMDAgg provides finite-sample type I control via exchangeability arguments under permutation or wild bootstrap calibration, and spectral-regularized tests are proved to be minimax optimal, attaining the optimal separation rate over Sobolev alternatives (Hagrass et al., 2022, Schrab et al., 2021).
- Type II Error (Power) and Rate-Optimality:
For fixed alternatives, exponential decay of the type II error is demonstrated for Laplace-MMD, with bounds explicit in terms of the sample size, the estimated quantile, and embedding norms (Kellner et al., 2014). Cross-MMD (Shekhar et al., 2022) and mMMD (Chatterjee et al., 13 Oct 2025) are shown to be consistent (power tending to one as $n \to \infty$) and minimax rate-optimal against local alternatives (especially when a Gaussian kernel is used), and they maintain power competitive with permutation-based methods.
- Consistency in Non-Euclidean/Functional Spaces:
Rigorous conditions ensuring that the functional kernels are characteristic, that reconstruction error is controlled, and that the (normalized) estimator is asymptotically normal are derived for testing distributions of functional data (Wynne et al., 2020).
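The Monte Carlo calibration behind the finite-sample level guarantee can be sketched generically as follows, for a fully specified null (here, standard multivariate normality as an assumed example) and a user-supplied test statistic; the helper names are hypothetical and not taken from the cited papers.

```python
import numpy as np

def mc_null_quantile(statistic_fn, sample_null, n, alpha=0.05, B=500, seed=0):
    """Estimate the (1 - alpha) null quantile of a statistic by Monte Carlo,
    assuming the null distribution is fully specified (goodness-of-fit setting)."""
    rng = np.random.default_rng(seed)
    sims = np.sort([statistic_fn(sample_null(n, rng)) for _ in range(B)])
    # Slightly conservative order-statistic choice; the level holds up to Monte Carlo error.
    k = min(int(np.ceil((B + 1) * (1 - alpha))), B)
    return sims[k - 1]

def run_gof_test(statistic_fn, data, alpha=0.05, B=500):
    """Reject H0 'data ~ N(0, I_d)' when the statistic exceeds the simulated quantile."""
    d = data.shape[1]
    sample_null = lambda n, rng: rng.standard_normal((n, d))  # fully specified H0
    threshold = mc_null_quantile(statistic_fn, sample_null, len(data), alpha, B)
    return statistic_fn(data) > threshold
```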
4. Comparative Computational Complexity and Scaling
A central motivation for permutation-free methodologies is computational efficiency, especially for large sample sizes or high-dimensional data.
| Method | Calibration Type | Complexity per Threshold | Test Statistic Time |
|---|---|---|---|
| Classical MMD | Permutation | $O(Bn^2)$ for $B$ permutations | $O(n^2)$ |
| Laplace-MMD | Monte Carlo under known null | $B$ statistic evaluations on data simulated under $H_0$ | $O(n^2)$ |
| Cross-MMD, mMMD | Asymptotic (normal) | None (analytic) | $O(n^2)$ |
| MMDAgg Wild Bootstrap | Wild bootstrap | $O(Bn^2)$ for $B$ bootstrap draws | $O(n^2)$ per kernel |
| Mahalanobis Agg. MMD | Multiplier bootstrap | Scales with the number of kernels | $O(n^2)$ per kernel |

Here $n$ denotes the sample size and $B$ the number of permutation, Monte Carlo, or bootstrap replications.
For highly parallel architectures or extremely large datasets, FastMMD (Zhao et al., 2014) exploits random Fourier features and Fastfood transforms to reduce the computational cost from $O(n^2 d)$ to $O(Lnd)$, and further to $O(Ln \log d)$, where $L$ is the number of random basis functions and $d$ the data dimension, bypassing the need for permutations while maintaining low approximation error.
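A minimal sketch of the random-feature idea (not the exact FastMMD/Fastfood implementation): each observation is mapped to $L$ random cosine features approximating a Gaussian kernel, and the MMD is approximated by the Euclidean distance between the two empirical feature means, at $O(Lnd)$ cost.

```python
import numpy as np

def rff_mmd(X, Y, bandwidth=1.0, L=256, seed=0):
    """Approximate MMD for the Gaussian kernel via random Fourier features.
    Cost is O(L * n * d) rather than O(n^2 * d) for the exact quadratic estimator."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, L)) / bandwidth   # frequencies drawn from the kernel's spectral density
    b = rng.uniform(0.0, 2 * np.pi, size=L)       # random phases

    def features(Z):
        return np.sqrt(2.0 / L) * np.cos(Z @ W + b)

    mu_X = features(X).mean(axis=0)               # empirical mean embeddings in feature space
    mu_Y = features(Y).mean(axis=0)
    return np.linalg.norm(mu_X - mu_Y)            # approximate MMD(P, Q)
```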
5. Extensions: Adaptation, Robustness, and Model Selection
Adaptive Aggregation and Data-Dependent Kernels
Permutation-free aggregation schemes, such as MMDAgg and MMD-FUSE (Biggs et al., 2023), maximize test power by combining MMD statistics across a collection of kernels/bandwidths. Both theory and experiments demonstrate that, with wild or multiplier bootstrap calibration, these procedures remain level-correct and minimax adaptive over Sobolev parameter spaces.
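A standard wild-bootstrap calibration of the MMD, in the spirit of the schemes above, can be sketched for a single kernel as follows; equal sample sizes, a Gaussian kernel, and Rademacher multipliers are illustrative assumptions, and an aggregated test would repeat this over a kernel collection with a level adjustment across kernels.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def wild_bootstrap_mmd_test(X, Y, alpha=0.05, B=500, bandwidth=1.0, seed=0):
    """Wild-bootstrap calibration of the (biased) MMD^2 statistic, single kernel, n = m."""
    rng = np.random.default_rng(seed)
    n = len(X)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # H[i, j] = k(X_i, X_j) + k(Y_i, Y_j) - k(X_i, Y_j) - k(X_j, Y_i)
    H = gaussian_kernel(X, X, bandwidth) + gaussian_kernel(Y, Y, bandwidth) - Kxy - Kxy.T
    observed = H.sum() / n ** 2

    # Rademacher multipliers mimic the null fluctuations of the degenerate statistic;
    # the matrix H is computed once and reused for every bootstrap draw.
    eps = rng.choice([-1.0, 1.0], size=(B, n))
    boot = np.einsum("bi,ij,bj->b", eps, H, eps) / n ** 2
    threshold = np.quantile(boot, 1 - alpha)
    return observed > threshold, observed, threshold
```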
Model and Hypothesis Selection
Relative Similarity tests (Bounliphone et al., 2015), which compare differences in MMD (or parameterized variants) between observed data and samples from candidate generative models, establish joint asymptotic normality and enable analytic computation of p-values, facilitating efficient model selection in deep generative learning without repeated permutations.
Robustness to Nuisance Structure and Mismeasured Data
Recent advances generalize permutation-free MMD to settings with model parameters estimated from data (Brück et al., 2023), mismeasured or partially observed data (Nafshi et al., 2023, Zeng et al., 24 May 2024), and even structured path space through signature kernels (Alden et al., 2 Jun 2025). Techniques include careful regularization, partial identification, and high-probability bounding strategies that preserve test validity in situations where classical permutation-based methods would fail.
6. Applications, Empirical Performance, and Practical Considerations
Permutation-free MMD tests have been validated across a spectrum of domains:
- High-Dimensional Data:
Goodness-of-fit tests for normality in high dimensions (Laplace-MMD) retain type I error control and achieve exponentially decaying type II error, outperforming classical multivariate tests (Henze–Zirkler, energy distance) (Kellner et al., 2014).
- Large Scale Applications:
FastMMD and cheap permutation methods (Zhao et al., 2014, Domingo-Enrich et al., 11 Feb 2025) demonstrate that approximate, fast calibration does not sacrifice power while yielding orders-of-magnitude speed-ups.
- Functional and Path-Space Data:
MMD-based tests over signature kernels and Hilbertian function spaces (Wynne et al., 2020, Alden et al., 2 Jun 2025) enable hypothesis testing on stochastic processes, time series, and structured observations.
- Model Selection in Generative Modelling:
Permutation-free relative similarity MMD tests provide automated and statistically principled model ranking in VAEs, GMMNs, and related architectures (Bounliphone et al., 2015).
- Robustness:
Through partial identification and worst-case bounding (e.g., for $\epsilon$-contaminated or MNAR data), permutation-free methods remain valid and powerful where imputation or classical techniques fail (Nafshi et al., 2023, Zeng et al., 24 May 2024).
7. Theoretical and Practical Impact
The development of permutation-free MMD-based tests constitutes a fundamental shift in nonparametric inference under complex data regimes. By providing statistically optimal, computationally scalable, and robust solutions to the calibration bottleneck in kernel-based testing, these methods have enabled the practical deployment of distributional testing, goodness-of-fit analysis, and model selection in settings that were previously intractable due to computational or statistical constraints.
Permutation-free MMD tests serve as a template for developing further kernel- and embedding-based inferential procedures in unsupervised learning, generative modeling, high-dimensional statistics, and the analysis of structured, functional, or incomplete data. The convergence of statistical optimality, analytic tractability, and empirical scalability in these designs ensures their continuing centrality in modern statistical machine learning.