Kernel Quantile Embeddings
- Kernel Quantile Embeddings (KQEs) generalize kernel mean embeddings by incorporating directional quantile information.
- KQEs induce flexible statistical metrics that aggregate quantile discrepancies across directions, require weaker kernel conditions than MMD, and recover kernelized sliced Wasserstein distances.
- Empirical benchmarks show that the resulting distances admit near-linear-time estimators and achieve competitive performance in two-sample testing compared to traditional MMD methods.
Kernel Quantile Embeddings (KQEs) extend the classical approach of embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) by capturing generalized quantile information rather than mean structure alone. The framework generalizes kernel mean embeddings and facilitates the construction of probability metrics with weaker kernel requirements than maximum mean discrepancy (MMD), recovers kernelized forms of sliced Wasserstein distances, and admits computationally efficient estimators with near-linear complexity. KQEs provide a rich, directionally parameterized representation of distributions, with theoretical guarantees and competitive empirical performance in two-sample testing and related statistical tasks (Naslidnyk et al., 26 May 2025).
1. Mathematical Framework and Definition
Given a measurable, continuous, bounded kernel $k$ on a Borel space $\mathcal{X}$, let $\mathcal{H}$ denote its RKHS with feature map $\varphi(x) = k(\cdot, x)$ and inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$. For a probability measure $P$ on $\mathcal{X}$, the classical kernel mean embedding is $\mu_P = \int_{\mathcal{X}} \varphi(x)\, \mathrm{d}P(x) \in \mathcal{H}$; the kernel $k$ is mean-characteristic if the map $P \mapsto \mu_P$ is injective.
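For orientation, the sketch below (illustrative numpy code, not from the paper; the names `gaussian_kernel` and `mmd2` are hypothetical) shows how the empirical mean embedding $\hat{\mu}_P = \tfrac{1}{n}\sum_i \varphi(x_i)$ enters computations: the squared RKHS distance between empirical mean embeddings, i.e. a plug-in MMD$^2$ estimate, reduces to averages of kernel evaluations.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix [k(x_i, y_j)]_{ij}."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))

def mmd2(X, Y, kernel=gaussian_kernel):
    """Plug-in estimate of ||mu_P - mu_Q||_H^2 (squared MMD): the empirical mean
    embeddings enter only through averages of kernel evaluations."""
    return kernel(X, X).mean() - 2 * kernel(X, Y).mean() + kernel(Y, Y).mean()

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(200, 5)), rng.normal(loc=0.5, size=(200, 5))
print(mmd2(X, Y))
```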
Quantiles are generalized to the RKHS via directional projections. For a direction $f \in \mathcal{H}$ with $\|f\|_{\mathcal{H}} = 1$, define the pushforward measure $P_f$ on $\mathbb{R}$ as the law of $\langle f, \varphi(X) \rangle_{\mathcal{H}}$ for $X \sim P$. The one-dimensional $\tau$-quantile of $P_f$ is denoted $q_\tau(P_f)$. The kernel quantile embedding (KQE) of $P$ at quantile level $\tau \in (0,1)$, along direction $f$, is

$$ q_\tau(P_f) = \inf\big\{ t \in \mathbb{R} : P\big( \langle f, \varphi(X) \rangle_{\mathcal{H}} \le t \big) \ge \tau \big\}, $$

or equivalently the generalized inverse of the distribution function of $P_f$ evaluated at $\tau$. The full KQE of $P$ is the collection $\{ q_\tau(P_f) : \tau \in (0,1),\ f \in \mathcal{H},\ \|f\|_{\mathcal{H}} = 1 \}$ (Naslidnyk et al., 26 May 2025).
2. Associated Probability Metrics
KQEs induce a family of statistical distances that aggregate quantile discrepancies over directions and quantile levels. Fix a probability measure $\rho$ on $(0,1)$ to weight quantile levels and an order $p \ge 1$. For two probability measures $P, Q$ and a unit-norm direction $f \in \mathcal{H}$, define the $\rho$-weighted quantile difference

$$ D_{\rho, p}(P, Q; f) = \left( \int_0^1 \big| q_\tau(P_f) - q_\tau(Q_f) \big|^p \, \mathrm{d}\rho(\tau) \right)^{1/p}. $$

To aggregate across directions $f$, define:
- Expected KQD (e-KQD):
$$ \mathrm{eKQD}_{p}(P, Q) = \left( \int \int_0^1 \big| q_\tau(P_f) - q_\tau(Q_f) \big|^p \, \mathrm{d}\rho(\tau) \, \mathrm{d}\nu(f) \right)^{1/p}, $$
where $\nu$ is a probability measure on the unit sphere of $\mathcal{H}$.
- Supremum KQD (sup-KQD):
$$ \mathrm{supKQD}_{p}(P, Q) = \sup_{f \in \mathcal{H},\, \|f\|_{\mathcal{H}} = 1} \left( \int_0^1 \big| q_\tau(P_f) - q_\tau(Q_f) \big|^p \, \mathrm{d}\rho(\tau) \right)^{1/p}. $$
These KQD metrics subsume kernel mean discrepancies and recover kernelized variants of (max-)sliced Wasserstein distances. Specifically, for $\mathcal{X} = \mathbb{R}^d$, the linear kernel $k(x, y) = \langle x, y \rangle$, $\rho$ the Lebesgue (uniform) measure on $(0,1)$, and $\nu$ uniform on the unit sphere $\mathbb{S}^{d-1}$, e-KQD coincides with the sliced $p$-Wasserstein distance and sup-KQD with the max-sliced $p$-Wasserstein distance (Naslidnyk et al., 26 May 2025).
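To make this special case concrete, the following sketch (assuming equal sample sizes, uniform $\rho$ on $(0,1)$, and uniform $\nu$ on the sphere; not the paper's code) computes the Monte Carlo sliced $p$-Wasserstein distance, which is exactly the empirical e-KQD when the kernel is linear, since RKHS directions reduce to unit vectors $\theta \in \mathbb{S}^{d-1}$ and directional quantiles are order statistics of the projections.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_dirs=200, p=2, rng=None):
    """Monte Carlo sliced p-Wasserstein distance between equal-size samples;
    equals the empirical e-KQD with a linear kernel, uniform rho and uniform nu."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    assert Y.shape == (n, d), "equal sample sizes keep the quantile matching simple"
    theta = rng.normal(size=(n_dirs, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions on the sphere
    qx = np.sort(X @ theta.T, axis=0)   # column j: empirical quantiles of <theta_j, x>
    qy = np.sort(Y @ theta.T, axis=0)
    return (np.abs(qx - qy) ** p).mean() ** (1 / p)

rng = np.random.default_rng(1)
X, Y = rng.normal(size=(500, 10)), rng.normal(loc=0.3, size=(500, 10))
print(sliced_wasserstein(X, Y))
```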
3. Theoretical Guarantees and Properties
Assume: (A1) $\mathcal{X}$ is Hausdorff, separable, and $\sigma$-compact; (A2) $k$ is continuous and separating ($\varphi(x) \neq \varphi(x')$ whenever $x \neq x'$).
- Quantile-characteristic Kernels: If $\mathcal{X}$ and $k$ satisfy (A1) and (A2), then the map $P \mapsto \{ q_\tau(P_f) : \tau \in (0,1),\, \|f\|_{\mathcal{H}} = 1 \}$ is injective (a Cramér–Wold argument in the RKHS), so $k$ is quantile-characteristic. Every mean-characteristic kernel is quantile-characteristic, but not conversely.
- Metric Properties: If $\rho$ has full support on $(0,1)$, sup-KQD defines a metric on the space of Borel probability measures on $\mathcal{X}$. If, in addition, $\nu$ has full support on the unit sphere of $\mathcal{H}$, then e-KQD is also a metric.
- Finite-Sample Consistency: For data $x_1, \dots, x_n \sim P$, let $\hat{P}_n = \tfrac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$. The empirical directional quantile $\hat{q}_\tau(\hat{P}_{n,f})$ (taken as the $\lceil \tau n \rceil$-th order statistic of the projected sample) converges in $L_p(\rho)$-norm to $q_\tau(P_f)$ at a root-$n$ rate, under mild density assumptions. Similarly, the empirical e-KQD estimator converges at a rate governed by both the number $m$ of Monte Carlo directions and the sample size $n$ (root-$m$ and root-$n$ terms, respectively), under regularity conditions on $\rho$, $\nu$, and kernel moments (Naslidnyk et al., 26 May 2025).
4. Algorithmic Implementation and Computational Complexity
Estimating Directional Quantiles: Given data $x_1, \dots, x_n \sim P$ and a unit-norm direction $f \in \mathcal{H}$, for $i = 1, \dots, n$ compute the projections $z_i = \langle f, \varphi(x_i) \rangle_{\mathcal{H}}$, sort them, and set $\hat{q}_\tau(\hat{P}_{n,f}) = z_{(\lceil \tau n \rceil)}$, the $\lceil \tau n \rceil$-th order statistic.
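A minimal sketch of this step, assuming the direction is represented as a finite combination $f = \sum_j \alpha_j \varphi(z_j)$, so that $\langle f, \varphi(x_i)\rangle_{\mathcal{H}} = \sum_j \alpha_j k(z_j, x_i)$; the function name and the ceiling-based order-statistic convention are illustrative choices.

```python
import numpy as np

def directional_quantile(alpha, K_zx, tau):
    """tau-quantile of the projections <f, phi(x_i)>_H for f = sum_j alpha_j phi(z_j).

    alpha : coefficients of f in the span of {phi(z_j)}.
    K_zx  : kernel matrix [k(z_j, x_i)] between the points defining f and the data.
    """
    proj = np.sort(alpha @ K_zx)          # sorted projections z_(1) <= ... <= z_(n)
    n = proj.shape[0]
    idx = int(np.ceil(tau * n)) - 1       # ceil(tau * n)-th order statistic, 0-indexed
    return proj[np.clip(idx, 0, n - 1)]
```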
Sampling Directions: Uniform sampling on the unit sphere is undefined in the infinite-dimensional $\mathcal{H}$. Instead, choose a reference measure $R$ on $\mathcal{X}$, define a Gaussian measure on $\mathcal{H}$ whose covariance is the (centred) covariance operator of $\varphi_{\#}R$, sample $g$ from it, and set $f = g / \|g\|_{\mathcal{H}}$. A finite-sample proxy uses reference points $z_1, \dots, z_r \sim R$, the empirical covariance of their feature maps, and random Gaussian weights $\epsilon \sim \mathcal{N}(0, I_r)$: $g = \sum_{j=1}^{r} \epsilon_j \big( \varphi(z_j) - \tfrac{1}{r} \sum_{l=1}^{r} \varphi(z_l) \big)$, normalized via the Gram matrix of the $z_j$.
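A sketch of one such finite-sample proxy, under the assumption that each direction is a centred Gaussian combination of feature maps over the reference points, with the Gram matrix supplying the RKHS norm; the exact construction in the paper may differ.

```python
import numpy as np

def sample_direction(K_zz, rng=None):
    """Sample coefficients alpha of a random unit-norm direction
    f = sum_j alpha_j phi(z_j) over reference points z_1..z_r.

    K_zz : r x r Gram matrix [k(z_j, z_l)] of the reference points.
    Centred i.i.d. Gaussian weights stand in for a Gaussian measure on the RKHS
    built from the empirical covariance of the features; the Gram matrix gives
    the RKHS norm ||f||_H^2 = alpha^T K_zz alpha used for normalization.
    """
    rng = np.random.default_rng(rng)
    eps = rng.normal(size=K_zz.shape[0])
    eps -= eps.mean()                        # centring == subtracting the empirical mean embedding
    return eps / np.sqrt(eps @ K_zz @ eps)   # normalize to unit RKHS norm
```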
e-KQD Estimator Workflow:
- For each of $m$ directions $i = 1, \dots, m$: sample reference points $z_1, \dots, z_r$ and Gaussian weights $\epsilon^{(i)}$; form $g_i = \sum_{j} \epsilon^{(i)}_j \big( \varphi(z_j) - \tfrac{1}{r}\sum_{l} \varphi(z_l) \big)$, normalize to $f_i = g_i / \|g_i\|_{\mathcal{H}}$; sort the projections $\{\langle f_i, \varphi(x_j) \rangle_{\mathcal{H}}\}_j$ and $\{\langle f_i, \varphi(y_j) \rangle_{\mathcal{H}}\}_j$; compute the per-direction discrepancy $\widehat{D}_i = \int_0^1 \big| \hat{q}_\tau(\hat{P}_{f_i}) - \hat{q}_\tau(\hat{Q}_{f_i}) \big|^p \, \mathrm{d}\rho(\tau)$ (a weighted sum over order statistics).
- Return $\widehat{\mathrm{eKQD}}_p = \big( \tfrac{1}{m} \sum_{i=1}^{m} \widehat{D}_i \big)^{1/p}$; a runnable sketch combining these steps is given below.
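The sketch below assembles these steps under illustrative choices (uniform $\rho$, equal sample sizes, and directions built from a small reference subsample of the pooled data of size `n_ref`); it reuses the hypothetical `gaussian_kernel` and `sample_direction` helpers from the earlier sketches and is not the paper's implementation.

```python
import numpy as np

def ekqd(X, Y, kernel, n_dirs=50, n_ref=32, p=2, rng=None):
    """Monte Carlo e-KQD sketch between equal-size samples X and Y.

    Directions are unit-norm elements of span{phi(z_j)} for a reference
    subsample z_1..z_r of the pooled data (r = n_ref); quantile levels are
    weighted uniformly, so sorted projections act as matched empirical quantiles.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    assert Y.shape[0] == n, "equal sample sizes keep the quantile matching simple"
    Z = np.concatenate([X, Y], axis=0)
    Z = Z[rng.choice(Z.shape[0], size=min(n_ref, Z.shape[0]), replace=False)]
    K_zz = kernel(Z, Z)                          # r x r Gram matrix (for norms)
    K_zx, K_zy = kernel(Z, X), kernel(Z, Y)      # r x n projection kernels
    total = 0.0
    for _ in range(n_dirs):
        alpha = sample_direction(K_zz, rng)      # coefficients of a unit-norm direction f_i
        qx = np.sort(alpha @ K_zx)               # empirical directional quantiles of P
        qy = np.sort(alpha @ K_zy)               # empirical directional quantiles of Q
        total += (np.abs(qx - qy) ** p).mean()   # uniform rho over quantile levels
    return (total / n_dirs) ** (1 / p)

# Example (reusing gaussian_kernel from the mean-embedding sketch above):
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(300, 5)), rng.normal(loc=0.4, size=(300, 5))
print(ekqd(X, Y, gaussian_kernel))
```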
Complexity Table:
| Step | Complexity | Comment |
|---|---|---|
| Projections $\langle f_i, \varphi(x_j) \rangle_{\mathcal{H}}$, $\langle f_i, \varphi(y_j) \rangle_{\mathcal{H}}$ | $O(rn)$ per direction | kernel terms against the $r$ points defining $f_i$ |
| Norm $\|f_i\|_{\mathcal{H}}$ (Gram) | $O(r^2)$ per direction | $r \times r$ Gram matrix |
| Sorting projections | $O(n \log n)$ per direction | each direction |
| Total for $m$ directions | $O\big(m (r n + r^2 + n \log n)\big)$ | |
Selecting the number of reference points $r$ and the number of directions $m$ as constants (or growing only slowly with $n$) yields total cost $O(n \log n)$ (“near-linear”) (Naslidnyk et al., 26 May 2025).
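As a rough sanity check of the scaling claim, one can time the hypothetical `ekqd` sketch above at increasing $n$ with `n_dirs` and `n_ref` held fixed; wall time should then grow close to linearly in $n$.

```python
import time
import numpy as np

# Reuses the hypothetical ekqd and gaussian_kernel sketches from above.
rng = np.random.default_rng(1)
for n in (1_000, 2_000, 4_000, 8_000):
    X = rng.normal(size=(n, 10))
    Y = rng.normal(loc=0.2, size=(n, 10))
    t0 = time.perf_counter()
    ekqd(X, Y, gaussian_kernel, n_dirs=20, n_ref=32, rng=0)
    elapsed = time.perf_counter() - t0
    print(n, f"{elapsed:.3f}s")  # should roughly double as n doubles (near-linear)
```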
5. Empirical Results and Benchmarks
Two-sample testing is performed via permutation thresholding of the test statistic at significance level $0.05$, comparing e-KQD and sup-KQD against quadratic-time MMD (U-statistic), linear-time MMD, MMD–Multi (incomplete U-statistic), sliced Wasserstein, and Sinkhorn.
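The permutation protocol itself is standard; a minimal sketch follows, with the test statistic passed in as a callable (the commented example plugs in the hypothetical `ekqd` sketch from above).

```python
import numpy as np

def permutation_test(X, Y, statistic, n_perm=200, level=0.05, rng=None):
    """Generic two-sample permutation test: reject H0 (equal distributions) if the
    observed statistic exceeds the (1 - level) quantile of its permutation null."""
    rng = np.random.default_rng(rng)
    observed = statistic(X, Y)
    pooled = np.concatenate([X, Y], axis=0)
    n = X.shape[0]
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(pooled.shape[0])            # random relabelling of the pooled sample
        null_stats[b] = statistic(pooled[idx[:n]], pooled[idx[n:]])
    return observed > np.quantile(null_stats, 1 - level)

# e.g. reject = permutation_test(X, Y, lambda A, B: ekqd(A, B, gaussian_kernel, rng=0))
```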
Key experimental findings:
- Power decay under increasing dimension ($d$ up to $512$): e-KQD exhibits the most gradual decay of test power, outperforming all fast MMD approximations.
- Laplace vs. Gaussian in $\mathbb{R}^d$ with identical first two moments, polynomial kernel (degree 3): MMD fails because this kernel is not mean-characteristic, whereas all KQDs succeed, indicating that the kernel is quantile-characteristic.
- Galaxy MNIST ($12288$ dims): e-KQD-centered performs similarly to MMD, while near-linear e-KQD and sup-KQD outperform MMD–Multi.
- CIFAR-10 vs CIFAR-10.1 ($3072$ dims): near-linear KQDs significantly outperform fast MMD approximations at comparable computational cost (Naslidnyk et al., 26 May 2025).
6. Connections, Interpretation, and Extensions
KQEs systematically extend kernel mean embeddings by replacing the mean with the full quantile structure along arbitrary directions in $\mathcal{H}$. Under mild kernel conditions, KQEs uniquely characterize probability distributions, generalizing the role of mean-characteristic kernels. The induced KQD metrics interpolate between kernel mean discrepancies and (max-)sliced Wasserstein distances, providing a flexible continuum of statistical distances for high-dimensional inference. The near-linear-time estimators and favorable empirical results suggest practical utility for large-scale hypothesis testing and related statistical machine learning tasks (Naslidnyk et al., 26 May 2025). A plausible implication is that KQEs could serve as a general-purpose nonparametric tool when classical kernel mean embeddings are insufficient, especially under kernels that are not mean-characteristic.