Kernel Quantile Embeddings

Updated 4 December 2025
  • Kernel Quantile Embeddings (KQEs) are an advanced framework that generalizes kernel mean embeddings by incorporating directional quantile information.
  • KQEs induce flexible statistical metrics that aggregate quantile discrepancies, recovering kernelized sliced Wasserstein distances under weaker kernel conditions.
  • Estimators of the induced KQE discrepancies run in near-linear time, and empirical benchmarks show two-sample testing performance competitive with traditional MMD methods.

Kernel Quantile Embeddings (KQEs) extend the classical approach of embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) by encapsulating generalized quantile information, rather than mere mean structure. This framework generalizes kernel mean embeddings, facilitates the construction of probability metrics with weaker kernel requirements than maximum mean discrepancy (MMD), recovers kernelized forms of sliced Wasserstein distances, and admits computationally efficient estimators with near-linear complexity. KQEs provide a rich, directionally parameterized representation of distributions, demonstrating theoretical guarantees and competitive empirical performance in two-sample testing and related statistical tasks (Naslidnyk et al., 26 May 2025).

1. Mathematical Framework and Definition

Given a measurable, continuous, bounded kernel $k: X \times X \rightarrow \mathbb{R}$ on a Borel space $X$, let $\mathcal{H}$ denote the RKHS with feature map $\psi(x) = k(x, \cdot)$ and inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$. For a probability measure $P$ on $X$, the classical kernel mean embedding is $\mu_P = \mathbb{E}_{X \sim P}[\psi(X)] \in \mathcal{H}$; $k$ is mean-characteristic if $P \mapsto \mu_P$ is injective.

Quantiles are generalized to the RKHS via directional projections. For $u \in S_{\mathcal{H}} = \{ u \in \mathcal{H} : \|u\|_{\mathcal{H}} = 1 \}$, define the pushforward measure $u_{\sharp}P$ on $\mathbb{R}$ via $u(x) = \langle u, \psi(x) \rangle_{\mathcal{H}}$. The one-dimensional $\alpha$-quantile of $u_{\sharp}P$ is denoted $\rho_{u_{\sharp}P}^{\alpha}$. The kernel quantile embedding (KQE) of $P$ at quantile level $\alpha \in [0,1]$, along direction $u \in S_{\mathcal{H}}$, is

$$\rho_P^{\alpha,u}(\cdot) = \rho_{u_{\sharp}P}^{\alpha} \, u(\cdot) \in \mathcal{H},$$

or equivalently $\rho_P^{\alpha,u}(x) = \rho_{u_{\sharp}P}^{\alpha} \, \langle u, \psi(x) \rangle_{\mathcal{H}}$. The full KQE of $P$ is the collection $Q_k(P) = \{ \rho_P^{\alpha,u} : \alpha \in [0,1],\ u \in S_{\mathcal{H}} \}$ (Naslidnyk et al., 26 May 2025).
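
Because $\|u\|_{\mathcal{H}} = 1$, the RKHS distance between the KQEs of two measures $P$ and $Q$ at the same quantile level and direction reduces to a scalar quantile gap:

$$\| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \|_{\mathcal{H}} = \big| \rho_{u_{\sharp}P}^{\alpha} - \rho_{u_{\sharp}Q}^{\alpha} \big| \, \|u\|_{\mathcal{H}} = \big| \rho_{u_{\sharp}P}^{\alpha} - \rho_{u_{\sharp}Q}^{\alpha} \big|.$$

This is the quantity aggregated by the metrics defined in the next section.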

2. Associated Probability Metrics

KQEs induce a family of statistical distances aggregating quantile discrepancies over directions and quantile levels. Fix a probability measure $\nu$ on $[0,1]$ to weight quantile levels. For two measures $P, Q$ and a direction $u \in S_{\mathcal{H}}$, define the $L^p$-quantile difference

$$\tau_p(P, Q; \nu, u) = \left( \int_0^1 \| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \|_{\mathcal{H}}^p \, \nu(d\alpha) \right)^{1/p}.$$

To aggregate across $S_{\mathcal{H}}$, define:

  • Expected KQD (e-KQD):

$$\text{e-KQD}_p(P, Q; \nu, \gamma) = \left( \mathbb{E}_{u \sim \gamma} \left[ \tau_p(P, Q; \nu, u)^p \right] \right)^{1/p},$$

where $\gamma$ is a probability measure on $S_{\mathcal{H}}$.

  • Supremum KQD (sup-KQD):

$$\text{sup-KQD}_p(P, Q; \nu) = \left( \sup_{u \in S_{\mathcal{H}}} \tau_p(P, Q; \nu, u)^p \right)^{1/p}.$$

These KQD metrics subsume kernel mean discrepancies and recover kernelized variants of (max-)sliced Wasserstein distances. Specifically, for $X \subset \mathbb{R}^d$, the linear kernel $k(x,y) = x^\top y$, Lebesgue $\nu$, and uniform $\gamma$ on $S^{d-1}$, e-KQD coincides with the sliced Wasserstein distance $SW_p(P, Q)$ and sup-KQD with the max-sliced Wasserstein distance (Naslidnyk et al., 26 May 2025).
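
A brief sketch of why the linear-kernel case reduces to sliced Wasserstein, using the identity noted in Section 1: with $k(x,y) = x^\top y$, $\mathcal{H} \cong \mathbb{R}^d$, $S_{\mathcal{H}}$ is the Euclidean unit sphere, and $u_{\sharp}P$ is the law of $u^\top X$, so

$$\text{e-KQD}_p(P, Q; \nu, \gamma)^p = \int_{S^{d-1}} \int_0^1 \big| \rho_{u_{\sharp}P}^{\alpha} - \rho_{u_{\sharp}Q}^{\alpha} \big|^p \, d\alpha \, \gamma(du) = \int_{S^{d-1}} W_p^p\big( u_{\sharp}P, u_{\sharp}Q \big) \, \gamma(du) = SW_p(P, Q)^p,$$

since the one-dimensional $p$-Wasserstein distance between $u_{\sharp}P$ and $u_{\sharp}Q$ equals the $L^p$ distance between their quantile functions.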

3. Theoretical Guarantees and Properties

Assume: (A1) $X$ is Hausdorff, separable, and $\sigma$-compact; (A2) $k$ is continuous and separating ($k(x, \cdot) \neq k(y, \cdot)$ for $x \neq y$).

  • Quantile-characteristic Kernels: Under (A1) and (A2), the map $P \mapsto Q_k(P)$ is injective (a Cramér–Wold argument in the RKHS), so $k$ is quantile-characteristic. Every mean-characteristic kernel is quantile-characteristic, but not conversely.
  • Metric Properties: With $\nu$ of full support on $[0,1]$, sup-KQD$_p$ defines a metric on $\mathcal{P}_X$. If $\gamma$ also has full support on $S_{\mathcal{H}}$, then e-KQD$_p$ is a metric.
  • Finite-Sample Consistency: For data $x_{1:n} \sim P$, let $P_n = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$. The empirical directional quantile $\rho_{P_n}^{\alpha,u}(x) = [u(x_{1:n})]_{\lceil \alpha n \rceil} \, u(x)$ (with $[\cdot]_j$ denoting the $j$-th order statistic) converges in $\mathcal{H}$-norm to $\rho_P^{\alpha,u}$ at an $O_p(n^{-1/2})$ rate, under mild density assumptions on $u_{\sharp}P$. Similarly, the empirical e-KQD estimator converges at rate $O(l^{-1/2} + n^{-1/2})$ for $l$ Monte Carlo directions and sample size $n$, under regularity conditions on $\nu$ and kernel moments (Naslidnyk et al., 26 May 2025).

4. Algorithmic Implementation and Computational Complexity

Estimating Directional Quantiles: Given data $x_{1:n}$ and $u \in S_{\mathcal{H}}$, compute the projections $u(x_i) = \langle u, \psi(x_i) \rangle_{\mathcal{H}}$, sort them, and set $\rho_{P_n}^{\alpha,u}(\cdot) = [u(x_{1:n})]_{\lceil \alpha n \rceil} \, u(\cdot)$.
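
A minimal sketch of this step in Python (NumPy), assuming the projections $u(x_i)$ have already been computed as a vector; the function name is illustrative:

```python
import numpy as np

def empirical_directional_quantile(proj, alpha):
    """Empirical alpha-quantile of projected data u(x_1), ..., u(x_n):
    the ceil(alpha * n)-th order statistic, as in the text (0 < alpha <= 1)."""
    n = len(proj)
    order_stats = np.sort(proj)            # [u(x_{1:n})]_1 <= ... <= [u(x_{1:n})]_n
    j = max(int(np.ceil(alpha * n)), 1)    # 1-based index ceil(alpha * n)
    return order_stats[j - 1]              # convert to 0-based indexing
```

The KQE element itself is then this scalar multiplied by the function $u(\cdot)$.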

Sampling Directions: Uniform sampling on $S_{\mathcal{H}}$ is undefined in infinite-dimensional $\mathcal{H}$. Instead, choose a reference measure $\xi$ on $X$, define the covariance operator $C[f](x) = \int k(x,y) f(y) \, \xi(dy)$, sample $f \sim N(0, C)$, and set $u = f / \|f\|_{\mathcal{H}}$. A finite-sample proxy uses $z_{1:m} \sim \xi$, the empirical covariance $C_m$, and random $\lambda_j \sim N(0,1)$: $f(x) = (1/\sqrt{m}) \sum_{j=1}^m \lambda_j k(x, z_j)$.
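
A sketch of the finite-sample proxy, assuming a Gaussian kernel and a user-supplied array `z` of reference points drawn from $\xi$ (both the kernel choice and the function names are illustrative):

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=1.0):
    # Pairwise k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * lengthscale^2)); a, b are (n, d) arrays
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * lengthscale**2))

def sample_direction(z, seed=None):
    """Return a callable u(.) with u = f / ||f||_H, where
    f(x) = (1/sqrt(m)) * sum_j lambda_j k(x, z_j) and lambda_j ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    m = len(z)
    lam = rng.standard_normal(m)
    # ||f||_H^2 = (1/m) * lambda^T K_zz lambda, with K_zz the Gram matrix of z
    norm_f = np.sqrt(lam @ gaussian_kernel(z, z) @ lam / m)
    return lambda x: gaussian_kernel(x, z) @ lam / (np.sqrt(m) * norm_f)
```

Evaluating the returned callable on the data produces the projections $u(x_{1:n})$ used above.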

e-KQD Estimator Workflow:

  • For $i = 1, \ldots, l$ directions: sample $z_{1:m} \sim \xi$ and $\lambda_{1:m} \sim N(0, I_m)$; form $f_i$ and normalize to $u_i$; sort $u_i(x_{1:n})$ and $u_i(y_{1:n})$; compute $\tau_{p,i}^p = \sum_{j=1}^n \big| [u_i(x)]_j - [u_i(y)]_j \big|^p f_\nu(j/n)$.
  • Return $\big( \frac{1}{l} \sum_{i=1}^l \tau_{p,i}^p \big)^{1/p}$, as sketched in the code below.
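
A self-contained sketch of this workflow, assuming a Gaussian kernel, equal sample sizes, uniform $\nu$ (so the weighted sum over order statistics becomes an average over quantile levels), and the pooled sample as the reference measure $\xi$; all names and defaults are illustrative:

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=1.0):
    # Pairwise k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * lengthscale^2)); a, b are (n, d) arrays
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * lengthscale**2))

def ekqd(x, y, p=2, n_dirs=20, m=20, seed=None):
    """Monte Carlo estimate of e-KQD_p between samples x and y (sketch)."""
    rng = np.random.default_rng(seed)
    n = min(len(x), len(y))                  # assume equal sizes; truncate otherwise
    x, y = x[:n], y[:n]
    pooled = np.concatenate([x, y])          # reference measure xi: pooled empirical sample
    total = 0.0
    for _ in range(n_dirs):
        z = pooled[rng.integers(len(pooled), size=m)]   # z_{1:m} ~ xi
        lam = rng.standard_normal(m)                    # lambda_{1:m} ~ N(0, I_m)
        # f_i(.) = (1/sqrt(m)) sum_j lam_j k(., z_j); ||f_i||_H^2 = lam^T K_zz lam / m
        norm_f = np.sqrt(lam @ gaussian_kernel(z, z) @ lam / m)
        u_x = gaussian_kernel(x, z) @ lam / (np.sqrt(m) * norm_f)   # u_i(x_{1:n})
        u_y = gaussian_kernel(y, z) @ lam / (np.sqrt(m) * norm_f)   # u_i(y_{1:n})
        # Sorted projections are the empirical directional quantiles; with uniform nu,
        # average the p-th power gaps over quantile levels j/n.
        total += np.mean(np.abs(np.sort(u_x) - np.sort(u_y)) ** p)
    return (total / n_dirs) ** (1.0 / p)
```

For two samples `x, y` of shape `(n, d)`, `ekqd(x, y)` returns a nonnegative scalar that is small when the samples agree along the sampled directions.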

Complexity Table:

| Step | Complexity | Comment |
| --- | --- | --- |
| $f_i(x_{1:n})$, $f_i(y_{1:n})$ | $O(nm)$ | $m$ kernel terms |
| Norm $\lVert f_i \rVert_{\mathcal{H}}$ (Gram) | $O(m^2)$ | $m \times m$ matrix |
| Sorting | $O(n \log n)$ | Each direction |
| Total for $l$ directions | $O(l \cdot \max\{nm,\ m^2,\ n \log n\})$ | |

Selecting $l = m = O(\log n)$ yields total cost $O(n \log^2 n)$ ("near-linear") (Naslidnyk et al., 26 May 2025).

5. Empirical Results and Benchmarks

Two-sample testing is performed via permutation thresholding of the test statistic, at significance level $0.05$, comparing e-KQD and sup-KQD to quadratic MMD (U-statistic), linear MMD, MMD–Multi (incomplete U-statistic), sliced Wasserstein, and Sinkhorn.
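
A minimal sketch of the permutation-thresholding procedure, which accepts any scalar two-sample statistic (for instance the `ekqd` estimator sketched above); names and defaults are illustrative:

```python
import numpy as np

def permutation_test(statistic, x, y, n_perm=200, level=0.05, seed=None):
    """Reject H0: P = Q when the observed statistic exceeds the
    (1 - level) quantile of its permutation distribution."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))          # relabel the pooled samples
        null_stats[b] = statistic(pooled[idx[:n]], pooled[idx[n:]])
    threshold = np.quantile(null_stats, 1.0 - level)
    return observed > threshold
```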

Key experimental findings:

  • Power decay under increasing dimension ($d$ up to $512$): e-KQD exhibits the most gradual test power decay, outperforming all fast MMD approximations.
  • Laplace vs. Gaussian on $\mathbb{R}$ with identical first two moments, polynomial kernel (degree 3): MMD fails because the kernel is not mean-characteristic, while all KQDs succeed, indicating that the kernel is quantile-characteristic.
  • Galaxy MNIST ($12288$ dims): e-KQD-centered performs similarly to MMD, while near-linear e-KQD and sup-KQD outperform MMD–Multi.
  • CIFAR-10 vs CIFAR-10.1 ($3072$ dims): near-linear KQDs significantly outperform fast MMD approximations at comparable computational cost (Naslidnyk et al., 26 May 2025).

6. Connections, Interpretation, and Extensions

KQEs systematically extend kernel mean embeddings by replacing the mean with the full quantile structure along arbitrary directions in $\mathcal{H}$. Under mild kernel conditions, KQEs uniquely characterize probability distributions, generalizing the role of mean-characteristic kernels. The induced KQD metrics interpolate between kernel mean discrepancies and (max-)sliced Wasserstein distances, providing a flexible continuum of statistical distances for high-dimensional inference. The near-linear-time estimators and favorable empirical results suggest practical utility for large-scale hypothesis testing and related statistical machine learning tasks (Naslidnyk et al., 26 May 2025). A plausible implication is that KQEs could serve as a general-purpose nonparametric tool when classical kernel mean embeddings are insufficient, especially under kernels that are not mean-characteristic.
