Kernel Quantile Embeddings and Associated Probability Metrics (2505.20433v1)

Published 26 May 2025 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: Embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) has enabled powerful nonparametric methods such as the maximum mean discrepancy (MMD), a statistical distance with strong theoretical and computational properties. At its core, the MMD relies on kernel mean embeddings to represent distributions as mean functions in RKHS. However, it remains unclear if the mean function is the only meaningful RKHS representation. Inspired by generalised quantiles, we introduce the notion of kernel quantile embeddings (KQEs). We then use KQEs to construct a family of distances that: (i) are probability metrics under weaker kernel conditions than MMD; (ii) recover a kernelised form of the sliced Wasserstein distance; and (iii) can be efficiently estimated with near-linear cost. Through hypothesis testing, we show that these distances offer a competitive alternative to MMD and its fast approximations.

Summary

  • The paper introduces kernel quantile embeddings (KQEs) to represent probability distributions in RKHS, proving injectivity under milder conditions than conventional methods.
  • It develops new probability metrics—e-KQD and sup-KQD—that recover sliced Wasserstein distances and interpolate between MMD and sliced Wasserstein frameworks.
  • The work presents near-linear time estimators with rigorous theoretical guarantees, demonstrating competitive performance in high-dimensional two-sample hypothesis testing.

This paper introduces Kernel Quantile Embeddings (KQEs) as a novel way to represent probability distributions in a Reproducing Kernel Hilbert Space (RKHS), offering an alternative to the widely used Kernel Mean Embeddings (KMEs). While KMEs represent a distribution as the mean function in an RKHS, KQEs leverage the concept of directional quantiles of the feature map $x \mapsto k(x, \cdot)$. This approach is motivated by the fact that the set of all quantiles fully characterizes a probability distribution in one dimension.

The core idea is to first map data points from the input space $X$ into the RKHS $H$ via the kernel feature map $\psi(x) = k(x, \cdot)$. This transforms the probability measure $P$ on $X$ into a pushforward measure $\psi \# P$ on $H$. A KQE of $P$ for a given quantile level $\alpha \in [0, 1]$ and a direction $u$ in the unit sphere $S_H$ of the RKHS is defined as the $\alpha$-quantile of the projected measure $\phi_u \# (\psi \# P)$ along the direction $u$, where $\phi_u(h) = \langle u, h \rangle_H$ is the projection operator on $H$. This results in an element $\rho_P^{\alpha,u} \in H$ (Equation 6), defined via its evaluation function $\rho_P^{\alpha,u}(x) = \rho^\alpha_{u \# P}\, u(x)$, i.e. the scalar $\alpha$-quantile of the projected measure $u \# P$ multiplied by the direction $u$.
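
To make this concrete, the following is a minimal NumPy sketch of an empirical directional quantile: samples from $P$ are projected onto a fixed direction $u \in S_H$, here represented (as one convenient, illustrative choice) by coefficients over a set of reference points, and the empirical $\alpha$-quantile of the projected values is taken. The Gaussian kernel, the function names, and the parameterisation of $u$ are assumptions for illustration, not the paper's code.

import numpy as np

def gaussian_kernel(a, b, lengthscale=1.0):
    # Pairwise Gaussian kernel matrix with entries k(a_i, b_j).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def empirical_directional_quantile(x, z, coeffs, alpha, lengthscale=1.0):
    # Empirical alpha-quantile of u#P_n for a direction u = f / ||f||_H,
    # where f = sum_i coeffs[i] * k(z_i, .) is parameterised by reference points z.
    K_zx = gaussian_kernel(z, x, lengthscale)   # (m, n) kernel matrix
    K_zz = gaussian_kernel(z, z, lengthscale)   # (m, m) kernel matrix
    f_x = coeffs @ K_zx                         # f(x_j) = sum_i c_i k(z_i, x_j)
    f_norm = np.sqrt(coeffs @ K_zz @ coeffs)    # ||f||_H
    u_x = f_x / f_norm                          # projections <u, psi(x_j)>_H
    return np.quantile(u_x, alpha)              # scalar quantile of u#P_n; the KQE is this scalar times u

# Usage sketch with synthetic data (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))   # samples from P
z = rng.normal(size=(10, 2))    # reference points defining the direction
c = rng.normal(size=10)         # arbitrary direction coefficients
print(empirical_directional_quantile(x, z, c, alpha=0.5))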

A key theoretical contribution is the demonstration that a kernel $k$ is "quantile-characteristic" (meaning the mapping $P \mapsto \{\rho_P^{\alpha,u} : \alpha \in [0, 1],\, u \in S_H\}$ is injective) under weaker conditions (a Hausdorff, separable, $\sigma$-compact input space $X$ and a continuous, separating kernel $k$) than those required for a kernel to be mean-characteristic (Theorems 1 and 2) (2505.20433). This has practical implications, as it means methods based on comparing KQEs can distinguish between a broader class of distributions than methods based on comparing KMEs, such as the Maximum Mean Discrepancy (MMD).

Based on KQEs, the paper proposes a family of probability metrics called Kernel Quantile Discrepancies (KQDs). Two primary types are introduced (Equation 9):

  1. Expected KQD (e-KQD): averages the difference between KQEs over directions $u \in S_H$ according to a measure $\gamma$: $\text{e-KQD}_p(P, Q; \nu, \gamma) = \left( E_{u \sim \gamma} \left[ \int_0^1 \big\| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \big\|_H^p \, \nu(d\alpha) \right] \right)^{1/p}$.
  2. Supremum KQD (sup-KQD): takes the supremum of the difference between KQEs over directions $u \in S_H$: $\text{sup-KQD}_p(P, Q; \nu) = \left( \sup_{u \in S_H} \int_0^1 \big\| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \big\|_H^p \, \nu(d\alpha) \right)^{1/p}$. Here, $\nu$ is a weighting measure on $[0, 1]$ over quantile levels $\alpha$. The paper shows that both e-KQD and sup-KQD are probability metrics under the same mild conditions as quantile-characteristic kernels (Theorem 4) (2505.20433). For a fixed direction $u$, the inner quantile integral has a one-dimensional Wasserstein interpretation, as sketched below.
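
Since $\rho_P^{\alpha,u} = \rho^\alpha_{u \# P}\, u$ with $\|u\|_H = 1$ (see the definition above), the RKHS norm inside the integral collapses to a scalar quantile difference; combined with the standard quantile-function representation of the one-dimensional $p$-Wasserstein distance (taking $\nu$ to be the Lebesgue measure), this gives, for each fixed direction $u$,

$$\big\| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \big\|_H = \big| \rho^\alpha_{u \# P} - \rho^\alpha_{u \# Q} \big|, \qquad \int_0^1 \big| \rho^\alpha_{u \# P} - \rho^\alpha_{u \# Q} \big|^p \, d\alpha = W_p^p\big(u \# P, \, u \# Q\big).$$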

The paper establishes connections between KQDs and existing probability metrics:

  • When using a linear kernel $k(x, y) = x^\top y$ and taking $\nu$ as the Lebesgue measure, KQDs recover kernelized forms of the Sliced Wasserstein (SW) and Max-Sliced Wasserstein (max-SW) distances (Connections 1 and 2) (2505.20433).
  • Centered versions of KQDs relate to a sum of MMD and kernelized sliced Wasserstein distances, suggesting they can interpolate between MMD and SW (Connection 3) (2505.20433).

A significant practical contribution is the development of an efficient estimator for e-KQD, particularly for $\gamma$ being a Gaussian measure on $H$. Estimating the directional quantile $\rho_P^{\alpha,u}$ empirically involves computing the $\alpha$-quantile of $\{u(x_i)\}_{i=1}^n$ for samples $x_{1:n} \sim P$, which can be done efficiently using order statistics. The paper provides a consistency guarantee for this empirical KQE estimator (Theorem 3) (2505.20433), showing an $O(n^{-1/2})$ convergence rate under mild conditions.

The e-KQD estimator, presented in Algorithm 1, approximates the expectation over directions $u \sim \gamma$ using Monte Carlo sampling. To sample $u \in S_H$ from a Gaussian-induced measure $\gamma$, the paper leverages the fact that sampling from a Gaussian measure on $H$ with a specific integral covariance operator can be reduced to sampling a standard Gaussian in $\mathbb{R}^m$ and using samples $z_{1:m}$ from a reference measure $\xi$ on $X$ (Proposition 1) (2505.20433). The estimator then computes the quantile differences for each sampled direction and averages them.

Algorithm 1: Gaussian e-KQD Estimator (Simplified)
Input: data x_1:n ~ P, y_1:n ~ Q, reference samples z_1:m ~ xi, kernel k, density f_nu of nu, number of projections l, power p.
Initialize e-KQD^p = 0
For i = 1 to l:
  Sample lambda_1:m ~ N(0, Id_m)                                          # Gaussian weights for the random direction (Proposition 1)
  Compute f_i_x = lambda_1:m^T k(z_1:m, x_1:n) / sqrt(m)                  # values f_i(x_j), i.e. u_i(x_j) up to scale
  Compute f_i_y = lambda_1:m^T k(z_1:m, y_1:n) / sqrt(m)                  # values f_i(y_j), i.e. u_i(y_j) up to scale
  Compute ||f_i||_H = sqrt(lambda_1:m^T k(z_1:m, z_1:m) lambda_1:m / m)   # RKHS norm of f_i
  Compute u_i_x = f_i_x / ||f_i||_H                                       # projected values u_i(x_j)
  Compute u_i_y = f_i_y / ||f_i||_H                                       # projected values u_i(y_j)
  Sort u_i_x and u_i_y to obtain order statistics [u_i(x_1:n)]_j and [u_i(y_1:n)]_j
  Initialize tau_i^p = 0
  For j = 1 to n:
    tau_i^p += |[u_i(x_1:n)]_j - [u_i(y_1:n)]_j|^p * f_nu(j/n) / n        # Riemann approximation of the nu-weighted quantile integral
  e-KQD^p += tau_i^p / l                                                  # Monte Carlo average over directions
Return (e-KQD^p)^(1/p)
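
For concreteness, here is a minimal NumPy sketch of the estimator above; it is not the authors' implementation. The Gaussian (RBF) kernel, the treatment of the quantile weighting as a Riemann sum against the density f_nu, the equal sample sizes, and all names and defaults are illustrative assumptions.

import numpy as np
from scipy.spatial.distance import cdist

def rbf(a, b, lengthscale=1.0):
    # Gaussian (RBF) kernel matrix with entries k(a_i, b_j).
    return np.exp(-cdist(a, b, "sqeuclidean") / (2.0 * lengthscale ** 2))

def gaussian_ekqd(x, y, z, f_nu=lambda a: 1.0, n_proj=20, p=2, lengthscale=1.0, seed=0):
    # Monte Carlo sketch of the Gaussian e-KQD estimator (Algorithm 1 above).
    # x, y: (n, d) samples from P and Q (equal sample sizes assumed here);
    # z: (m, d) reference samples from xi; f_nu: density of nu on [0, 1].
    rng = np.random.default_rng(seed)
    n, m = x.shape[0], z.shape[0]
    K_zx, K_zy, K_zz = rbf(z, x, lengthscale), rbf(z, y, lengthscale), rbf(z, z, lengthscale)
    levels = np.arange(1, n + 1) / n                    # quantile levels j/n
    weights = np.vectorize(f_nu)(levels) / n            # Riemann weights for the alpha-integral
    ekqd_p = 0.0
    for _ in range(n_proj):
        lam = rng.standard_normal(m)                    # lambda_1:m ~ N(0, Id_m)
        f_x = lam @ K_zx / np.sqrt(m)                   # u_i(x_j) up to scale
        f_y = lam @ K_zy / np.sqrt(m)                   # u_i(y_j) up to scale
        f_norm = np.sqrt(lam @ K_zz @ lam / m)          # ||f_i||_H
        u_x = np.sort(f_x / f_norm)                     # order statistics of projections of x
        u_y = np.sort(f_y / f_norm)                     # order statistics of projections of y
        ekqd_p += np.sum(np.abs(u_x - u_y) ** p * weights) / n_proj
    return ekqd_p ** (1.0 / p)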

The computational complexity of this Gaussian e-KQD estimator is analyzed. With $l$ projections and $m$ reference samples, computing the projected values $u_i(x_{1:n})$ and $u_i(y_{1:n})$ takes $O(nm)$ time, computing the norm $\|f_i\|_H$ takes $O(m^2)$, and sorting takes $O(n \log n)$. Summing over the $l$ projections gives a total complexity of $O(l \max(nm, m^2, n \log n))$. By setting $l = m = O(\log n)$, the complexity becomes $O(n \log^2 n)$, which is near-linear in $n$. This is significantly more efficient than the $O(n^2)$ complexity of standard U-statistic MMD estimators or the $O(T n \log n)$ cost of optimizing max-SW/max-GSW, though generally slower than the $O(n)$ MMD-Linear estimator. The paper also provides a finite-sample consistency guarantee for the empirical e-KQD estimator, showing an $O(l^{-1/2} + n^{-1/2})$ rate (Theorem 5) (2505.20433).
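
As a usage sketch of the function above (with illustrative synthetic data), the near-linear regime corresponds to choosing the number of projections and reference samples on the order of $\log n$:

n, d = 2000, 10
rng = np.random.default_rng(1)
x = rng.normal(size=(n, d))                 # samples from P
y = rng.normal(loc=0.2, size=(n, d))        # samples from Q (shifted mean)
l = m = int(np.ceil(np.log(n)))             # l = m = O(log n) gives O(n log^2 n) overall cost
z = rng.normal(size=(m, d))                 # reference samples from xi
print(gaussian_ekqd(x, y, z, n_proj=l, p=2))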

The paper evaluates the proposed KQDs in the practical application of nonparametric two-sample hypothesis testing, comparing their performance (measured by rejection rate) against MMD and its fast approximations on synthetic and real-world datasets.

  • Power-decay experiment: e-KQD demonstrates better robustness to increasing dimensionality compared to MMD-Multi (a fast MMD approximation of similar complexity).
  • Laplace vs. Gaussian experiment: Using a polynomial kernel (which is not mean-characteristic but is quantile-characteristic), KQDs successfully distinguish between a Gaussian and a Laplace distribution with matching low-order moments, while MMD fails. This empirically verifies the theoretical finding on weaker characteristic conditions.
  • Real-world image data (Galaxy MNIST, CIFAR): On high-dimensional image data, the near-linear time e-KQD and sup-KQD estimators are competitive with or outperform fast MMD estimators of similar complexity. The quadratic-time centered e-KQD performs similarly to quadratic-time MMD.

The experimental results highlight that KQDs offer a compelling alternative to MMD for two-sample testing, providing competitive performance, particularly in high dimensions and scenarios where the kernel might not be mean-characteristic, while enabling efficient estimation. Future work could explore optimizing the choice of the weighting measure $\nu$ and reference measure $\xi$, developing improved estimators for KQEs, and extending the concepts to conditional settings.