- The paper introduces kernel quantile embeddings (KQEs) to represent probability distributions in RKHS, proving injectivity under milder conditions than conventional methods.
- It develops new probability metrics—e-KQD and sup-KQD—that recover sliced Wasserstein distances and interpolate between MMD and sliced Wasserstein frameworks.
- The work presents near-linear time estimators with rigorous theoretical guarantees, demonstrating competitive performance in high-dimensional two-sample hypothesis testing.
This paper introduces Kernel Quantile Embeddings (KQEs) as a novel way to represent probability distributions in a Reproducing Kernel Hilbert Space (RKHS), offering an alternative to the widely used Kernel Mean Embeddings (KMEs). While KMEs represent a distribution as the mean function in an RKHS, KQEs leverage the concept of directional quantiles of the feature map x↦k(x,⋅). This approach is motivated by the fact that the set of all quantiles fully characterizes a probability distribution in one dimension.
The core idea is to first map data points from the input space $X$ into the RKHS $H$ via the kernel feature map $\psi(x) = k(x, \cdot)$. This transforms the probability measure $P$ on $X$ into a pushforward measure $\psi_\# P$ on $H$. The KQE of $P$ for a quantile level $\alpha \in [0,1]$ and a direction $u$ in the unit sphere $S_H$ of the RKHS is defined via the $\alpha$-quantile of the projected measure $\phi_{u\#}(\psi_\# P)$ along the direction $u$, where $\phi_u(h) = \langle u, h \rangle_H$ is the projection operator in $H$. This results in an element $\rho_P^{\alpha,u} \in H$ (Equation 6), defined through its evaluations $\rho_P^{\alpha,u}(x) = \rho^{\alpha}_{u_\# P} \, u(x)$, where $\rho^{\alpha}_{u_\# P}$ denotes the scalar $\alpha$-quantile of the projected measure.
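Spelled out, the construction chains three maps. The following display is a paraphrase of the paper's definition (Equation 6) in the notation above, using the standard generalized-inverse definition of the scalar quantile:

```latex
\begin{align*}
\psi &: X \to H, \quad \psi(x) = k(x, \cdot)
  && \text{(feature map)} \\
\phi_u &: H \to \mathbb{R}, \quad \phi_u(h) = \langle u, h \rangle_H, \quad u \in S_H
  && \text{(projection onto a direction)} \\
u_\# P &:= \phi_{u\#}(\psi_\# P)
  && \text{(projected one-dimensional measure)} \\
\rho^{\alpha}_{u_\# P} &:= \inf\{ t \in \mathbb{R} : u_\# P\big((-\infty, t]\big) \geq \alpha \}
  && \text{(scalar $\alpha$-quantile)} \\
\rho_P^{\alpha,u} &:= \rho^{\alpha}_{u_\# P} \, u \in H
  && \text{(kernel quantile embedding)}
\end{align*}
```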
A key theoretical contribution is the demonstration that a kernel $k$ is "quantile-characteristic" (meaning the mapping $P \mapsto \{\rho_P^{\alpha,u} : \alpha \in [0,1],\, u \in S_H\}$ is injective) under weaker conditions (Hausdorff, separable, σ-compact input space $X$ and continuous, separating kernel $k$) than those required for a kernel to be mean-characteristic (Theorems 1 and 2) (2505.20433). This has practical implications, as it means methods based on comparing KQEs can distinguish between a broader class of distributions than methods based on comparing KMEs, such as the Maximum Mean Discrepancy (MMD).
Based on KQEs, the paper proposes a family of probability metrics called Kernel Quantile Discrepancies (KQDs). Two primary types are introduced (Equation 9):
- Expected KQD (e-KQD): Averages the RKHS distance between KQEs over quantile levels $\alpha \sim \nu$ and directions $u \sim \gamma$: $\text{e-KQD}_p(P, Q; \nu, \gamma) = \left(\mathbb{E}_{u \sim \gamma} \left[\int_0^1 \big\| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \big\|_H^p \, \nu(d\alpha) \right]\right)^{1/p}$.
- Supremum KQD (sup-KQD): Takes the supremum over directions $u \in S_H$ instead of the average: $\text{sup-KQD}_p(P, Q; \nu) = \left(\sup_{u \in S_H} \int_0^1 \big\| \rho_P^{\alpha,u} - \rho_Q^{\alpha,u} \big\|_H^p \, \nu(d\alpha) \right)^{1/p}$.
Here, ν is a weighting measure on [0,1] for different quantile levels α. The paper shows that both e-KQD and sup-KQD are probability metrics under the same mild conditions as quantile-characteristic kernels (Theorem 4) (2505.20433).
The paper establishes connections between KQDs and existing probability metrics:
- When using a linear kernel k(x,y)=x⊤y and taking ν as the Lebesgue measure, KQDs recover kernelized forms of Sliced Wasserstein (SW) and Max-Sliced Wasserstein (max-SW) distances (Connections 1 and 2) (2505.20433).
- Centered versions of KQDs relate to a sum of MMD and kernelized sliced Wasserstein distances, suggesting they can interpolate between MMD and SW (Connection 3) (2505.20433).
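The link to sliced distances rests on the classical quantile representation of the one-dimensional Wasserstein distance, which is the same object the KQDs integrate over along each direction. As a reminder (a standard identity, not a result of this paper):

```latex
% One-dimensional Wasserstein-p distance via quantile (inverse-CDF) functions:
W_p^p(P_1, Q_1) = \int_0^1 \big| F_{P_1}^{-1}(\alpha) - F_{Q_1}^{-1}(\alpha) \big|^p \, d\alpha
% With a linear kernel, each direction u corresponds to a unit vector in R^d and the
% projected measures are one-dimensional, so averaging (resp. maximizing) this quantity
% over directions reproduces the sliced (resp. max-sliced) Wasserstein construction.
```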
A significant practical contribution is the development of an efficient estimator for e-KQD, particularly when $\gamma$ is a Gaussian measure on $H$. Estimating the directional quantile $\rho_P^{\alpha,u}$ empirically involves computing the $\alpha$-quantile of $\{u(x_i)\}_{i=1}^n$ for samples $x_{1:n} \sim P$, which can be done efficiently using order statistics. The paper provides a consistency guarantee for this empirical KQE estimator (Theorem 3) (2505.20433), showing an $O(n^{-1/2})$ convergence rate under mild conditions.
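For concreteness, the order-statistics step can be implemented as follows (a minimal sketch; the function name and the index convention are illustrative choices, not the paper's code):

```python
import numpy as np

def empirical_directional_quantile(u_values: np.ndarray, alpha: float) -> float:
    """Empirical alpha-quantile of the projected samples u(x_1), ..., u(x_n).

    u_values: array of shape (n,) containing u(x_i) for samples x_i ~ P.
    """
    n = len(u_values)
    order_stats = np.sort(u_values)            # O(n log n)
    j = max(int(np.ceil(alpha * n)) - 1, 0)    # index of the ceil(alpha * n)-th order statistic
    return float(order_stats[min(j, n - 1)])
```

For a fixed unit direction $u$, the empirical KQE at level $\alpha$ is then this scalar quantile multiplied by $u$, mirroring the population definition above.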
The e-KQD estimator, presented in Algorithm 1, approximates the expectation over directions $u \sim \gamma$ by Monte Carlo sampling. To sample $u \in S_H$ from a Gaussian-induced measure $\gamma$, the paper leverages the fact that sampling from a Gaussian measure on $H$ with a specific integral covariance operator can be reduced to sampling a standard Gaussian vector in $\mathbb{R}^m$ together with samples $z_{1:m}$ from a reference measure $\xi$ on $X$ (Proposition 1) (2505.20433). The estimator then computes the quantile differences for each sampled direction and averages them.
Algorithm 1: Gaussian e-KQD Estimator (simplified)

```
Input: data x_1:n ~ P, y_1:n ~ Q, reference samples z_1:m ~ xi, kernel k,
       density f_nu of the quantile weighting measure nu, number of projections l, power p

Initialize e-KQD^p = 0
For i = 1 to l:
    Sample lambda_1:m ~ N(0, Id_m)
    f_i_x = lambda_1:m^T k(z_1:m, x_1:n) / sqrt(m)                 # values f_i(x_j), up to a common scale
    f_i_y = lambda_1:m^T k(z_1:m, y_1:n) / sqrt(m)                 # values f_i(y_j), up to a common scale
    ||f_i||_H = sqrt(lambda_1:m^T k(z_1:m, z_1:m) lambda_1:m / m)  # RKHS norm, same scale
    u_i_x = f_i_x / ||f_i||_H                                      # projected values u_i(x_j)
    u_i_y = f_i_y / ||f_i||_H                                      # projected values u_i(y_j)
    Sort u_i_x and u_i_y to obtain order statistics [u_i(x_1:n)]_(j) and [u_i(y_1:n)]_(j)
    Initialize tau_i^p = 0
    For j = 1 to n:
        tau_i^p += |[u_i(x_1:n)]_(j) - [u_i(y_1:n)]_(j)|^p * f_nu(j/n) / n   # Riemann weight over quantile levels
    e-KQD^p += tau_i^p / l
Return (e-KQD^p)^(1/p)
```
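A runnable NumPy sketch of the estimator above is shown below. This is an illustrative implementation, not the authors' reference code: it assumes a Gaussian RBF kernel, equal sample sizes, and ν = Uniform[0, 1] (so f_ν ≡ 1), and the function names are my own.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian RBF kernel matrix k(a_i, b_j) between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

def gaussian_e_kqd(x, y, z, num_projections=50, p=2, lengthscale=1.0, seed=0):
    """Monte Carlo estimate of e-KQD_p(P, Q) with Gaussian-sampled directions.

    x, y : (n, d) samples from P and Q (equal sample sizes for simplicity).
    z    : (m, d) reference samples z_1:m ~ xi used to build directions in the RKHS.
    Uses nu = Uniform[0, 1] as the quantile weighting measure, i.e. f_nu = 1.
    """
    rng = np.random.default_rng(seed)
    m = z.shape[0]
    K_zx = rbf_kernel(z, x, lengthscale)           # (m, n)
    K_zy = rbf_kernel(z, y, lengthscale)           # (m, n)
    K_zz = rbf_kernel(z, z, lengthscale)           # (m, m)

    total = 0.0
    for _ in range(num_projections):
        lam = rng.standard_normal(m)
        f_x = lam @ K_zx / np.sqrt(m)              # f_i(x_j), up to a common scale
        f_y = lam @ K_zy / np.sqrt(m)              # f_i(y_j), up to a common scale
        f_norm = np.sqrt(lam @ K_zz @ lam / m)     # ||f_i||_H, same scale
        u_x = np.sort(f_x / f_norm)                # order statistics of projected x samples
        u_y = np.sort(f_y / f_norm)                # order statistics of projected y samples
        total += np.mean(np.abs(u_x - u_y) ** p)   # Riemann sum over quantile levels j/n
    return (total / num_projections) ** (1.0 / p)

# Example: two Gaussians with different means in d = 10
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(500, 10))
y = rng.normal(0.5, 1.0, size=(500, 10))
z = rng.normal(0.0, 1.0, size=(20, 10))            # reference samples for the directions
print(gaussian_e_kqd(x, y, z))
```

Precomputing the three kernel blocks lets each projection reuse them, so the per-projection cost is dominated by the matrix-vector products and the sort, matching the complexity analysis that follows.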
The computational complexity of this Gaussian e-KQD estimator is analyzed as follows. For each of the $l$ projections, computing the projected values $u_i(x_{1:n})$ and $u_i(y_{1:n})$ takes $O(nm)$ time, computing the norm $\|f_i\|_H$ takes $O(m^2)$, and sorting takes $O(n \log n)$, giving a total complexity of $O(l \max(nm, m^2, n \log n))$. Setting $l = m = O(\log n)$ yields $O(n \log^2 n)$, which is near-linear in $n$. This is significantly more efficient than the $O(n^2)$ complexity of standard U-statistic MMD estimators or the $O(Tn \log n)$ cost of optimizing max-SW/max-GSW over $T$ iterations, though generally slower than the $O(n)$ MMD-Linear estimator. The paper also provides a finite-sample consistency guarantee for the empirical e-KQD estimator, showing an $O(l^{-1/2} + n^{-1/2})$ rate (Theorem 5) (2505.20433).
The paper evaluates the proposed KQDs in the practical application of nonparametric two-sample hypothesis testing, comparing their performance (measured by rejection rate) against MMD and its fast approximations on synthetic and real-world datasets.
- Power-decay experiment: e-KQD demonstrates better robustness to increasing dimensionality compared to MMD-Multi (a fast MMD approximation of similar complexity).
- Laplace vs. Gaussian experiment: Using a polynomial kernel (which is not mean-characteristic but is quantile-characteristic), KQDs successfully distinguish between a Gaussian and a Laplace distribution with matching low-order moments, while MMD fails. This empirically verifies the theoretical finding on weaker characteristic conditions.
- Real-world image data (Galaxy MNIST, CIFAR): On high-dimensional image data, the near-linear time e-KQD and sup-KQD estimators are competitive with or outperform fast MMD estimators of similar complexity. The quadratic-time centered e-KQD performs similarly to quadratic-time MMD.
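For context, a generic way to turn any such discrepancy estimator into a two-sample test is a permutation test. The sketch below is my own illustration of that standard recipe, not necessarily the exact protocol used in the paper's experiments:

```python
import numpy as np

def permutation_two_sample_test(x, y, statistic, num_permutations=200, alpha=0.05, seed=0):
    """Permutation test: reject H0 (P = Q) if the observed statistic is large
    relative to its distribution under random relabellings of the pooled sample."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y], axis=0)
    n = x.shape[0]
    null_stats = []
    for _ in range(num_permutations):
        perm = rng.permutation(pooled.shape[0])
        null_stats.append(statistic(pooled[perm[:n]], pooled[perm[n:]]))
    p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + num_permutations)
    return p_value < alpha, p_value

# e.g. using the e-KQD sketch above as the test statistic:
# reject, p = permutation_two_sample_test(x, y, lambda a, b: gaussian_e_kqd(a, b, z))
```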
The experimental results highlight that KQDs offer a compelling alternative to MMD for two-sample testing, providing competitive performance, particularly in high dimensions and in scenarios where the kernel is not mean-characteristic, while enabling efficient estimation. Future work could explore optimizing the choice of the weighting measure ν and the reference measure ξ, developing improved estimators for KQEs, and extending the framework to conditional settings.