
Hilbert space embeddings and metrics on probability measures (0907.5309v3)

Published 30 Jul 2009 in stat.ML, math.ST, and stat.TH

Abstract: A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as $\gamma_k$, indexed by the kernel function $k$ that defines the inner product in the RKHS. We present three theoretical properties of $\gamma_k$. First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted {\em characteristic kernels}. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g. on compact domains), and are difficult to check, our conditions are straightforward and intuitive: bounded continuous strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on $\mathbb{R}^d$, then it is characteristic if and only if the support of its Fourier transform is the entire $\mathbb{R}^d$. Second, we show that there exist distinct distributions that are arbitrarily close in $\gamma_k$. Third, to understand the nature of the topology induced by $\gamma_k$, we relate $\gamma_k$ to other popular metrics on probability measures, and present conditions on the kernel $k$ under which $\gamma_k$ metrizes the weak topology.

Authors (5)
  1. Bharath K. Sriperumbudur (35 papers)
  2. Arthur Gretton (127 papers)
  3. Kenji Fukumizu (89 papers)
  4. Bernhard Schölkopf (412 papers)
  5. Gert R. G. Lanckriet (6 papers)
Citations (717)

Summary

Essay on "Hilbert Space Embeddings and Metrics on Probability Measures"

The paper "Hilbert Space Embeddings and Metrics on Probability Measures" by Bharath K. Sriperumbudur et al. introduces a framework for representing probability measures within a Reproducing Kernel Hilbert Space (RKHS). This approach facilitates the measurement of distances between probability measures via Hilbert space embeddings, which has significant applications in statistical hypothesis testing, machine learning, and probability theory.

Key Contributions

  1. Hilbert Space Embedding and Metric Definition: The authors detail how any probability measure can be expressed as a mean element in an RKHS. They introduce a pseudometric, $\gamma_k$, on the space of probability measures, which is the distance between distribution embeddings in the RKHS defined by a kernel $k$. This pseudometric becomes a true metric when the kernel $k$ ensures that $\gamma_k$ is zero only when the two distributions are identical.
  2. Conditions for Characteristic Kernels:

A central contribution of the paper involves identifying conditions under which $\gamma_k$ is a metric. Kernels that satisfy these conditions are termed characteristic kernels. The key findings include:

  • Kernels that are integrally strictly positive definite are always characteristic.
  • For translation-invariant kernels on $\mathbb{R}^d$, a kernel is characteristic if and only if the support of its Fourier transform is the entire $\mathbb{R}^d$.
  • On the $d$-dimensional torus $\mathbb{T}^d$, a kernel is characteristic if and only if all non-zero Fourier series coefficients are positive.
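As an illustration (not drawn from the paper itself), the distinction between characteristic and non-characteristic kernels can be checked in closed form for Gaussian distributions. The sketch below assumes a Gaussian kernel with an arbitrarily chosen bandwidth and compares $N(0,1)$ with $N(0,4)$: the Gaussian kernel (characteristic) separates them, while a linear kernel, which only compares means, assigns them distance zero.

```python
import math

def gauss_expect(m1, v1, m2, v2, sigma2=1.0):
    """E[k(X, Y)] for independent X ~ N(m1, v1), Y ~ N(m2, v2), with the
    Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * sigma2)).
    Uses X - Y ~ N(m1 - m2, v1 + v2) and a standard Gaussian integral."""
    v = v1 + v2
    return math.sqrt(sigma2 / (sigma2 + v)) * math.exp(-(m1 - m2) ** 2 / (2 * (sigma2 + v)))

def gamma_k_squared(m1, v1, m2, v2):
    """gamma_k(P, Q)^2 = E k(X, X') - 2 E k(X, Y) + E k(Y, Y') in closed form."""
    return (gauss_expect(m1, v1, m1, v1)
            - 2 * gauss_expect(m1, v1, m2, v2)
            + gauss_expect(m2, v2, m2, v2))

# P = N(0, 1) and Q = N(0, 4): same mean, different variance.
g2 = gamma_k_squared(0.0, 1.0, 0.0, 4.0)
print(g2)  # strictly positive: the Gaussian kernel distinguishes P from Q

# The linear kernel k(x, y) = x * y gives gamma_k(P, Q) = |E X - E Y| = 0 here,
# so it cannot distinguish P from Q and is not characteristic.
```

The bandwidth `sigma2 = 1.0` is an arbitrary choice for the illustration; any positive bandwidth yields a strictly positive distance here.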

  3. Construction of Distributions with Small Distances: Despite being a metric, $\gamma_k$ may sometimes yield small values for distinct distributions, notably those that differ in high-frequency components. The authors provide theoretical demonstrations showing that for any $\varepsilon > 0$, one can construct distinct distributions such that $\gamma_k$ between them is less than $\varepsilon$.
  4. Weak Convergence and Kernel Metrics:

The paper addresses the theoretical implications of $\gamma_k$ in terms of weak convergence:

  • Compact domains: For universal kernels on compact domains, $\gamma_k$ metrizes the weak topology on the space of probability measures.
  • Non-compact $\mathbb{R}^d$: For translation-invariant kernels $k(x, y) = \psi(x - y)$ on $\mathbb{R}^d$, if the kernel's Fourier transform weighted by a polynomial is integrable, then $\gamma_k$ metrizes the weak topology.
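The second result above (distinct distributions can be arbitrarily close in $\gamma_k$) can be sketched numerically. Assuming densities $p(x) = 1$ and $q(x) = 1 + \sin(2\pi\nu x)$ on $[0, 1]$ and a Gaussian kernel, $\gamma_k^2 = \iint k(x, y)(p - q)(x)(p - q)(y)\,dx\,dy$; a simple quadrature (an illustration, not the paper's construction) shows the distance collapsing as the perturbation frequency $\nu$ grows:

```python
import numpy as np

def gamma_k_sq_perturbation(freq, sigma=0.2, n=800):
    """gamma_k(P, Q)^2 for p(x) = 1 and q(x) = 1 + sin(2*pi*freq*x) on [0, 1],
    Gaussian kernel with bandwidth sigma, via a midpoint-rule double integral."""
    x = (np.arange(n) + 0.5) / n            # midpoint grid on [0, 1]
    f = -np.sin(2 * np.pi * freq * x)       # p - q on the grid
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    return float(f @ K @ f) / n ** 2

g_low, g_high = gamma_k_sq_perturbation(1), gamma_k_sq_perturbation(8)
print(g_low, g_high)  # the high-frequency perturbation is far closer to uniform
```

The collapse reflects the Fourier-transform characterization: the Gaussian kernel's spectrum decays rapidly, so it weights high-frequency differences between distributions very weakly.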

Numerical Results and Practical Implications

The use of $\gamma_k$ in hypothesis testing and related applications offers practical advantages over traditional methods. For example:

  • Two-Sample and Independence Tests: The distance measure $\gamma_k$ can be used effectively to test hypotheses about the equality of distributions or the independence of random variables by comparing their embeddings in the RKHS.
  • Density Estimation: The convergence properties of $\gamma_k$ offer insights into the quality of density estimates, especially in high-dimensional spaces where other measures might become less reliable.
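For the two-sample setting, a minimal sketch of how $\gamma_k$ is estimated from data (the paper is theoretical; the estimator below is the standard unbiased U-statistic form, with an arbitrary Gaussian bandwidth):

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of gamma_k(P, Q)^2 from 1-D samples X ~ P, Y ~ Q,
    using a Gaussian kernel with bandwidth sigma."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    # Drop diagonal terms so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, 500)
same = mmd2_unbiased(X, rng.normal(0.0, 1.0, 500))   # near zero
diff = mmd2_unbiased(X, rng.normal(1.0, 1.0, 500))   # clearly positive
print(same, diff)
```

In practice, significance is typically assessed by repeatedly permuting the pooled sample and recomputing the statistic to obtain a null distribution.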

Future Directions and Implications

The theory posed in this paper opens the door to several future research directions. These include:

  • Optimal Selection of Kernel Parameters: The choice of kernel parameters, particularly for characteristic kernels such as the Gaussian, should be investigated to optimize $\gamma_k$ for specific applications.
  • Extensions to Other Spaces: Extending the theory to more general spaces, including non-Euclidean and structured domains such as graphs, could broaden the applicability of RKHS embeddings to complex data types.
  • Algorithmic Efficiency: Improving the computational efficiency of estimating $\gamma_k$ in large-scale applications remains an important practical consideration.

Conclusion

The paper by Sriperumbudur et al. presents a comprehensive and theoretically sound framework for embedding probability measures into RKHS and defining a metric on the space of these measures. Through rigorous analysis and robust theoretical results, it establishes the utility of using Hilbert space embeddings for tasks in statistical learning and probability theory, with implications for both theoretical research and practical applications.