
Hilbert space embeddings and metrics on probability measures (0907.5309v3)

Published 30 Jul 2009 in stat.ML, math.ST, and stat.TH

Abstract: A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as $\gamma_k$, indexed by the kernel function $k$ that defines the inner product in the RKHS. We present three theoretical properties of $\gamma_k$. First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted {\em characteristic kernels}. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g. on compact domains), and are difficult to check, our conditions are straightforward and intuitive: bounded continuous strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on $\mathbb{R}^d$, then it is characteristic if and only if the support of its Fourier transform is the entire $\mathbb{R}^d$. Second, we show that there exist distinct distributions that are arbitrarily close in $\gamma_k$. Third, to understand the nature of the topology induced by $\gamma_k$, we relate $\gamma_k$ to other popular metrics on probability measures, and present conditions on the kernel $k$ under which $\gamma_k$ metrizes the weak topology.

Authors (5)
  1. Bharath K. Sriperumbudur (35 papers)
  2. Arthur Gretton (127 papers)
  3. Kenji Fukumizu (89 papers)
  4. Bernhard Schölkopf (412 papers)
  5. Gert R. G. Lanckriet (6 papers)
Citations (717)

Summary

Essay on "Hilbert Space Embeddings and Metrics on Probability Measures"

The paper "Hilbert Space Embeddings and Metrics on Probability Measures" by Bharath K. Sriperumbudur et al. introduces a framework for representing probability measures within a Reproducing Kernel Hilbert Space (RKHS). This approach facilitates the measurement of distances between probability measures via Hilbert space embeddings, which has significant applications in statistical hypothesis testing, machine learning, and probability theory.

Key Contributions

  1. Hilbert Space Embedding and Metric Definition: The authors detail how any probability measure can be expressed as a mean element in an RKHS. They introduce a pseudometric, $\gamma_k$, on the space of probability measures, which is the distance between distribution embeddings in the RKHS defined by a kernel $k$. This pseudometric becomes a true metric when the kernel $k$ ensures that $\gamma_k$ is zero only when the two distributions are identical.
  2. Conditions for Characteristic Kernels:

A central contribution of the paper involves identifying conditions under which $\gamma_k$ is a metric. Kernels that satisfy these conditions are termed characteristic kernels. The key findings include:

  • Kernels that are integrally strictly positive definite are always characteristic.
  • For translation-invariant kernels on $\mathbb{R}^d$, a kernel is characteristic if and only if the support of its Fourier transform is the entire $\mathbb{R}^d$.
  • On the $d$-dimensional torus $\mathbb{T}^d$, a kernel is characteristic if and only if all non-zero Fourier series coefficients are positive.
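As an illustration (not drawn from the paper itself), the distinction between characteristic and non-characteristic kernels can be checked in closed form for Gaussian distributions. The sketch below assumes a Gaussian kernel with an arbitrarily chosen bandwidth and compares $N(0,1)$ with $N(0,4)$: the Gaussian kernel (characteristic) separates them, while a linear kernel, which only compares means, assigns them distance zero.

```python
import math

def gauss_expect(m1, v1, m2, v2, sigma2=1.0):
    """E[k(X, Y)] for independent X ~ N(m1, v1), Y ~ N(m2, v2), with the
    Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * sigma2)).
    Uses X - Y ~ N(m1 - m2, v1 + v2) and a standard Gaussian integral."""
    v = v1 + v2
    return math.sqrt(sigma2 / (sigma2 + v)) * math.exp(-(m1 - m2) ** 2 / (2 * (sigma2 + v)))

def gamma_k_squared(m1, v1, m2, v2):
    """gamma_k(P, Q)^2 = E k(X, X') - 2 E k(X, Y) + E k(Y, Y') in closed form."""
    return (gauss_expect(m1, v1, m1, v1)
            - 2 * gauss_expect(m1, v1, m2, v2)
            + gauss_expect(m2, v2, m2, v2))

# P = N(0, 1) and Q = N(0, 4): same mean, different variance.
g2 = gamma_k_squared(0.0, 1.0, 0.0, 4.0)
print(g2)  # strictly positive: the Gaussian kernel distinguishes P from Q

# The linear kernel k(x, y) = x * y gives gamma_k(P, Q) = |E X - E Y| = 0 here,
# so it cannot distinguish P from Q and is not characteristic.
```

The bandwidth `sigma2 = 1.0` is an arbitrary choice for the illustration; any positive bandwidth yields a strictly positive distance here.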

  3. Construction of Distributions with Small Distances: Despite being a metric, $\gamma_k$ may sometimes yield small values for distinct distributions, notably those that differ in high-frequency components. The authors provide theoretical demonstrations showing that for any $\varepsilon > 0$, one can construct distinct distributions such that $\gamma_k$ between them is less than $\varepsilon$.
  4. Weak Convergence and Kernel Metrics:

The paper addresses the theoretical implications of $\gamma_k$ in terms of weak convergence:

  • Compact domains: For universal kernels on compact domains, $\gamma_k$ metrizes the weak topology on the space of probability measures.
  • Non-compact $\mathbb{R}^d$: For translation-invariant kernels $k(x, y) = \psi(x - y)$ on $\mathbb{R}^d$, if the kernel's Fourier transform weighted by a polynomial is integrable, then $\gamma_k$ metrizes the weak topology.
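The second result above (distinct distributions can be arbitrarily close in $\gamma_k$) can be sketched numerically. Assuming densities $p(x) = 1$ and $q(x) = 1 + \sin(2\pi\nu x)$ on $[0, 1]$ and a Gaussian kernel, $\gamma_k^2 = \iint k(x, y)(p - q)(x)(p - q)(y)\,dx\,dy$; a simple quadrature (an illustration, not the paper's construction) shows the distance collapsing as the perturbation frequency $\nu$ grows:

```python
import numpy as np

def gamma_k_sq_perturbation(freq, sigma=0.2, n=800):
    """gamma_k(P, Q)^2 for p(x) = 1 and q(x) = 1 + sin(2*pi*freq*x) on [0, 1],
    Gaussian kernel with bandwidth sigma, via a midpoint-rule double integral."""
    x = (np.arange(n) + 0.5) / n            # midpoint grid on [0, 1]
    f = -np.sin(2 * np.pi * freq * x)       # p - q on the grid
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    return float(f @ K @ f) / n ** 2

g_low, g_high = gamma_k_sq_perturbation(1), gamma_k_sq_perturbation(8)
print(g_low, g_high)  # the high-frequency perturbation is far closer to uniform
```

The collapse reflects the Fourier-transform characterization: the Gaussian kernel's spectrum decays rapidly, so it weights high-frequency differences between distributions very weakly.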

Numerical Results and Practical Implications

The use of $\gamma_k$ in hypothesis testing and related applications offers practical advantages over traditional methods. For example:

  • Two-Sample and Independence Tests: The distance measure $\gamma_k$ can be used effectively to test hypotheses about the equality of distributions or the independence of random variables by comparing their embeddings in the RKHS.
  • Density Estimation: The convergence properties of $\gamma_k$ offer insights into the quality of density estimates, especially in high-dimensional spaces where other measures might become less reliable.
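For the two-sample setting, a minimal sketch of how $\gamma_k$ is estimated from data (the paper is theoretical; the estimator below is the standard unbiased U-statistic form, with an arbitrary Gaussian bandwidth):

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of gamma_k(P, Q)^2 from 1-D samples X ~ P, Y ~ Q,
    using a Gaussian kernel with bandwidth sigma."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    # Drop diagonal terms so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, 500)
same = mmd2_unbiased(X, rng.normal(0.0, 1.0, 500))   # near zero
diff = mmd2_unbiased(X, rng.normal(1.0, 1.0, 500))   # clearly positive
print(same, diff)
```

In practice, significance is typically assessed by repeatedly permuting the pooled sample and recomputing the statistic to obtain a null distribution.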

Future Directions and Implications

The theory posed in this paper opens the door to several future research directions. These include:

  • Optimal Selection of Kernel Parameters: The choice of kernel parameters, particularly for characteristic kernels such as the Gaussian, should be investigated to optimize $\gamma_k$ for specific applications.
  • Extensions to Other Spaces: Extending the theory to more general spaces, including non-Euclidean and structured domains such as graphs, could broaden the applicability of RKHS embeddings to complex data types.
  • Algorithmic Efficiency: Improving the computational efficiency of estimating $\gamma_k$ in large-scale applications remains an important practical consideration.

Conclusion

The paper by Sriperumbudur et al. presents a comprehensive and theoretically sound framework for embedding probability measures into RKHS and defining a metric on the space of these measures. Through rigorous analysis and robust theoretical results, it establishes the utility of using Hilbert space embeddings for tasks in statistical learning and probability theory, with implications for both theoretical research and practical applications.