Minimax Estimation of Kernel Mean Embeddings (1602.04361v2)
Abstract: In this paper, we study the minimax estimation of the Bochner integral $$\mu_k(P):=\int_{\mathcal{X}} k(\cdot,x)\,dP(x),$$ also called as the kernel mean embedding, based on random samples drawn i.i.d.~from $P$, where $k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ is a positive definite kernel. Various estimators (including the empirical estimator), $\hat{\theta}n$ of $\mu_k(P)$ are studied in the literature wherein all of them satisfy $\bigl| \hat{\theta}_n-\mu_k(P)\bigr|{\mathcal{H}k}=O_P(n{-1/2})$ with $\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is in showing that the above mentioned rate of $n{-1/2}$ is minimax in $|\cdot|{\mathcal{H}k}$ and $|\cdot|{L2(\mathbb{R}d)}$-norms over the class of discrete measures and the class of measures that has an infinitely differentiable density, with $k$ being a continuous translation-invariant kernel on $\mathbb{R}d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and the density of $P$ (if it exists). This result has practical consequences in statistical applications as the mean embedding has been widely employed in non-parametric hypothesis testing, density estimation, causal inference and feature selection, through its relation to energy distance (and distance covariance).