Minimax Bound for KSD Estimation
- The paper establishes that no estimator can converge faster than the n⁻¹/² rate, confirming the minimax optimality of standard KSD estimators.
- It details an explicit constant for Gaussian kernels, highlighting exponential decay with increasing dimensions and the curse of dimensionality.
- The work demonstrates the universal applicability of the n⁻¹/² convergence rate across Euclidean, manifold, and graph domains using rigorous adversarial constructions.
The minimax lower bound for kernel Stein discrepancy (KSD) estimation formalizes the fundamental statistical limit of estimating KSD from independent samples. KSD serves as an integral probability metric for quantifying the difference between a target probability distribution $P_0$ and another probability measure $P$, and has become a central tool for goodness-of-fit testing in complex models. The minimax lower bound characterizes the worst-case error achievable by any estimator across a class of target and sample distributions, thereby settling the optimal convergence rate for KSD estimators.
1. Formal Statement of the Minimax Lower Bound
The minimax risk for KSD estimation takes the infimum over all measurable estimators based on $n$ i.i.d. samples, and the supremum over pairs $(P_0, P)$ drawn from the target class $\mathcal{T}$ and the sample class $\mathcal{S}_{P_0}$:
$R_n^* = \inf_{\hat F_n} \sup_{P_0 \in \mathcal{T}} \sup_{P \in \mathcal{S}_{P_0}} \mathbb{E}_{P^n}\left|\hat F_n(X_1,...,X_n) - \KSD(P_0, P)\right|.$
The minimax lower bound result asserts that $R_n^* \ge c\, n^{-1/2}$ for some constant $c > 0$, meaning that no estimator, regardless of construction, can outperform the $n^{-1/2}$ rate across all allowed distributions. This matches the convergence rate achieved by standard KSD estimators, such as the V-statistic and Nyström-based methods, confirming their minimax optimality.
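To make the benchmark concrete, the following is a minimal NumPy sketch of the V-statistic estimator referred to above, for the Langevin-Stein KSD with a Gaussian kernel; the function name `ksd_v_statistic`, the `score_fn` and `sigma` arguments, and the example at the end are illustrative assumptions rather than notation or code from the paper.

```python
import numpy as np

def ksd_v_statistic(X, score_fn, sigma=1.0):
    """V-statistic estimate of the Langevin-Stein KSD between a target P0,
    given via its score function x -> grad log p0(x), and the distribution
    of the rows of X, using k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    n, d = X.shape
    S = score_fn(X)                            # (n, d): score of p0 at each sample
    diffs = X[:, None, :] - X[None, :, :]      # (n, n, d): pairwise x_i - x_j
    sq = np.sum(diffs ** 2, axis=-1)           # (n, n): squared distances
    K = np.exp(-sq / (2.0 * sigma ** 2))       # Gaussian kernel matrix

    # Four terms of the Stein kernel u_{p0}(x_i, x_j) for the Gaussian kernel.
    t1 = (S @ S.T) * K                                        # s(x_i)^T s(x_j) k
    t2 = np.einsum('id,ijd->ij', S, diffs) * K / sigma ** 2   # s(x_i)^T grad_y k
    t3 = -np.einsum('jd,ijd->ij', S, diffs) * K / sigma ** 2  # grad_x k^T s(x_j)
    t4 = (d / sigma ** 2 - sq / sigma ** 4) * K               # tr(grad_x grad_y k)

    # Average over all n^2 pairs (V-statistic) and return KSD, not KSD^2.
    return np.sqrt(max((t1 + t2 + t3 + t4).mean(), 0.0))

# Hypothetical usage: target P0 = N(0, I_d), whose score is s(x) = -x,
# evaluated on samples from a mean-shifted normal.
rng = np.random.default_rng(0)
X = rng.normal(loc=0.5, scale=1.0, size=(500, 2))
print(ksd_v_statistic(X, score_fn=lambda x: -x, sigma=1.0))
```

Under the $n^{-1/2}$ rate, doubling the sample size can shrink the worst-case error only by a factor of about $\sqrt{2}$, which the lower bound shows cannot be improved upon.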
2. Main Results: Euclidean and General Domains
The minimax rate of KSD estimation is established in two distinct settings:
- Langevin-Stein KSD on $\mathbb{R}^d$: Under regularity conditions (the target density is continuously differentiable and strictly positive; the kernel is continuous, bounded, translation invariant, and characteristic), there exists a constant $c > 0$, given by an explicit formula in the Gaussian-kernel case, such that
$\inf_{\hat F_n} \sup_{P_0 \in \mathcal{T}} \sup_{P \in \mathcal{S}_{P_0}} \mathbb{P}^n\left( |\hat F_n - \KSD(P_0, P)| \ge c n^{-1/2} \right) > 0.$
Thus, no estimator can converge at a rate faster than $n^{-1/2}$.
- General Domains: For any topological space where suitable Stein operators and feature maps exist (assuming mild integrability and nontriviality conditions), the minimax lower bound continues to hold, i.e., the minimax risk remains bounded below by a constant multiple of $n^{-1/2}$. This result implies that the $n^{-1/2}$ rate is universal for KSD estimation, not an artifact of Euclidean geometry, and applies to domains such as Riemannian manifolds and discrete graphs.
3. Proof Strategy via Le Cam’s Two-Point Method
Both lower bound results rely on Le Cam’s two-point method, a standard paradigm in statistical minimax analysis:
- Adversarial Construction: Construct two probability distributions $P_0$ (target) and $P_1$ (alternative) that are "neighboring" yet sufficiently separated in KSD, with $|\KSD(P_0, P_1) - \KSD(P_0, P_0)| \ge c n^{-1/2}$.
- KL Divergence Control: Ensure that the Kullback-Leibler divergence between the $n$-fold sample distributions $P_0^n$ and $P_1^n$ is bounded, so the two sample distributions are statistically close and estimation is fundamentally difficult.
- Quantitative Lower Bound: Applying Le Cam’s inequality, one obtains a lower bound on the estimation error that no estimator can circumvent; one standard form of the inequality is sketched below.
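For reference, here is one textbook form of the two-point inequality (quoted from the general minimax-theory literature rather than from the paper), specialized to the functional $F(P) = \KSD(P_0, P)$: if the construction satisfies $|F(P_1) - F(P_0)| \ge 2\delta$, then
$\inf_{\hat F_n} \max_{j \in \{0,1\}} \mathbb{P}_{P_j^n}\left( |\hat F_n - F(P_j)| \ge \delta \right) \ge \frac{1}{4} \exp\left( -\mathrm{KL}(P_1^n \,\|\, P_0^n) \right),$
so a separation $\delta$ of order $n^{-1/2}$ combined with a KL divergence bounded by a constant yields a lower bound of the stated form.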
In $\mathbb{R}^d$ with the Gaussian kernel, the adversary comprises the standard normal and a mean-shifted normal (shift of order $n^{-1/2}$), yielding a KSD separation of exact order $n^{-1/2}$, while the KL divergence between the $n$-fold products remains bounded in $n$. For general domains, the construction perturbs the base measure by a function orthogonal to the constants, scaled by a factor of order $n^{-1/2}$, and achieves a KSD separation proportional to $n^{-1/2}$, again with bounded KL divergence.
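As a one-line sanity check of the KL control in the Gaussian construction (using the standard formula for the KL divergence between Gaussians; the shift direction $e_1$ is a notational assumption here): with $P_0 = \mathcal{N}(0, I_d)$, $P_1 = \mathcal{N}(\varepsilon e_1, I_d)$, and $\varepsilon = c\, n^{-1/2}$,
$\mathrm{KL}(P_1^n \,\|\, P_0^n) = n\, \mathrm{KL}\left( \mathcal{N}(\varepsilon e_1, I_d) \,\|\, \mathcal{N}(0, I_d) \right) = \frac{n \varepsilon^2}{2} = \frac{c^2}{2},$
which stays bounded as $n \to \infty$, exactly as the two-point method requires.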
4. Dimensionality Effects and Explicit Rate Constants
For the widely used Gaussian kernel, the paper gives an explicit formula for the minimax lower bound constant. As the ambient dimension $d$ increases, this constant decays exponentially in $d$. Thus, the minimax risk, although always scaling as $n^{-1/2}$, suffers rapid degradation in high dimensions, illustrating the curse of dimensionality for KSD estimation. This suggests that high-dimensional goodness-of-fit via KSD is statistically challenging, and it is consistent with the empirical observation that estimation error does not decrease rapidly in larger dimensions.
5. Significance of Universal Rate Across Domains
Settling the minimax lower bound for general domains has key implications:
- Establishes the $n^{-1/2}$ rate as universal, independent of the geometric or topological specifics of the domain.
- Justifies the usage of KSD estimators in a variety of non-Euclidean or structured contexts (e.g., manifolds, graphs), where modeling and goodness-of-fit tests are increasingly common.
- Provides a theoretical foundation for practitioners, guaranteeing that known estimators (V-statistic, Nyström, etc.) are rate-optimal even in novel settings.
A plausible implication is that efforts to develop fundamentally faster estimators for KSD (in terms of the rate in $n$) are mathematically precluded, and future work should focus on reducing hidden constants or optimizing computational aspects, rather than seeking improved convergence rates.
6. Key Formulas and Their Interpretation
The paper provides several central formulas:
- KSD Definition:
$\KSD(P_0,P) = \sup_{f\in B(H_k^d)} \left| \mathbb{E}_{P_0}\big[(\mathcal{A}_{p_0}f)(X)\big] - \mathbb{E}_P\big[(\mathcal{A}_{p_0}f)(X)\big] \right|$
With the Stein operator $\mathcal{A}_{p_0}$ constructed so that $\mathbb{E}_{P_0}\big[(\mathcal{A}_{p_0} f)(X)\big] = 0$ for every $f \in B(H_k^d)$, this simplifies to
$\KSD(P_0,P) = \left\| \mathbb{E}_P[\xi_{p_0}(X)] \right\|_{H_k^d},$
where the Stein feature map is $\xi_{p_0}(x) = \nabla \log p_0(x)\, k(x, \cdot) + \nabla_x k(x, \cdot) \in H_k^d$; expanding the squared RKHS norm gives the double-expectation form displayed after this list.
- Minimax Lower Bound:
$\inf_{\hat{F}_n} \sup_{P_0 \in \mathcal{T}} \sup_{P \in \mathcal{S}_{P_0}} \mathbb{P}^n\left( \left| \hat{F}_n - \KSD(P_0,P) \right| \geq \frac{c}{\sqrt{n}} \right) > 0$
This formalizes the impossibility of achieving convergence faster than $n^{-1/2}$.
- Gaussian Kernel Case Constant: an explicit constant for the Gaussian kernel, which decays exponentially in the dimension $d$ (see Section 4 above).
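The double-expectation form referenced in the KSD definition above, obtained by expanding the squared RKHS norm (the symbol $h_{p_0}$ for the induced Stein kernel is introduced here only for illustration), is
$\KSD^2(P_0, P) = \mathbb{E}_{X, X' \sim P}\left[ h_{p_0}(X, X') \right], \qquad h_{p_0}(x, x') = \left\langle \xi_{p_0}(x), \xi_{p_0}(x') \right\rangle_{H_k^d},$
with $X$ and $X'$ independent draws from $P$; the V-statistic estimator replaces these expectations with empirical averages over the sample.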
These formulas underlie the theoretical optimality and limitations of KSD estimation.
7. Practical and Theoretical Implications
The minimax lower bound delineates a clear attainable benchmark for KSD estimation. Any estimator that is constrained to sample data and must work over these broad distribution classes cannot converge at a worst-case rate faster than $n^{-1/2}$. The explicit characterization of the constants, in particular their exponential deterioration with dimension, urges practical caution in high-dimensional applications. For practitioners, the result indicates that improvements to KSD-based inference should focus on algorithmic efficiency, finite-sample constants, or domain-specific modeling, since further rate improvements are statistically impossible.
This comprehensive perspective clarifies both the statistical theory of KSD and its practical impact for goodness-of-fit assessment, optimality of existing estimators, and the challenges involved in high-dimensional and non-Euclidean settings.