Minimax Bound for KSD Estimation

Updated 20 October 2025
  • The paper establishes that no estimator can converge faster than the n⁻¹/² rate, confirming the minimax optimality of standard KSD estimators.
  • It details an explicit constant for Gaussian kernels, highlighting exponential decay with increasing dimensions and the curse of dimensionality.
  • The work demonstrates the universal applicability of the n⁻¹/² convergence rate across Euclidean, manifold, and graph domains using rigorous adversarial constructions.

The minimax lower bound for kernel Stein discrepancy (KSD) estimation formalizes the fundamental statistical limit of estimating the KSD from independent samples. The KSD is an integral probability metric that quantifies the discrepancy between a target probability distribution $P_0$ and another probability measure $P$, and it has become a central tool for goodness-of-fit testing in complex models. The minimax lower bound characterizes the worst-case error achievable by any estimator across a class of target and sample distributions, thereby settling the optimal convergence rate for KSD estimators.

1. Formal Statement of the Minimax Lower Bound

The minimax risk for KSD estimation takes the infimum over all measurable estimators $\hat F_n$ based on $n$ i.i.d. samples, and the supremum over choices of pairs $(P_0, P)$ from the respective classes:

$R_n^* = \inf_{\hat F_n} \sup_{P_0 \in \mathcal{T}} \sup_{P \in \mathcal{S}_{P_0}} \mathbb{E}_{P^n}\left|\hat F_n(X_1,...,X_n) - \KSD(P_0, P)\right|.$

The minimax lower bound result asserts that $R_n^* \ge c\, n^{-1/2}$ for some constant $c > 0$, meaning no estimator, regardless of construction, can outperform the $n^{-1/2}$ rate across all allowed distributions. This matches the convergence rate achieved by standard KSD estimators, such as the V-statistic and Nyström-based methods, confirming their minimax optimality.

2. Main Results: Euclidean and General Domains

The minimax rate of KSD estimation is established in two distinct settings:

  1. Langevin-Stein KSD on $\mathbb{R}^d$: Under regularity conditions (the target density is continuously differentiable and strictly positive; the kernel is continuous, bounded, translation invariant, and characteristic), there exists a constant $c$, with the explicit value $c = \tfrac{1}{2}(4\gamma+1)^{-d/4}$ for the Gaussian kernel (a worked evaluation appears after this list), such that

$\inf_{\hat F_n} \sup_{P_0 \in \mathcal{T}} \sup_{P \in \mathcal{S}_{P_0}} \mathbb{P}^n\left( |\hat F_n - \KSD(P_0, P)| \ge c n^{-1/2} \right) > 0.$

Thus, no estimator can converge at a rate faster than $O(n^{-1/2})$.

  2. General Domains: For any topological space $X$ on which suitable Stein operators and feature maps exist (under mild integrability and nontriviality conditions), the minimax lower bound continues to hold, i.e., the minimax risk remains bounded below by $c\, n^{-1/2}$. This result implies that the $n^{-1/2}$ rate is universal for KSD estimation, not an artifact of Euclidean geometry, and applies to domains such as Riemannian manifolds and discrete graphs.
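
As an illustration of how the explicit Gaussian-kernel constant reads in practice, take $\gamma = 1$, $d = 10$, and $n = 10^4$ (these values are chosen purely for concreteness and are not taken from the paper):

$c = \tfrac{1}{2}(4 \cdot 1 + 1)^{-10/4} = \tfrac{1}{2} \cdot 5^{-2.5} \approx 8.9 \times 10^{-3}, \qquad \frac{c}{\sqrt{n}} \approx 8.9 \times 10^{-5}.$

Hence, at this sample size and dimension, no estimator can have worst-case error uniformly below roughly $9 \times 10^{-5}$.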

3. Proof Strategy via Le Cam’s Two-Point Method

Both lower bound results rely on Le Cam’s two-point method, a standard paradigm in statistical minimax analysis:

  • Adversarial Construction: Construct two probability distributions $P_0$ (target) and $P_1$ (alternative) that are "neighboring" yet sufficiently separated in KSD, with $|\KSD(P_0, P_1) - \KSD(P_0, P_0)| \ge c n^{-1/2}$.
  • KL Divergence Control: Ensure that the Kullback-Leibler divergence $\mathrm{KL}(P_1 \,\|\, P_0)$ over $n$ samples is bounded, so the sample distributions are statistically close and estimation is fundamentally difficult.
  • Quantitative Lower Bound: Applying Le Cam’s inequality yields a lower bound on the estimation error that no estimator can circumvent (one standard quantitative form is recorded after this list).
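
For completeness, one standard quantitative form of the two-point argument (stated here as a generic reduction; the paper may use a different but equivalent variant) reads as follows: if the functional values are separated as $|\KSD(P_0, P_1) - \KSD(P_0, P_0)| \ge 2 s_n$ and the sample distributions satisfy $\mathrm{KL}(P_1^n \,\|\, P_0^n) \le \alpha$, then

$\inf_{\hat F_n} \max_{j \in \{0, 1\}} \mathbb{P}_j^n\left( |\hat F_n - \KSD(P_0, P_j)| \ge s_n \right) \ge \frac{1}{4} e^{-\alpha}.$

Choosing $s_n \asymp n^{-1/2}$ and keeping $\alpha$ bounded in $n$ yields exactly the displayed lower bounds.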

In $\mathbb{R}^d$ with the Gaussian kernel, the adversarial pair consists of the standard normal and a mean-shifted normal (shift $\propto n^{-1/2}$), yielding a KSD separation of exact order $n^{-1/2}$ while the KL divergence remains bounded in $n$. For general domains, the construction perturbs the base measure $P_0$ by a function $\varphi$ orthogonal to the constant functions, scaled by $\epsilon_n = c\, n^{-1/2}$, and achieves a KSD separation proportional to $\epsilon_n$, again with bounded KL divergence.
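
To make the KL divergence control concrete for the Gaussian mean-shift pair (the symbol $c_0$ below simply denotes the proportionality constant in the shift and is used here for illustration), recall that $\mathrm{KL}(\mathcal{N}(\theta, I_d) \,\|\, \mathcal{N}(0, I_d)) = \|\theta\|^2/2$, so that for $n$ i.i.d. samples with shift $\|\theta\| = c_0 n^{-1/2}$,

$\mathrm{KL}(P_1^n \,\|\, P_0^n) = n \cdot \mathrm{KL}(P_1 \,\|\, P_0) = \frac{n}{2} \cdot \frac{c_0^2}{n} = \frac{c_0^2}{2},$

which stays bounded as $n \to \infty$, exactly as the two-point method requires.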

4. Dimensionality Effects and Explicit Rate Constants

For the widely used Gaussian kernel,

$k(x, y) = \exp(-\gamma \|x - y\|^2),$

the minimax lower bound constant is given by

$c = \frac{1}{2}(4\gamma + 1)^{-d/4}.$

As the ambient dimension $d$ increases, $(4\gamma+1)^{-d/4}$ decays exponentially in $d$. Thus, although the minimax risk always scales as $n^{-1/2}$ in the sample size, the associated constant degrades rapidly in high dimensions, illustrating the curse of dimensionality for KSD estimation. This suggests that high-dimensional goodness-of-fit via KSD is statistically challenging, and it is consistent with the empirical observation that estimation error does not decrease rapidly in higher dimensions.
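
The exponential decay of this constant can be checked directly from the formula above. The following minimal Python sketch (the bandwidth $\gamma = 1$ and sample size $n = 10^4$ are illustrative choices, not values taken from the paper) prints $c$ and the corresponding lower-bound threshold $c/\sqrt{n}$ across dimensions:

```python
import numpy as np

def gaussian_kernel_constant(gamma: float, d: int) -> float:
    # Lower-bound constant c = (1/2) * (4*gamma + 1)**(-d/4) for the Gaussian kernel,
    # as quoted in the formula above.
    return 0.5 * (4.0 * gamma + 1.0) ** (-d / 4.0)

gamma, n = 1.0, 10_000  # illustrative bandwidth and sample size
for d in (1, 2, 5, 10, 20, 50):
    c = gaussian_kernel_constant(gamma, d)
    print(f"d={d:3d}   c={c:.3e}   c/sqrt(n)={c / np.sqrt(n):.3e}")
```

For $\gamma = 1$, each additional dimension shrinks the constant by a factor of $5^{1/4} \approx 1.5$.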

5. Significance of Universal Rate Across Domains

Settling the minimax lower bound for general domains has key implications:

  • Establishes the $n^{-1/2}$ rate as universal, independent of geometric or topological specifics of the domain.
  • Justifies the usage of KSD estimators in a variety of non-Euclidean or structured contexts (e.g., manifolds, graphs), where modeling and goodness-of-fit tests are increasingly common.
  • Provides a theoretical foundation for practitioners, guaranteeing that known estimators (V-statistic, Nyström, etc.) are rate-optimal even in novel settings.

A plausible implication is that efforts to develop fundamentally faster estimators for KSD (in terms of $n$) are mathematically precluded, and future work should focus on reducing hidden constants or optimizing computational aspects, rather than seeking improved convergence rates.

6. Key Formulas and Their Interpretation

The paper provides several central formulas:

  • KSD Definition:

$\KSD(P_0,P) = \sup_{f\in B(H_k^d)} \left| \mathbb{E}_{P_0}\big[(\mathcal{A}_{p_0}f)(X)\big] - \mathbb{E}_P\big[(\mathcal{A}_{p_0}f)(X)\big] \right|$

Because the Stein operator $\mathcal{A}_{p_0}$ is constructed so that $\mathbb{E}_{P_0}[(\mathcal{A}_{p_0}f)(X)] = 0$, this simplifies to

$\KSD(P_0,P) = \left\| \mathbb{E}_P[\xi_{p_0}(X)] \right\|_{H_k^d},$

where the Stein feature map is

$\xi_{p_0}(x) = \nabla_x \ln p_0(x)\, k(\cdot, x) + \nabla_x k(\cdot, x).$

  • Minimax Lower Bound:

$\inf_{\hat{F}_n} \sup_{P_0 \in \mathcal{T}} \sup_{P \in \mathcal{S}_{P_0}} \mathbb{P}^n\left( \left| \hat{F}_n - \KSD(P_0,P) \right| \geq \frac{c}{\sqrt{n}} \right) > 0$

This formalizes the impossibility of achieving convergence faster than $O(n^{-1/2})$.

  • Gaussian Kernel Case Constant:

$c = \frac{1}{2}(4\gamma+1)^{-d/4}$

These formulas underlie the theoretical optimality and limitations of KSD estimation; a short numerical sketch of the corresponding V-statistic estimator is given below.
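
The following is a minimal sketch of the V-statistic KSD estimator implied by these formulas, specialized to the Langevin Stein operator on $\mathbb{R}^d$, a standard normal target $P_0 = \mathcal{N}(0, I_d)$, and the Gaussian kernel from Section 4. The function name, the choice of target, and the closed-form kernel derivatives are written out here for illustration and are not taken verbatim from the paper:

```python
import numpy as np

def ksd_vstat_gaussian_target(X: np.ndarray, gamma: float = 1.0) -> float:
    """V-statistic estimate of KSD(P0, P) for the standard normal target
    P0 = N(0, I_d), using the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)
    and the Langevin Stein operator. X is an (n, d) array of i.i.d. samples from P."""
    n, d = X.shape
    score = -X                               # grad_x log p0(x) = -x for N(0, I_d)

    diff = X[:, None, :] - X[None, :, :]     # pairwise differences x_i - x_j, shape (n, n, d)
    sq = np.sum(diff ** 2, axis=-1)          # squared distances ||x_i - x_j||^2
    K = np.exp(-gamma * sq)                  # Gram matrix k(x_i, x_j)

    # Stein kernel u_{p0}(x_i, x_j) = <xi_{p0}(x_i), xi_{p0}(x_j)>, assembled term by term.
    t1 = (score @ score.T) * K                                     # s(x_i)^T s(x_j) k
    t2 = 2.0 * gamma * np.einsum("id,ijd->ij", score, diff) * K    # s(x_i)^T grad_y k
    t3 = -2.0 * gamma * np.einsum("jd,ijd->ij", score, diff) * K   # s(x_j)^T grad_x k
    t4 = (2.0 * gamma * d - 4.0 * gamma ** 2 * sq) * K             # trace(grad_x grad_y k)

    ksd_sq = float(np.mean(t1 + t2 + t3 + t4))                     # V-statistic for KSD^2
    return np.sqrt(max(ksd_sq, 0.0))

# Quick check: samples from a shifted normal should give a clearly larger KSD value
# than samples drawn from the target itself.
rng = np.random.default_rng(0)
print(ksd_vstat_gaussian_target(rng.normal(size=(500, 2))))        # close to 0
print(ksd_vstat_gaussian_target(rng.normal(size=(500, 2)) + 1.0))  # noticeably larger
```

The quadratic-cost double sum can be replaced by Nyström-type approximations, as mentioned above, without changing the $n^{-1/2}$ statistical rate.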

7. Practical and Theoretical Implications

The minimax lower bound delineates a clear and attainable benchmark for KSD estimation: no estimator based on sample data that must perform uniformly over broad distribution classes can converge at a worst-case rate faster than $n^{-1/2}$. The explicit characterization of the constants, especially their exponential deterioration with dimension, urges practical caution for high-dimensional applications. For practitioners, the result implies that improvements to KSD-based inference should focus on algorithmic efficiency, finite-sample constants, or domain-specific modeling, since further rate improvements are statistically impossible.

This perspective clarifies both the statistical theory of KSD and its practical implications: the assessment of goodness of fit, the optimality of existing estimators, and the challenges posed by high-dimensional and non-Euclidean settings.
