WS-KDE: Wilson Score Kernel Estimator
- WS-KDE is a statistical method that integrates Wilson score-based confidence bounds with kernel density estimation to robustly quantify uncertainty for [0,1]-bounded outputs.
- It combines local weighted averages with conservative confidence intervals, effectively handling small sample sizes and unknown noise distributions.
- WS-KDE enhances Bayesian optimization by preventing premature exclusion of promising regions, ensuring reliable convergence to global optima.
The Wilson Score Kernel Density Estimator (WS-KDE) is a statistical methodology designed to provide robust, high-coverage confidence intervals for the mean of stochastic black-box functions with outputs confined to the interval [0, 1], independently of the underlying output distribution. Developed for scenarios such as robotics control, simulation-based optimization, and experimental design, WS-KDE finds particular utility as the function estimator within Bayesian optimization frameworks. Its defining feature is the integration of Wilson score-based confidence bounds (historically used for binomial outcomes) with local kernel density estimation, yielding conservative and statistically sound uncertainty estimates even when sample sizes per query point are low and disturbance distributions are unknown.
1. Problem Setting and Motivation
Many engineering and scientific optimization problems involve the tuning of systems via expensive, noisy stochastic function evaluations. The target function $f(x, \omega)$, dependent on design or control parameters $x$ and on unknown disturbances or perturbations $\omega$, yields outputs in the unit interval. The primary objective is the maximization of the expected response:

$$x^{*} = \arg\max_{x} \; \mathbb{E}_{\omega}\!\left[ f(x, \omega) \right].$$
Repeated function evaluations are costly, and the distribution of disturbances is typically inaccessible. Efficient black-box optimization in these conditions generally leverages Bayesian optimization, which depends critically on reliable mean and uncertainty estimates to guide exploration and safely prune sub-optimal regions of the parameter space. Standard probabilistic estimators often rely on Gaussian assumptions or require large sample sizes, leading to poor or overconfident uncertainty quantification at low sample counts, and risk the premature exclusion of potentially optimal regions.
2. Wilson Score Confidence Bounds for [0,1]-Bounded Outputs
The Wilson score method, originally developed for binomial proportions, outperforms the standard Gaussian approximation by providing conservative, high-coverage confidence intervals regardless of the sample size. The classic Wilson bounds for a binomial outcome with observed proportion $\hat{p}$ over $n$ trials, at a confidence level set by the standard-normal quantile $z$, are determined by the center

$$\tilde{p} = \frac{\hat{p} + \frac{z^{2}}{2n}}{1 + \frac{z^{2}}{n}}$$

and half-width

$$w = \frac{z}{1 + \frac{z^{2}}{n}} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n} + \frac{z^{2}}{4n^{2}}},$$

yielding a confidence interval $\left[\tilde{p} - w,\; \tilde{p} + w\right]$.
Crucially, for stochastic functions on $[0, 1]$, the true variance is always upper bounded by the binomial (Bernoulli) variance $p(1-p)$ at the same mean, justifying the use of these Wilson intervals more generally. The consequence is that Wilson score intervals remain conservative and valid whether the underlying noise model is Bernoulli, Beta, or arbitrary on $[0, 1]$.
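As a concrete illustration, the following minimal Python sketch computes the classic Wilson interval; the function name, signature, and default quantile are illustrative choices, not taken from the source:

```python
import math

def wilson_interval(p_hat: float, n: float, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion-like mean.

    p_hat : observed mean in [0, 1]
    n     : number of (effective) samples; may be non-integer
    z     : standard-normal quantile (1.96 corresponds to ~95% nominal coverage)
    """
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1.0 - p_hat) / n + z**2 / (4.0 * n**2))
    return center - half_width, center + half_width

# Example: observed mean 0.6 over 5 trials yields a deliberately wide, conservative interval.
low, high = wilson_interval(0.6, 5)
```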
3. Integration with Kernel Density Estimation
WS-KDE extends kernel density estimation (KDE) to stochastic, bounded-output regimes. In standard KDE, the local mean at a query point $x$ is estimated by

$$\hat{\mu}(x) = \frac{\sum_{i=1}^{N} K_h(x - x_i)\, y_i}{\sum_{i=1}^{N} K_h(x - x_i)},$$

where $y_i$ are observed responses at sampled points $x_i$ and $K_h$ is a smoothing kernel (often Gaussian with bandwidth $h$) centered at $x$. The effective number of local samples is quantified as

$$n(x) = \sum_{i=1}^{N} K_h(x - x_i),$$

with $K_h(0) = 1$. WS-KDE synthesizes these local weighted averages and effective sample sizes with the Wilson score formulas, substituting $\hat{p} \to \hat{\mu}(x)$ and $n \to n(x)$:

$$\tilde{\mu}(x) = \frac{\hat{\mu}(x) + \frac{z^{2}}{2 n(x)}}{1 + \frac{z^{2}}{n(x)}}, \qquad \tilde{\sigma}(x) = \frac{z}{1 + \frac{z^{2}}{n(x)}} \sqrt{\frac{\hat{\mu}(x)\left(1 - \hat{\mu}(x)\right)}{n(x)} + \frac{z^{2}}{4 n(x)^{2}}}.$$
These estimators provide a local pointwise mean and robust confidence bounds, which remain conservative even with sparse data.
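A compact Python sketch of these local estimates follows; it assumes a Gaussian kernel with peak normalized to $K_h(0) = 1$, and the function name and interface are illustrative rather than taken from the source:

```python
import numpy as np

def ws_kde(x_query, xs, ys, bandwidth=0.1, z=1.96):
    """Wilson-score-adjusted KDE mean and confidence bounds at x_query.

    xs, ys    : arrays of sampled inputs and their [0, 1]-bounded responses
    bandwidth : kernel bandwidth h of the Gaussian kernel (peak normalized to 1)
    z         : standard-normal quantile controlling interval coverage
    """
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    # Kernel weights with K_h(0) = 1, so their sum acts as an effective sample count.
    w = np.exp(-0.5 * ((x_query - xs) / bandwidth) ** 2)
    n_eff = w.sum()
    mu = np.dot(w, ys) / n_eff                      # local KDE mean
    denom = 1.0 + z**2 / n_eff
    center = (mu + z**2 / (2.0 * n_eff)) / denom    # Wilson-adjusted mean
    sigma = (z / denom) * np.sqrt(mu * (1.0 - mu) / n_eff + z**2 / (4.0 * n_eff**2))
    return center, center - sigma, center + sigma   # mean, LCB, UCB
```

Because Gaussian kernel weights never vanish exactly, $n(x)$ stays positive wherever at least one sample exists; far from all samples it becomes small and the resulting interval correspondingly wide.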
4. Statistical Properties and Advantages
WS-KDE has several notable properties that give it practical advantage:
- Low-sample robustness: The Wilson score construction corrects for overconfidence, maintaining reliable uncertainty quantification from the earliest iterations, unlike Gaussian approximations that underestimate variance with few samples.
- Distributional invariance: The validity of the confidence bounds holds for all [0,1]-bounded stochastic functions, not just for binomial or Bernoulli cases, meaning that users need not model the output noise.
- Efficient pruning for Bayesian optimization: When deployed within Bayesian optimization, the conservativeness ensures that regions are only pruned if their upper confidence bound is strictly lower than the best observed lower bound, mitigating the risk of discarding the global maximizer due to spurious underestimates of variance.
Empirical simulations confirm that WS-KDE yields confidence intervals with high coverage probabilities across low- and medium-sample regimes. In pruning policies for Bayesian optimization, this prevents false negatives (i.e., the premature exclusion of promising regions) and leads to more reliable convergence, observed as a near-monotonic increase of the maximum lower confidence bound (LCB_max) as sampling progresses.
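To illustrate the distributional-invariance and coverage claims, the following small Python experiment estimates how often the Wilson interval covers the true mean of a non-Bernoulli, [0, 1]-bounded response; the Beta noise model, sample size, and seed are illustrative choices, not the benchmarks reported in the source:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 5.0                   # Beta(a, b) noise: [0, 1]-bounded, non-Bernoulli
n, z, trials = 8, 1.96, 20_000    # small sample size per trial, ~95% nominal level
true_mean = a / (a + b)

covered = 0
for _ in range(trials):
    y = rng.beta(a, b, size=n)
    p_hat = y.mean()
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    covered += (center - half) <= true_mean <= (center + half)

# The conservativeness argument predicts coverage at or above the nominal level.
print(f"empirical coverage: {covered / trials:.3f}")
```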
5. Methodological Workflow and Key Formulas
The methodological workflow for WS-KDE in global optimization comprises the following steps; a code sketch of the resulting loop appears after the list:
- Data Acquisition: At each iteration, select control parameters at which to evaluate the stochastic function, according to an exploration-exploitation strategy driven by the current mean and confidence bounds.
- Local Estimation via KDE: For each query point $x$, compute the kernel-weighted local mean $\hat{\mu}(x)$ and effective sample size $n(x)$ using the available dataset.
- Wilson Score Adjustment: Apply the WS-KDE adjusted mean and uncertainty formulas to yield robust confidence intervals.
- Search Space Pruning: Exclude regions where the upper bound of the WS-KDE confidence interval falls below the current best lower bound.
- Convergence Monitoring: Track LCB_max and related metrics to determine when the experimental process can be stopped.
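The Python sketch below strings these steps together on a discretized one-dimensional search space. The acquisition rule (evaluate the active candidate with the highest upper bound), the kernel bandwidth, the random seeding, and the fixed iteration budget are illustrative assumptions, and `evaluate` is a hypothetical stand-in for the expensive stochastic function:

```python
import numpy as np

def ws_kde_bounds(x_query, xs, ys, h=0.05, z=1.96):
    """WS-KDE lower/upper confidence bounds at x_query (Gaussian kernel, K_h(0) = 1)."""
    w = np.exp(-0.5 * ((x_query - np.asarray(xs)) / h) ** 2)
    n = w.sum()
    mu = np.dot(w, ys) / n
    denom = 1.0 + z**2 / n
    center = (mu + z**2 / (2 * n)) / denom
    half = (z / denom) * np.sqrt(mu * (1 - mu) / n + z**2 / (4 * n**2))
    return center - half, center + half

def optimize(evaluate, n_iters=100, n_grid=201, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, n_grid)       # discretized search space
    active = np.ones(n_grid, dtype=bool)             # regions not yet pruned
    xs, ys = [], []
    # Seed with a few random evaluations so the local estimates are defined everywhere.
    for x0 in rng.uniform(0.0, 1.0, size=5):
        xs.append(x0); ys.append(evaluate(x0))
    for _ in range(n_iters):
        bounds = np.array([ws_kde_bounds(c, xs, ys) for c in candidates])
        lcb, ucb = bounds[:, 0], bounds[:, 1]
        best_lcb = lcb[active].max()                  # LCB_max, the convergence metric
        active &= ucb >= best_lcb                     # prune only provably sub-optimal regions
        # Acquisition: evaluate the active candidate with the highest upper bound.
        idx = np.flatnonzero(active)[np.argmax(ucb[active])]
        x_next = candidates[idx]
        xs.append(x_next); ys.append(evaluate(x_next))
    return candidates[active], best_lcb
```

For a binary success/failure reward, `evaluate` returns 0.0 or 1.0 per trial; for the generalized [0, 1]-valued metrics discussed in Section 6, it can return any value in the unit interval.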
The following table summarizes core formulas central to the WS-KDE estimator:
Quantity | Expression | Description |
---|---|---|
Local KDE mean | $\hat{\mu}(x) = \dfrac{\sum_{i} K_h(x - x_i)\, y_i}{\sum_{i} K_h(x - x_i)}$ | Weighted mean of observations |
Effective sample size | $n(x) = \sum_{i} K_h(x - x_i)$, with $K_h(0) = 1$ | Kernel-weighted sample count |
WS-KDE mean | $\tilde{\mu}(x) = \dfrac{\hat{\mu}(x) + \frac{z^{2}}{2 n(x)}}{1 + \frac{z^{2}}{n(x)}}$ | Wilson-adjusted KDE mean |
WS-KDE std. (uncertainty) | $\tilde{\sigma}(x) = \dfrac{z}{1 + \frac{z^{2}}{n(x)}} \sqrt{\dfrac{\hat{\mu}(x)(1 - \hat{\mu}(x))}{n(x)} + \dfrac{z^{2}}{4 n(x)^{2}}}$ | Wilson-adjusted KDE uncertainty |
6. Demonstrated Applications and Empirical Findings
WS-KDE has been empirically validated in both synthetic and applied domains. In benchmark studies involving analytical functions with multiple maxima, WS-KDE-based Bayesian optimization consistently achieved confidence intervals with higher true coverage than classical KDE with Gaussian bounds, especially at low to moderate sample sizes. In the context of automated trap design for vibratory part feeders, WS-KDE led to:
- Reduced total number of function evaluations required for reliable optimization.
- Pruning strategies that rapidly contracted the search space without excluding global optima.
- Consistent guidance of the optimization process to the true global maximum, as evidenced by a 100% success rate in converging to the globally optimal design during trials, while standard KDE methods at times prematurely converged to local optima.
Remarkably, the method performed well not only under traditional binary rewards (success/failure classification) but also when generalized to non-binary performance metrics, such as the maximum frequency of occurrence of a desired outcome.
7. Implications and Scope of Use
WS-KDE provides a general, theoretically justified solution for uncertainty quantification in optimization settings characterized by expensive, noisy, and [0,1]-bounded black-box outputs. By avoiding any reliance on the specific noise model and remaining robust to sample sparsity, WS-KDE is broadly applicable across robotics, automation, simulation-based engineering, and any domain where reliable pruning of the search space is critical for optimization efficiency. A plausible implication is that the adoption of WS-KDE as the estimator in Bayesian optimization may yield systematic improvements in both the safety and data efficiency of global optimization processes for a range of high-cost experimental and simulation workloads.