Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score

Published 4 Apr 2026 in cs.LG | (2604.03599v1)

Abstract: For a larger set of predictions of several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. Nevertheless, this procedure can deviate from the actual ground truth in certain parameter regions. An approach is presented to determine a representative y_BS from such a set of predictions using Kernel Density Estimation (KDE) in nonlinear regression with Neural Networks (NN), which simultaneously provides an associated quality criterion beta_BS, called Bagging Score (BS), that reflects the confidence of the obtained ensemble prediction. It is shown that the new approach yields better predictions than the common use of mean or median. In addition, the method is contrasted with several approaches to nonlinear regression from the literature, resulting in a top ranking in each of the calculated error values without using any optimization or feature selection technique.

Authors (3)

Summary

The paper introduces a KDE-based method that computes a representative prediction and confidence measure (Bagging Score) for nonlinear regression using ensemble outputs.
It benchmarks the proposed approach on the Concrete dataset, achieving superior performance metrics (R², RMSE, MAPE, MAE) compared to traditional mean and median aggregations.
The method enhances robustness and interpretability by automatically adapting to multimodal and skewed distributions in ensemble predictions.

Overview

This paper introduces a principled approach for aggregating predictions from bagging ensembles, particularly in nonlinear regression with neural networks, by leveraging Kernel Density Estimation (KDE) to produce a representative prediction value ($\tilde{y}_{BS}$) together with a confidence measure termed the Bagging Score (BS). The method is evaluated against the standard ensemble aggregation methods, mean and median, and is benchmarked on the canonical Concrete dataset. Results demonstrate consistently improved predictive accuracy and enhanced interpretability of ensemble outputs, especially in the presence of asymmetry and multi-modality in the prediction distribution.

Bagging Ensemble Aggregation: Challenges and Motivation

Conventional bagging predictors aggregate the outputs of independently trained neural networks via simple statistical measures, predominantly the mean or median of ensemble predictions. While effective under conditions of symmetric, unimodal distributions, these aggregation techniques exhibit significant limitations when the predictive ensemble's output distribution is skewed or multimodal—a scenario often encountered in stress regions of the input domain or extrapolative settings.

Notably, the mean is particularly susceptible to distortion under heavy-tailed or asymmetric distributions, and neither mean nor median can accommodate the presence of multiple predictive modes. These limitations lead to increased bias in critical parameter regions, directly impacting regression task performance.
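The failure mode described above can be reproduced with a toy example. The following is a minimal sketch using made-up numbers (the `predictions` list and the helper `distance_to_nearest` are hypothetical and not taken from the paper): when an ensemble splits evenly into two clusters, both the mean and the median land in the empty gap between them.

```python
from statistics import mean, median

# Hypothetical ensemble predictions for one input: two equally sized clusters.
predictions = [9.8, 9.9, 10.0, 10.1, 10.2, 19.8, 19.9, 20.0, 20.1, 20.2]

ensemble_mean = mean(predictions)
ensemble_median = median(predictions)


def distance_to_nearest(value, values):
    """Smallest absolute gap between `value` and any element of `values`."""
    return min(abs(value - v) for v in values)


# How far each aggregate sits from the closest actual network prediction.
gap_mean = distance_to_nearest(ensemble_mean, predictions)
gap_median = distance_to_nearest(ensemble_median, predictions)

print(f"mean={ensemble_mean:.2f}, median={ensemble_median:.2f}")
print(f"gap to nearest prediction: mean={gap_mean:.2f}, median={gap_median:.2f}")
```

In this constructed case both aggregates sit at about 15, a value that no network in the ensemble predicted.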

Methodology: Kernel Density Estimation and the Bagging Score

The proposed approach circumvents the limitations of mean/median aggregation by applying KDE to the set of ensemble predictions $\tilde{\mathcal{Y}}_x$ for each input $x$. The method estimates the empirical predictive density as follows:

For each $x$, collect ensemble predictions $\tilde{\mathcal{Y}}_x$ from $n_c$ trained networks (with $n_c = 1000$ in this study).
Construct a KDE using a Gaussian kernel with bandwidth heuristically set to cover the central 99.7% confidence interval, corresponding to $h_k = \sigma_{\tilde{\mathcal{Y}}}/6$.
The Bagging Score $\beta_{BS}$ is defined as the maximum value of the estimated density. The corresponding abscissa, i.e. the prediction value $\tilde{y}_{BS}$ at which this maximum occurs, is interpreted as the ensemble's most representative prediction.
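Written out in standard KDE notation, the steps above amount to the following. The symbols $\hat{f}_x$, $K$, and $\tilde{y}_i$ (the prediction of the $i$-th network) are labels assumed here for illustration and may differ from the paper's own notation; any normalization the paper applies to the score is not shown.

```latex
\hat{f}_x(y) = \frac{1}{n_c\, h_k} \sum_{i=1}^{n_c} K\!\left(\frac{y - \tilde{y}_i}{h_k}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2},
\qquad h_k = \frac{\sigma_{\tilde{\mathcal{Y}}}}{6}

\beta_{BS} = \max_{y} \hat{f}_x(y),
\qquad \tilde{y}_{BS} = \arg\max_{y} \hat{f}_x(y)
```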

This approach enables automatic identification of majority (modal) predictions in complex, potentially multimodal distributions and yields a scalar confidence value $\beta_{BS}$, directly reflecting empirical consensus.
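The procedure can be sketched in a few lines of NumPy. This is an illustrative reading of the description above, not the authors' implementation: the function name `bagging_score`, the evaluation grid, the use of the population standard deviation, and the synthetic bimodal ensemble are all assumptions.

```python
import numpy as np


def bagging_score(predictions, grid_size=2001):
    """Return (y_bs, beta_bs): location and height of the KDE maximum."""
    y = np.asarray(predictions, dtype=float)
    sigma = y.std()  # population std (ddof=0); the choice of ddof is an assumption
    if sigma == 0.0:
        raise ValueError("all predictions are identical; bandwidth would be zero")
    h = sigma / 6.0  # bandwidth heuristic h_k = sigma / 6 from the text
    # Evaluation grid: hypothetical choice, padded by 3 bandwidths on each side.
    grid = np.linspace(y.min() - 3.0 * h, y.max() + 3.0 * h, grid_size)
    # Gaussian KDE: average of normal densities centred on each prediction.
    z = (grid[:, None] - y[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (y.size * h * np.sqrt(2.0 * np.pi))
    peak = int(np.argmax(density))
    return float(grid[peak]), float(density[peak])


# Synthetic ensemble: 70% of the networks agree near 30, 30% near 50.
rng = np.random.default_rng(0)
preds = np.concatenate([rng.normal(30.0, 1.0, 700), rng.normal(50.0, 1.0, 300)])

y_bs, beta_bs = bagging_score(preds)
print(f"y_BS={y_bs:.2f}, beta_BS={beta_bs:.4f}, mean={preds.mean():.2f}")
```

On this synthetic ensemble the KDE maximum falls inside the majority cluster near 30, while the mean is pulled toward 36, a region where few networks predict. The grid resolution limits how precisely the maximum is located; a finer grid or a local optimizer could be substituted.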

Comparative Performance Analysis

The empirical study is conducted on the Concrete Compressive Strength dataset, a highly nonlinear regression benchmark with 1030 samples and complex feature interactions. The evaluation compares the Bagging Score ($\tilde{y}_{BS}$ aggregation), mean, and median across several error metrics: $R^2$, RMSE, MAPE, and MAE.

Key findings:

$\tilde{y}_{BS}$ achieves superior performance across all metrics: RMSE = 4.52, MAPE = 10.8, MAE = 3.30.
The median slightly outperforms the mean, but lags behind BS (median: RMSE = 4.89, MAPE = 12.6, MAE = 3.66).
The mean produces the weakest results (RMSE = 5.64, MAPE = 16.6, MAE = 4.49).
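For reference, the four error metrics named in this section follow their usual textbook definitions, sketched below. The helper `regression_metrics` and the toy target/prediction values are hypothetical, and expressing MAPE in percent is an assumption about how the reported figures are scaled.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE, MAPE (in percent) and MAE for paired arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual**2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(residual**2))),
        # MAPE divides by the targets, so it assumes none of them is zero.
        "MAPE": float(100.0 * np.mean(np.abs(residual / y_true))),
        "MAE": float(np.mean(np.abs(residual))),
    }


# Toy values for illustration only; these are not data from the study.
y_true = [20.0, 40.0, 50.0, 80.0]
y_pred = [22.0, 38.0, 55.0, 76.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Applying the same function to the $\tilde{y}_{BS}$, mean, and median aggregates of an ensemble on a held-out set would give a like-for-like comparison of the three aggregation rules.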

When benchmarked against canonical literature models (including optimized XGBoost, random forest, and deep ensembles), the combination of the proposed method with alternating transfer function networks achieves a top ranking on each of the calculated error values, without using any optimization or feature selection technique.

Theoretical and Practical Implications

The KDE-based aggregation with Bagging Score confers several advantages:

Robustness to Non-Gaussian Predictive Distributions: The method adapts seamlessly to multi-modal and skewed predictive output landscapes, a common scenario in real-world regression tasks involving high model variance or nonstationary data regimes.
Quantitative Confidence Assessment: The Bagging Score offers a probabilistic quality measure for each ensemble prediction. Unlike variance-based uncertainty estimates, BS is inherently responsive to the presence of consensus versus ambiguity in ensemble predictions, and correlates strongly with empirical prediction error.
Modularity and Compatibility: The approach is an algorithmically lightweight post-processing step, applicable to arbitrarily constructed neural ensembles, independent of architectural or training protocol details.
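The contrast between the Bagging Score and a variance-based estimate can be illustrated with a constructed example. The sketch below is an assumption-laden toy, not an experiment from the paper: the helper `kde_peak` and both synthetic ensembles are hypothetical, chosen so that their standard deviations are identical while their degree of consensus differs.

```python
import numpy as np


def kde_peak(predictions, grid_size=4001):
    """Maximum of a Gaussian KDE with bandwidth sigma / 6 (grid search)."""
    y = np.asarray(predictions, dtype=float)
    h = y.std() / 6.0
    grid = np.linspace(y.min() - 3.0 * h, y.max() + 3.0 * h, grid_size)
    z = (grid[:, None] - y[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (y.size * h * np.sqrt(2.0 * np.pi))
    return float(density.max())


# Ensemble A: the networks split evenly between two values (no consensus).
split = np.array([-1.0] * 50 + [1.0] * 50)

# Ensemble B: 98 networks agree, two outliers are placed so that the
# standard deviation matches ensemble A exactly.
outlier = np.sqrt(50.0)
consensus = np.array([0.0] * 98 + [-outlier, outlier])

print(f"std:  split={split.std():.3f}, consensus={consensus.std():.3f}")
print(f"peak: split={kde_peak(split):.3f}, consensus={kde_peak(consensus):.3f}")
```

Both ensembles have a standard deviation of 1, so a variance-based measure cannot tell them apart, yet the KDE maximum is roughly twice as high for the ensemble in which nearly all networks agree.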

Potential Extensions

Future research directions include:

Error Calibration: The demonstrated correlation between BS and prediction error opens opportunities for calibrated region-wise uncertainty estimation, potentially improving trustworthiness in safety-critical applications.
Sparse Data Regimes: Preliminary results indicate stable performance even under severe data sparsity, suggesting utility in low-sample domains.
Classification Tasks: While the study addresses regression, the extension of the Bagging Score framework to classification—with KDE applied to real-valued class probability predictions or to latent representations—warrants systematic exploration.
Integration with Hyperparameter Optimization: Combining BS-based aggregation with automated model selection or architecture search (potentially using cross-validation or out-of-bag assessment) could yield further accuracy improvements.

Conclusion

The integration of Kernel Density Estimation for ensemble aggregation and the introduction of the Bagging Score yield more robust, interpretable, and accurate predictions in bagged neural regression ensembles. The method effectively mitigates the shortcomings of traditional mean and median aggregation under non-ideal predictive output distributions. Broader application of this methodology, in tandem with uncertainty-aware modeling techniques and optimization strategies, is expected to enhance the reliability of ensemble-based machine learning pipelines in both research and deployment settings (2604.03599).
