Weighting Ensemble Surrogate Predictions
- This article surveys mathematical frameworks for assigning non-uniform weights to surrogate models in an ensemble so as to minimize prediction error.
- It details various optimization approaches including convex quadratic programming, performance-driven closed-form weights, and adaptive input-dependent schemes using gating networks.
- Practical insights address challenges of non-stationarity and model diversity, improving calibration and decision quality in regression, classification, and uncertainty quantification.
A weighting strategy for ensemble surrogate model predictions refers to the mathematical, algorithmic, and implementation framework by which individual surrogate models within an ensemble are assigned non-uniform contribution coefficients (“weights”) when forming the aggregate prediction. These strategies are critical for regression, classification, design-of-experiments, uncertainty quantification, and Bayesian optimization settings, where leveraging heterogeneity or complementarity among base predictors leads to quantifiable improvements in generalization, calibration, or decision quality.
1. Mathematical Foundations and Problem Formulation
Let $\{f_m\}_{m=1}^{M}$ denote a collection of surrogate models—potentially heterogeneous, with possibly distinct parameterizations and learning algorithms. Let $w = (w_1, \dots, w_M)$ be their weight vector. In virtually all practical frameworks, weights are restricted to the probability simplex
$$\Delta^{M-1} = \Big\{ w \in \mathbb{R}^{M} : w_m \ge 0, \ \sum_{m=1}^{M} w_m = 1 \Big\}.$$
The aggregated prediction at input $x$ is then
$$\hat{f}(x) = \sum_{m=1}^{M} w_m f_m(x; \theta_m),$$
where $\theta = (\theta_1, \dots, \theta_M)$ encapsulates all model parameters and hyperparameters (Shahhosseini et al., 2019).
The core weighting problem is to select $w$ (and possibly $\theta$) so as to minimize a primary loss over a training set $\{(x_i, y_i)\}_{i=1}^{n}$, for instance mean squared error,
$$\min_{w \in \Delta^{M-1}} \ \frac{1}{n} \sum_{i=1}^{n} \Big( y_i - \sum_{m=1}^{M} w_m f_m(x_i; \theta_m) \Big)^2,$$
with weights and model parameters possibly learned jointly or in a nested fashion.
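As a minimal numerical illustration of the formulation above (the base predictions below are toy values, not from any cited model), the aggregated prediction and its training MSE can be computed as:

```python
import numpy as np

# Toy setup: M = 3 surrogate models, n = 5 training points.
# Column m of F holds model m's predictions f_m(x_i) at the training inputs.
F = np.array([[1.0, 1.2, 0.8],
              [2.1, 1.9, 2.0],
              [3.0, 3.3, 2.7],
              [3.9, 4.2, 4.1],
              [5.2, 4.8, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def ensemble_mse(w, F, y):
    """Mean squared error of the weighted ensemble prediction F @ w."""
    return float(np.mean((y - F @ w) ** 2))

w_uniform = np.full(3, 1 / 3)   # uniform weights on the simplex
loss = ensemble_mse(w_uniform, F, y)
```

Any non-uniform $w$ on the simplex plugs into the same routine, which is the objective minimized by the weight-optimization methods below.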
2. Weight-Optimization Methodologies
Weight optimization in ensemble surrogates can be structured as follows:
a) Convex Quadratic Programming for Global MSE Minimization:
Given fixed base models, $w^{*}$ can be found as the solution to a constrained quadratic program,
$$w^{*} = \arg\min_{w \in \Delta^{M-1}} \| y - F w \|_2^2,$$
with $F \in \mathbb{R}^{n \times M}$ containing base predictions $F_{im} = f_m(x_i)$ (Shahhosseini et al., 2019, Fokoué, 25 Dec 2025). Spectral and geometric constraints can further sharpen risk trade-offs, as in “geometric decay” or $\ell_p$-ball restricted weighting (Fokoué, 25 Dec 2025).
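A minimal sketch of the simplex-constrained quadratic program, using SciPy's SLSQP solver on synthetic data (the base models here are hypothetical noisy copies of the target, with different noise levels):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, M = 200, 4
y = np.sin(np.linspace(0, 3, n))
# Hypothetical base predictions: each surrogate is the target plus its own noise.
F = np.stack([y + rng.normal(0, s, n) for s in (0.05, 0.1, 0.3, 0.6)], axis=1)

def mse(w):
    return np.mean((y - F @ w) ** 2)

# Constrained QP over the probability simplex: w >= 0, sum(w) = 1.
cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
res = minimize(mse, x0=np.full(M, 1 / M), bounds=[(0, 1)] * M,
               constraints=cons, method="SLSQP")
w_opt = res.x
```

A dedicated QP solver would also work; SLSQP is used here only because the objective is a smooth quadratic and the simplex constraints map directly onto bounds plus one equality.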
b) Performance-Driven Closed-Form Weights:
In classical settings, weights are inversely proportional to cross-validated error. Given an error metric $e_m$ (e.g., RMSE, sMAPE) for model $m$ on a validation set,
$$w_m \propto \frac{1}{e_m^{\gamma}},$$
with $\gamma \in \{1, 2\}$ being common (Pawlikowski et al., 2018, Chen et al., 2022). Relative RMSE (RRMSE) scaling—$w_m \propto 1/\mathrm{RRMSE}_m$—is robust for regression (Chen et al., 2022).
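A sketch of such closed-form inverse-error weighting (the validation errors below are illustrative):

```python
import numpy as np

def inverse_error_weights(errors, gamma=2.0):
    """w_m proportional to 1 / e_m**gamma, normalized onto the simplex."""
    errors = np.asarray(errors, dtype=float)
    raw = 1.0 / errors ** gamma
    return raw / raw.sum()

# Validation RMSEs for three surrogates: lower error -> higher weight.
w = inverse_error_weights([0.1, 0.2, 0.4])
```

With $\gamma = 2$ the best model here receives roughly 76% of the total weight; $\gamma = 1$ would spread the weights more evenly.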
c) Objective-Specific Grid/Heuristic Search:
For primary metrics like AUC (classification), grid search or greedy refinement over the simplex is often used, selecting $w$ to maximize validation AUC (Hasnat et al., 3 Nov 2025).
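A coarse-grid sketch over the 2-simplex for three hypothetical classifiers, using the Mann-Whitney rank formulation of AUC (assumes continuous, essentially tie-free scores; all names and data are illustrative):

```python
import itertools
import numpy as np

def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney rank statistic (no tie handling)."""
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def grid_search_weights(P, y_true, step=0.1):
    """Exhaustive grid over the simplex for three base score columns in P."""
    best_w, best_auc = None, -np.inf
    ticks = np.arange(0, 1 + 1e-9, step)
    for w1, w2 in itertools.product(ticks, ticks):
        if w1 + w2 > 1 + 1e-9:
            continue
        w = np.array([w1, w2, 1 - w1 - w2])
        a = auc(y_true, P @ w)
        if a > best_auc:
            best_w, best_auc = w, a
    return best_w, best_auc

# Toy scores: column 0 ranks the positives perfectly, columns 1-2 do not.
y_true = np.array([0, 0, 1, 1])
P = np.array([[0.1, 0.9, 0.5],
              [0.2, 0.1, 0.6],
              [0.8, 0.8, 0.4],
              [0.9, 0.2, 0.7]])
w_best, auc_best = grid_search_weights(P, y_true)
```

Greedy refinement would replace the exhaustive product with local perturbations of the current best $w$; the grid version is quoted here because it is unambiguous.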
d) Bayesian/Nested Optimization:
When hyperparameters significantly impact model complementarity, outer-loop Bayesian optimization can be layered around the ensemble weight optimization, proposing candidate hyperparameter configurations $\theta$ and fitting $w$ per candidate (Shahhosseini et al., 2019).
3. Adaptive, Input-Dependent, and Probabilistic Weighting Schemes
Recent advances generalize static weighting to location-dependent or distributionally adaptive strategies:
a) Gating Networks and Mixture-of-Experts (MoE):
Here, a neural gating function $g_\phi$ parametrizes weights via a softmax, $w_m(x) = \mathrm{softmax}(g_\phi(x))_m$, enabling weights that adapt spatially or contextually. Training objectives include a prediction loss and entropy regularization to prevent “expert collapse”—the gating network degenerating to select a single model everywhere (Nabian et al., 28 Aug 2025).
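A NumPy sketch of input-dependent softmax gating with an entropy term; the linear gate `Phi` is a stand-in for the neural gating network, and all names and shapes are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical linear gate g_phi(x) = x @ Phi, one logit per expert; a real MoE
# would use a small neural network for g_phi, trained jointly with the experts.
def gated_prediction(x, Phi, expert_preds):
    w = softmax(x @ Phi)                    # each row lies on the simplex
    return np.sum(w * expert_preds, axis=-1), w

def entropy_bonus(w, eps=1e-12):
    """Mean gating entropy; adding it to the objective discourages collapse
    of the gate onto a single expert."""
    return -np.mean(np.sum(w * np.log(w + eps), axis=-1))

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 3))                 # 8 inputs, 3 features
Phi = rng.normal(size=(3, 4))               # gate parameters for 4 experts
expert_preds = rng.normal(size=(8, 4))      # each expert's prediction per input
yhat, w = gated_prediction(x, Phi, expert_preds)
```

Training would maximize `entropy_bonus` (scaled by a small coefficient) alongside minimizing the prediction loss, so the gate stays soft unless the data justify specialization.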
b) Dependent Tail-Free Processes (DTFP):
Input-dependent random probability measures are constructed via stick-breaking processes with logistic–GP–parametrized “sticks”. Variational inference is used to match both data likelihood and calibration (CRPS), yielding highly localized and uncertainty-aware weighting (Liu et al., 2018).
c) Reinforcement Learning (RL)-Based Dynamic Weighting:
Weights are treated as continuous actions, updated online through reward feedback (e.g., error decrease), with SARSA or actor-critic algorithms dynamically modulating weights in response to realized forecasting performance (Perepu et al., 2020).
d) Online Error Aggregation:
Exponentially weighted moving averages of model-specific errors furnish time-varying weights via $w_{m,t} \propto 1/\bar{e}_{m,t}$, where $\bar{e}_{m,t}$ is the exponentially smoothed error of model $m$ at time $t$; these can be updated recursively and normalized (Sen et al., 18 Jan 2025).
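A recursive sketch of EWMA-based error tracking with inverse-error normalization (the smoothing factor and initialization are illustrative choices, not taken from the cited work):

```python
import numpy as np

class EWMAWeighter:
    """Tracks each model's error with an exponentially weighted moving
    average and returns normalized inverse-error weights."""
    def __init__(self, n_models, alpha=0.3, eps=1e-8):
        self.alpha = alpha                  # smoothing factor (illustrative)
        self.eps = eps
        self.ewma = np.ones(n_models)       # initial error estimate

    def update(self, abs_errors):
        abs_errors = np.asarray(abs_errors, dtype=float)
        self.ewma = self.alpha * abs_errors + (1 - self.alpha) * self.ewma
        raw = 1.0 / (self.ewma + self.eps)
        return raw / raw.sum()

weighter = EWMAWeighter(n_models=3)
for _ in range(20):
    # In this toy error stream, model 0 is consistently the most accurate.
    w = weighter.update([0.1, 0.5, 1.0])
```

Larger `alpha` reacts faster to regime changes at the cost of noisier weights; smaller `alpha` averages over longer history.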
4. Surrogate-Accelerated and Bayesian Posterior Weight Learning
Bayesian strategies are increasingly prominent when integrating expensive and high-dimensional physical-model surrogates:
- Blended Parameterization and Bayesian Posterior Weights:
Physical schemes are mixed continuously via weights $w$ in the model equations; posterior distributions over $w$ are inferred using surrogate-accelerated likelihood computations (e.g., Gaussian-process surrogates for the log-likelihood), with MCMC sampling yielding ensembles reflecting joint model–data uncertainty (Mai et al., 18 Jun 2025).
- Bayesian Optimization with Ensemble Surrogates and Regularized Weights:
In transfer-learning BO, surrogate GPs from prior tasks and the target task are linearly combined, with weights obtained by non-negative regularized regression (Ridge/Lasso with $w \ge 0$), refit at each BO iteration (Trinkle et al., 22 Jan 2026).
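A sketch of the non-negative ridge step, solved as non-negative least squares on a ridge-augmented system (the predictions `F` and true weights are synthetic; `nonneg_ridge_weights` is an illustrative name, not the cited paper's API):

```python
import numpy as np
from scipy.optimize import nnls

def nonneg_ridge_weights(F, y, lam=1e-2):
    """min_w ||y - F w||^2 + lam * ||w||^2  subject to  w >= 0,
    solved exactly as NNLS on the augmented system [F; sqrt(lam) I]."""
    n, M = F.shape
    A = np.vstack([F, np.sqrt(lam) * np.eye(M)])
    b = np.concatenate([y, np.zeros(M)])
    w, _ = nnls(A, b)
    return w

rng = np.random.default_rng(2)
F = rng.normal(size=(100, 3))     # predictions of 3 prior-task surrogates
y = F @ np.array([0.7, 0.3, 0.0]) + rng.normal(0, 0.01, 100)
w = nonneg_ridge_weights(F, y)
```

Note that the recovered weights need not sum to one; whether to renormalize onto the simplex is a separate design choice in this setting.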
5. Model Selection, Diversity, and Validation in Weighting Strategy Design
Weighting efficacy is highly dependent on the setup of the ensemble pool:
- Diverse Pool Pruning and Ensemble Construction:
Heuristics for pool construction include discarding models above average error and greedily selecting models with both low validation loss and low pairwise correlation to maximize complementarity (Shahhosseini et al., 2019).
- Clustered Approaches and Two-Stage Aggregation:
Clustering by latent data regimes (e.g., time series with/without seasonality/trend) with model pools and locally tuned weights can substantially outperform uniform or global weighting (Pawlikowski et al., 2018, Cui et al., 27 Dec 2025). Two-step convex aggregation (Random Subset Averaging) provides an avenue for stable estimation in high dimensions (Cui et al., 27 Dec 2025).
- Self-Validation for Small Samples:
Fractional random-weight bootstrap methods assign anti-correlated “train”/“validate” weights to individual runs (rather than resampling), iterating model selection/fitting and averaging coefficients for self-checked ensemble surrogates (Lemkus et al., 2021).
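A minimal sketch of anti-correlated fractional weights, assuming the exponential fractional-weight construction in which a single uniform draw per run yields both weights (`svem_fractional_weights` is an illustrative name):

```python
import numpy as np

def svem_fractional_weights(n, rng):
    """Anti-correlated fractional train/validation weights: one uniform draw
    u_i per run gives train weight -log(u_i) and validation weight
    -log(1 - u_i), so runs emphasized in training count little in validation."""
    u = rng.uniform(size=n)
    return -np.log(u), -np.log(1.0 - u)

rng = np.random.default_rng(3)
w_train, w_val = svem_fractional_weights(1000, rng)
```

Model selection and fitting are then iterated under many such weight draws, and the fitted coefficients averaged across draws to form the self-validated ensemble.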
6. Specialization to Task Objectives and Unsupervised Settings
- Objective-Specific Weighting:
For decision-critical applications (such as power systems operation), weights can be optimized to minimize a problem-driven prediction loss—a task-specific optimality gap reflecting the effect of prediction error on the final economic or operational objective. A surrogate model is trained to approximate the mapping from weights to this loss, followed by analytic optimization over the simplex (Zhuang et al., 14 Mar 2025).
- Unsupervised Weight Estimation:
SUMMA demonstrates unsupervised estimation of weights from the rank-covariance structure of predictions, inferring “informativeness” of each method via a spectral approach, even in the absence of ground-truth labels. This allows principled weight assignment in data-scarce or privacy-limited domains (Ahsen et al., 2018).
7. Empirical Performance, Theoretical Guarantees, and Limitations
- Empirical Gains:
Across diverse benchmarks, data-driven weighting outperforms uniform schemes—yielding RMSE, AUC, or decision-loss reductions, especially in regimes where base learners are heterogeneous or the performance landscape is non-stationary (Shahhosseini et al., 2019, Sen et al., 18 Jan 2025, Hasnat et al., 3 Nov 2025, Chen et al., 2022, Pawlikowski et al., 2018, Nabian et al., 28 Aug 2025).
- Theoretical Conditions:
Structured weighting (enforcing geometric or spectral constraints) is proven to outperform uniform weights whenever it achieves strictly lower approximation error without increasing ensemble variance (Fokoué, 25 Dec 2025). Two-stage aggregators such as RSA are asymptotically optimal under broad conditions and substantially improve finite-sample risk (Cui et al., 27 Dec 2025).
- Limitations:
Global weighting schemes can underperform in the presence of strong data non-stationarity or localized accuracy differences among surrogates—necessitating local, adaptive, or input-dependent strategies. Dependence or collinearity among base models limits the additive benefit of weighting. Strategy-specific requirements—e.g., validation data for error-based weights or independence assumptions for unsupervised estimation—must be carefully managed for each application.
References:
- “Optimizing Ensemble Weights and Hyperparameters of Machine Learning Models for Regression Problems” (Shahhosseini et al., 2019)
- “A General Weighting Theory for Ensemble Learning: Beyond Variance Reduction via Spectral and Geometric Structure” (Fokoué, 25 Dec 2025)
- “A Mixture of Experts Gating Network for Enhanced Surrogate Modeling in External Aerodynamics” (Nabian et al., 28 Aug 2025)
- “RRMSE Voting Regressor: A weighting function based improvement to ensemble regression” (Chen et al., 2022)
- “Weighted Ensemble of Statistical Models” (Pawlikowski et al., 2018)
- “Adaptive and Calibrated Ensemble Learning with Dependent Tail-free Process” (Liu et al., 2018)
- “Random Subset Averaging” (Cui et al., 27 Dec 2025)
- “Self-Validated Ensemble Models for Design of Experiments” (Lemkus et al., 2021)
- “Blackbox Attacks via Surrogate Ensemble Search” (Cai et al., 2022)
- “A Weighted Predict-and-Optimize Framework for Power System Operation Considering Varying Impacts of Uncertainty” (Zhuang et al., 14 Mar 2025)
- “QGAPHEnsemble : Combining Hybrid QLSTM Network Ensemble via Adaptive Weighting for Short Term Weather Forecasting” (Sen et al., 18 Jan 2025)
- “Reinforcement Learning based dynamic weighing of Ensemble Models for Time Series Forecasting” (Perepu et al., 2020)
- “An Empirical Study on Ensemble-Based Transfer Learning Bayesian Optimisation with Mixed Variable Types” (Trinkle et al., 22 Jan 2026)