Aggregation of Experts (AoE)

Updated 7 April 2026

Aggregation of Experts (AoE) is a framework that combines predictions from diverse human and computational experts using rigorous mathematical and statistical methods.
It employs weighted-averaging, logarithmic pooling, and sequential update algorithms to address challenges like bias, dependence, and adaptivity in forecasts.
The framework underpins applications in forecasting, machine learning ensembles, and social choice, offering robust performance measures and minimax regret guarantees.

Aggregation of Experts (AoE) refers to a broad class of mathematical, algorithmic, and statistical frameworks by which predictions, parameter estimates, preferences, or decisions from multiple "experts" (human judges, computational models, or modules of a larger system) are combined into a single collective output. AoE addresses the challenges of reconciling diverse sources: differing expertise, redundancy, bias, dependence, adaptivity, dynamic entry and exit of experts, and decision-theoretic coherence. The field spans probabilistic forecast combination, sequential prediction, voting and social choice, model and belief fusion, and meta-learning in machine learning ensembles.

1. Mathematical Foundations and Representational Theorems

A principled definition of AoE rests on functional-analytic and decision-theoretic formalisms. Let $X$ be a set of experts or models, with each $x \in X$ yielding an output $f(x)$ in an outcome space $H$ (typically $\mathbb{R}^d$ or a probability simplex $\Delta(\Omega)$). An aggregation rule is a map $f: X^* \rightarrow H$, where $X^*$ denotes the finite nonempty subsets of $X$, assigning to any finite ensemble $A \subset X$ an aggregated output $f(A)$ with $f(\{x\}) = f(x)$.

A central axiom is weighted averaging (WA): for disjoint ensembles $A, B \in X^*$, there exists $\lambda \in [0, 1]$ with $f(A \cup B) = \lambda f(A) + (1 - \lambda) f(B)$; strict WA requires $\lambda \in (0, 1)$. Under minimal richness (outputs not collinear), strict WA characterizes all pure weighted-averaging rules

$f(A) = \frac{\sum_{x \in A} w(x)\, f(x)}{\sum_{x \in A} w(x)}$

for a unique (up to scaling) positive weight function $w: X \rightarrow \mathbb{R}_{>0}$ (Bajgiran et al., 2021). Weakening strictness introduces ensemble rankings: only top-ranked experts in the ensemble according to a weak order are included, and then weighted. This unifies weighted averaging, dictatorship, ranking+weight rules, and covers aggregation in Bayesian updating, social choice, and belief formation.
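
The weighted-averaging axiom can be checked numerically on a toy ensemble. The following is a minimal sketch, assuming outputs in $\mathbb{R}^2$ and hypothetical expert names and weights (`output`, `weight`, and `aggregate` are illustrative, not taken from Bajgiran et al.). It confirms that a pure weighted-averaging rule satisfies the axiom with $\lambda$ equal to the share of total weight held by the first ensemble.

```python
import numpy as np

def aggregate(ensemble, output, weight):
    """Pure weighted-averaging rule: sum_x w(x) f(x) / sum_x w(x)."""
    experts = list(ensemble)
    w = np.array([weight[x] for x in experts], dtype=float)
    outs = np.array([output[x] for x in experts], dtype=float)
    return w @ outs / w.sum()

# Hypothetical experts: outputs on the probability simplex, positive weights.
output = {"a": [0.2, 0.8], "b": [0.6, 0.4], "c": [0.9, 0.1]}
weight = {"a": 1.0, "b": 2.0, "c": 1.0}

A, B = {"a", "b"}, {"c"}  # two disjoint ensembles
w_A = sum(weight[x] for x in A)
w_B = sum(weight[x] for x in B)
lam = w_A / (w_A + w_B)   # the lambda whose existence the WA axiom asserts

# Aggregating the union equals mixing the two sub-aggregates with lambda.
f_union = aggregate(A | B, output, weight)
f_mix = lam * aggregate(A, output, weight) + (1 - lam) * aggregate(B, output, weight)
```

Because every weight is positive, `lam` lies strictly between 0 and 1, so this rule is an instance of strict WA.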

2. Statistical and Probabilistic Aggregation Methodologies

AoE in statistical settings centers on the aggregation of predictive distributions or probability vectors. The canonical linear opinion pool forms the average

$p(y) = \sum_{i=1}^{N} w_i\, p_i(y)$

for individual predictive distributions $p_i$ and non-negative weights $w_i$ summing to unity; this includes equal-weight averaging and performance-based weighting (McAndrew et al., 2019). The logarithmic opinion pool (LogOps) aggregates via

$p(y) \propto \prod_{i=1}^{N} p_i(y)^{w_i}$

which is externally Bayesian and confers robustness against adversarial experts (a single zero value yields $p(y) = 0$) (Kahn, 2012).
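
The two pools can be compared side by side on a small discrete example. This is a minimal sketch, assuming three hypothetical experts over three outcomes and equal weights; the function names `linear_pool` and `log_pool` are illustrative. It shows the zero-probability behaviour of the logarithmic pool: one expert assigning zero to an outcome forces the pooled probability of that outcome to zero, whereas the linear pool keeps it positive.

```python
import numpy as np

def linear_pool(probs, weights):
    """Weighted arithmetic mean of the experts' probability vectors."""
    probs = np.asarray(probs, dtype=float)   # shape (n_experts, n_outcomes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalise weights to sum to one
    return w @ probs

def log_pool(probs, weights):
    """Normalised weighted geometric mean of the experts' probability vectors."""
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    unnorm = np.prod(probs ** w[:, None], axis=0)
    total = unnorm.sum()
    if total == 0.0:
        raise ValueError("every outcome has zero probability under some expert")
    return unnorm / total

# Hypothetical forecasts; the third expert rules out the third outcome.
probs = [[0.7, 0.2, 0.1],
         [0.5, 0.3, 0.2],
         [0.6, 0.4, 0.0]]
equal = [1.0, 1.0, 1.0]

lin = linear_pool(probs, equal)   # third outcome keeps positive mass
geo = log_pool(probs, equal)      # third outcome is driven to zero
```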

A generative Bayesian model accounts for event-specific priors, expert calibration, bias, dependence, and relative accuracy. Under Gaussian log-odds, the aggregated forecast is a LogOps with analytic weights: $\operatorname{logit}(p) = \mathbf{w}^{\top} \mathbf{z}$, where $\mathbf{z}$ is the vector of debiased expert log-odds and $\mathbf{w}$ reflects calibration, pairwise dependence, and relative information (Kahn, 2012). Independence or exchangeability among experts yields simple closed-form weight expressions.
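
A short worked step shows why a weighted combination of log-odds is a logarithmic pool. It assumes a binary event, expert probabilities $p_i$, and pool weights $w_i$ (symbols chosen here for illustration; debiasing would only shift each log-odds term by an offset before this step).

```latex
\begin{align*}
p &= \frac{\prod_i p_i^{w_i}}{\prod_i p_i^{w_i} + \prod_i (1 - p_i)^{w_i}}, \\
\frac{p}{1 - p} &= \prod_i \left( \frac{p_i}{1 - p_i} \right)^{w_i}, \\
\log \frac{p}{1 - p} &= \sum_i w_i \log \frac{p_i}{1 - p_i}.
\end{align*}
```

The identity holds whether or not the weights sum to one, because the normalising denominator cancels in the odds ratio.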

In practical and review contexts, a range of variants is noted: spread- or beta-transformed pools (to address overdispersion), quantile/median combination, stacking and super learner for meta-learning ensembles, and Bayesian model averaging where expert models are weighted by posterior model probabilities (McAndrew et al., 2019).

3. Online and Sequential Aggregation: Regret and Adaptivity

In sequential prediction, AoE frameworks seek to minimize (shifting) regret relative to arbitrary, potentially nonstationary benchmarks. Exponentially weighted aggregation (e.g., Hedge, AdaHedge) assigns time-varying weights $w_{i,t}$ to experts $i = 1, \dots, N$, adapting based on observed loss sequences. The learner's loss at time $t$ is $\ell_t = \sum_{i=1}^{N} w_{i,t}\, \ell_{i,t}$, the weighted average of the expert losses $\ell_{i,t}$. Fixed-Share meta-algorithms allow weights to shift between experts, controlling for regime changes (V'yugin et al., 2018, Devaine et al., 2012).
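
The effect of Fixed-Share mixing under a regime change can be seen in a small simulation. This is a minimal sketch, assuming a fixed learning rate `eta` and the uniform-mixing form of Fixed-Share with a hypothetical mixing rate `alpha`; it is not AdaHedge, which tunes the learning rate adaptively. Setting `alpha=0` recovers plain Hedge.

```python
import numpy as np

def fixed_share_hedge(losses, eta=1.0, alpha=0.0):
    """Exponential weights with uniform Fixed-Share mixing.

    losses: array of shape (T, N) holding each expert's loss per round.
    Returns the learner's per-round losses and the weights used each round.
    """
    losses = np.asarray(losses, dtype=float)
    n_rounds, n_experts = losses.shape
    w = np.full(n_experts, 1.0 / n_experts)
    learner = np.empty(n_rounds)
    history = np.empty((n_rounds, n_experts))
    for t in range(n_rounds):
        history[t] = w
        learner[t] = w @ losses[t]              # weighted average of expert losses
        v = w * np.exp(-eta * losses[t])        # Hedge (exponential-weights) update
        v /= v.sum()
        w = (1.0 - alpha) * v + alpha / n_experts  # share a fraction alpha uniformly
    return learner, history

# Synthetic regime change: expert 0 is perfect for 50 rounds, then expert 1 is.
losses = np.zeros((100, 2))
losses[:50, 1] = 1.0
losses[50:, 0] = 1.0

hedge_loss, _ = fixed_share_hedge(losses, eta=1.0, alpha=0.0)
fs_loss, fs_weights = fixed_share_hedge(losses, eta=1.0, alpha=0.05)
```

Plain Hedge concentrates almost all weight on expert 0 and needs roughly as many rounds to recover as it spent learning, while the mixing step keeps every weight above `alpha / N`, so the switch is tracked within a few rounds.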

For adversarial, unbounded losses, AdaHedge with adaptive learning rates combined with Fixed-Share mixing guarantees a shifting-regret bound against comparator sequences with up to $k$ switches (V'yugin et al., 2018). The addition of "confidence" weights enables smooth expert specialization, allowing each expert $i$ to participate fractionally at time $t$ according to a confidence level $p_{i,t} \in [0, 1]$, with regret bounds preserved.

The specialist aggregation rule and its fixed-share variant have been rigorously analyzed in the context of sequential electricity forecasting, where only subsets of experts are active at each time, with formal regret bounds and empirical improvements in RMSE versus convex combination or best expert (Devaine et al., 2012).
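
One common form of the specialist update can be sketched as follows. This is an illustrative version under stated assumptions, not the exact rule analyzed by Devaine et al.: only active experts are reweighted, the total weight of the active set is preserved, and inactive ("sleeping") experts keep their weights. The names `specialist_predict`, `specialist_update`, and `eta` are hypothetical.

```python
import numpy as np

def specialist_predict(w, forecasts, active):
    """Convex combination of the forecasts of the currently active experts."""
    w = np.asarray(w, dtype=float)
    active = np.asarray(active, dtype=bool)
    p = w[active] / w[active].sum()            # renormalise over the active set
    return p @ np.asarray(forecasts, dtype=float)[active]

def specialist_update(w, losses, active, eta=1.0):
    """Exponential-weights update restricted to active experts."""
    w = np.asarray(w, dtype=float).copy()
    active = np.asarray(active, dtype=bool)
    mass = w[active].sum()                     # weight held by the active set
    v = w[active] * np.exp(-eta * np.asarray(losses, dtype=float)[active])
    w[active] = mass * v / v.sum()             # active set keeps the same total mass
    return w                                   # inactive weights are unchanged

# Hypothetical round: the third expert is inactive and reports nothing.
w0 = np.full(3, 1.0 / 3.0)
active = [True, True, False]
forecasts = [10.0, 20.0, np.nan]
losses = [0.0, 1.0, np.nan]

prediction = specialist_predict(w0, forecasts, active)
w1 = specialist_update(w0, losses, active, eta=1.0)
```

Entries for inactive experts are never read, so they can be left as `nan` placeholders.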

4. Robust and Theoretically-Optimal Aggregation Strategies

Non-asymptotic minimax optimality is achievable under suitable information conditions. If expert forecasts are derived from signals that constitute "projective substitutes" (that is, signals exhibit diminishing marginal informativeness), the simple average of $m$ experts achieves a guaranteed worst-case improvement over the prior, substantially outperforming the random-expert baseline. Extremizing (moving the average away from the prior by a calibrated factor) further raises this lower bound (Neyman et al., 2021). These results explain and theoretically ground the empirical practice of extremizing averages in domains such as geopolitical forecasting.
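
Extremizing is simple to state in code. The following is a minimal sketch for a binary event, assuming a known prior probability and a hypothetical extremizing `factor` (the calibrated value depends on the setting and is not given here); the result is clipped to remain a valid probability.

```python
import numpy as np

def extremized_average(forecasts, prior, factor=1.5):
    """Average the forecasts, then move the average away from the prior.

    factor = 1 returns the plain average; factor > 1 extremizes.
    """
    mean = float(np.mean(forecasts))
    return float(np.clip(prior + factor * (mean - prior), 0.0, 1.0))

# Hypothetical forecasts that all lean the same way relative to the prior.
forecasts = [0.60, 0.70, 0.65]
plain = extremized_average(forecasts, prior=0.5, factor=1.0)
extreme = extremized_average(forecasts, prior=0.5, factor=1.5)
```

With these numbers the plain average is 0.65 and the extremized forecast is 0.725, further from the prior in the direction the experts agree on.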

In environments with partial evidence structures, where experts observe overlapping subsets of independent signals, it is possible to learn the Bayes-optimal aggregation rule in polynomial time if and only if the signal-expert incidence matrix is full rank (Babichenko et al., 2018). Aggregators operate by online convex optimization over logit transforms of expert probabilities, yielding a regret guarantee with respect to the full-information optimum, provided the injectivity condition holds.
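
The idea of online convex optimization over logit transforms can be illustrated with plain online gradient descent on the log loss. This is a sketch under stated assumptions, not the algorithm of Babichenko et al.: the aggregate is a sigmoid of a linear combination of expert log-odds, the step size decays as $1/\sqrt{t}$, and the names `online_logit_aggregator`, `step`, and `clip` are hypothetical.

```python
import numpy as np

def logit(p):
    return np.log(p) - np.log1p(-p)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_logit_aggregator(expert_probs, outcomes, step=0.5, clip=1e-6):
    """Online gradient descent on log loss over coefficients of expert log-odds."""
    x = logit(np.clip(np.asarray(expert_probs, dtype=float), clip, 1.0 - clip))
    y = np.asarray(outcomes, dtype=float)
    n_rounds, n_experts = x.shape
    theta = np.full(n_experts, 1.0 / n_experts)   # start from the equal-weight pool
    preds = np.empty(n_rounds)
    for t in range(n_rounds):
        preds[t] = sigmoid(theta @ x[t])          # aggregate forecast for round t
        grad = (preds[t] - y[t]) * x[t]           # gradient of log loss w.r.t. theta
        theta = theta - step / np.sqrt(t + 1.0) * grad
    return preds, theta

# Synthetic data: two underconfident experts who report 0.8 or 0.2 but are always right.
outcomes = np.tile([1, 0], 25)
expert_probs = np.where(outcomes[:, None] == 1, 0.8, 0.2) * np.ones((50, 2))

preds, theta = online_logit_aggregator(expert_probs, outcomes)
log_loss = -(outcomes * np.log(preds) + (1 - outcomes) * np.log(1 - preds))
```

Because the experts here are systematically underconfident, the learned coefficients grow and the aggregate becomes sharper than any single report.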

5. Specialized, Consensus, and Feedback-Driven AoE Procedures

AoE encompasses specialized methodologies tailored for consensus-building, group decision support, and incomplete, noisy, or heterogeneous expert information:

Consensus Linear Opinion Pool: Iteratively updates each expert’s belief as a linear pool, with weight inversely proportional to distance from other experts, converging to a unique consensus under quadratic scoring rules (Carvalho et al., 2012).
Combinatorial Aggregation with Feedback: For small groups with incomplete pairwise comparison data, every judgment is weighted by scale detail and expert competence, reconciling spanning trees of pairwise matrices into ideally-consistent consensus via geometric means, feedback, and entropy-based agreement indices (Tsyganok et al., 2017).
Naive Aggregation with Validation-Driven Weights: Maintains multiple noisy stochastic gradient trajectories (each on a bootstrap subset of data), reweighting experts by a validation-based risk measure via multiplicative weights. The convex mixture converges to a consensus parameter with guaranteed generalization bounds that track the best constituent (Befekadu, 2024).
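
The consensus iteration in the first item above can be sketched as a repeated linear pool with distance-based weights. This is an illustrative version, assuming Euclidean distance between belief vectors and a hypothetical smoothing constant `eps` that keeps weights finite; the exact weighting in Carvalho et al. may differ.

```python
import numpy as np

def consensus_pool(beliefs, eps=0.1, tol=1e-10, max_iter=10_000):
    """Iterate linear pools whose weights shrink with distance between experts."""
    f = np.asarray(beliefs, dtype=float)          # shape (n_experts, n_outcomes)
    for step in range(1, max_iter + 1):
        # Pairwise Euclidean distances between the experts' current beliefs.
        dist = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=2)
        w = 1.0 / (eps + dist)                    # closer experts get more weight
        w /= w.sum(axis=1, keepdims=True)         # each expert's weights sum to one
        updated = w @ f                           # every expert forms a linear pool
        if np.abs(updated - f).max() < tol:
            return updated, step
        f = updated
    return f, max_iter

# Hypothetical initial beliefs of three experts over three outcomes.
beliefs = [[0.7, 0.2, 0.1],
           [0.4, 0.4, 0.2],
           [0.1, 0.3, 0.6]]
consensus, n_steps = consensus_pool(beliefs)
```

Each update is a convex combination of probability vectors, so every row stays on the simplex while the rows contract toward a common belief.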

6. Applications, Empirical Performance, and Open Challenges

AoE methodology is foundational across domains:

Forecast Aggregation: Judgmental and statistical forecasting in economics, epidemiology, sports, engineering risk, and climate science (McAndrew et al., 2019).
Machine Learning Ensembles: Stacking, meta-learning, and BMA for model selection and combination.
Social Choice and Welfare: Utilitarian aggregation of vNM utilities, extended Pareto rules, and case-based prediction (Bajgiran et al., 2021).
Decision Support Systems: Strategic planning in weakly structured domains via feedback-driven group AoE (Tsyganok et al., 2017).

Key performance desiderata include calibration, accuracy (often measured with proper scoring rules such as the Brier score and log-likelihood), sharpness, coherence, and robustness. Empirical studies frequently find equal weighting surprisingly hard to improve upon unless strong dependence, asymmetry, or performance history justifies differential weighting (McAndrew et al., 2019, Kahn, 2012). Practicalities such as computational scaling, model dependence, missingness, and dynamic expert sets present ongoing algorithmic and theoretical challenges.
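
The two scoring rules named above can be computed directly. This is a minimal sketch, assuming categorical forecasts stored as rows of probabilities and outcomes given as integer class indices; the function names and the example numbers are illustrative. Lower values are better for both scores as defined here.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared distance between forecast vectors and one-hot outcomes."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.zeros_like(probs)
    onehot[np.arange(len(outcomes)), outcomes] = 1.0
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def log_score(probs, outcomes):
    """Mean negative log-probability assigned to the realised outcomes."""
    probs = np.asarray(probs, dtype=float)
    realised = probs[np.arange(len(outcomes)), outcomes]
    return float(-np.mean(np.log(realised)))

# Hypothetical aggregated forecasts for two binary events.
probs = [[0.8, 0.2],
         [0.3, 0.7]]
outcomes = [0, 1]

brier = brier_score(probs, outcomes)
logs = log_score(probs, outcomes)
```

Scoring each candidate aggregate on held-out events with functions like these is one way to test whether differential weights actually beat the equal-weight pool.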

7. Theoretical and Practical Frontiers

The AoE field continues to probe:

Axiomatic characterizations: Further unification of aggregation rules for calibration, coherence, and robustness, especially under conflicting desiderata (Bajgiran et al., 2021, McAndrew et al., 2019).
Dependence modeling: Hierarchical and copula-based methods for correlated or redundant experts.
Extreme or adversarial settings: Minimax regret for online and adaptive aggregation under shifting or adversarial settings; adversarial noise and model misspecification (V'yugin et al., 2018).
Hybrid and dynamic environments: Crowdsourcing mixed with domain experts, evolving model spaces, and automated meta-learners for dynamic, high-stakes domains (McAndrew et al., 2019).
Empirical validation standards: Systematic out-of-sample evaluation, improved benchmarking, and standardization of metrics and terminology (McAndrew et al., 2019).

Future theoretical and practical progress in AoE will be anchored by continued cross-fertilization among probability, information theory, learning theory, social choice, and application-driven statistical practice.