Weighted Iterative Society-of-Experts
- WISE is a family of iterative algorithms that aggregates expert opinions using weighted pooling to achieve provable convergence and consensus.
- It employs inverse-RMS weighting and distributed min-rule updates, delivering improved accuracy (e.g., 69.29% in NFL forecasting) and robust learning in heterogeneous settings.
- WISE has been extended to multimodal debates among LLMs and classifier networks, achieving 2–7 percentage point gains across benchmarks such as SMART-840 and VisualPuzzles.
The Weighted Iterative Society-of-Experts (WISE) is a family of principled, iterative algorithms for multi-agent aggregation of expert opinions or solutions. Originating as a consensus method for probabilistic belief pooling in the "consensual linear opinion pool" model, WISE methodologies have been adapted for robust multimodal multi-agent debate, distributed learning with partially informative agents, and crowdsourcing under adversarial or heterogeneous settings. The unifying feature is iterative pooling of agent outputs weighted by reliability metrics—ranging from inverse-distance on probability vectors to structured credit assignment across debate participants—leading to provably convergent and incentive-compatible consensus mechanisms. WISE encompasses decentralized opinion pools, multi-round LLM debates over multimodal inputs, and distributed belief propagation in heterogeneous networks of classifiers.
1. The Consensual Linear Opinion Pool Model
The original WISE method conceptualizes a society of $n$ experts, each providing a probability vector $p_i^{(t)}$ over $m$ mutually exclusive outcomes. At each iteration $t$, expert $i$ updates its opinion by linearly pooling the opinions of all experts, weighted inversely by root-mean-square (RMS) deviation from its own:

$$w_{ij}^{(t)} = \frac{\left(\epsilon + d_{\mathrm{RMS}}(p_i^{(t)}, p_j^{(t)})\right)^{-1}}{\sum_{k=1}^{n} \left(\epsilon + d_{\mathrm{RMS}}(p_i^{(t)}, p_k^{(t)})\right)^{-1}},$$

where $\epsilon > 0$ prevents division by zero and the denominator normalizes the weights. The update rule is

$$p_i^{(t+1)} = \sum_{j=1}^{n} w_{ij}^{(t)}\, p_j^{(t)},$$

and in matrix notation $P^{(t+1)} = W^{(t)} P^{(t)}$. This process is distributed (computable locally) and provably drives all experts toward consensus, so that $p_i^{(t)} \to p^{*}$ as $t \to \infty$ for some common vector $p^{*}$, regardless of initialization. The incentive-compatibility of using inverse-RMS weights is rooted in the quadratic scoring rule, under which a rational expert maximizes expected reward by selecting or emulating the closest peer opinion in RMS distance. Empirical studies on NFL probability forecasting demonstrate significant gains in aggregate accuracy and robust mitigation of outlier bias compared to naïve averaging and other distance-based pools (Carvalho et al., 2012).
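The pooling update can be sketched in a few lines of Python (a minimal illustration, not the authors' code; the loop passes a larger ε than the paper's 10⁻⁴ so that consensus is visible within a hundred iterations):

```python
import math

def rms_distance(p, q):
    """Root-mean-square deviation between two probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def wise_step(opinions, eps=1e-4):
    """One WISE iteration: each expert linearly pools all opinions,
    weighted by inverse (eps + RMS distance), normalized to sum to 1."""
    new = []
    for p in opinions:
        raw = [1.0 / (eps + rms_distance(p, q)) for q in opinions]
        z = sum(raw)
        weights = [w / z for w in raw]
        pooled = [sum(w * q[k] for w, q in zip(weights, opinions))
                  for k in range(len(p))]
        new.append(pooled)
    return new

# Three experts forecasting a 3-outcome event; rows converge to a
# common consensus vector.
opinions = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]
for _ in range(100):
    opinions = wise_step(opinions, eps=0.05)
```

Since each weight vector is non-negative and sums to 1, every pooled row remains a valid probability vector.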
2. Distributed Learning with Partially Informative Agents
WISE-inspired protocols have been generalized to classification networks of heterogeneous, partially informative agents. Each agent's local classifier provides posteriors over only a subset of the true classes. A two-step iterative protocol is used: first, private local posteriors are recursively updated via a sequential (recursive) Bayes adjustment, with uncertain classes handled by a max-propagation trick; second, global beliefs are updated by a distributed min-rule, whereby each agent's global belief on each class is replaced by the minimum over its own local belief and all neighbors' current global beliefs on that class, then normalized. The resulting distributed algorithm achieves almost-sure learning of the true class under weak identifiability and connectivity assumptions; the rate of convergence is determined by the strongest agent's discrimination or support score, independent of peer-graph size (Yao et al., 30 Sep 2024).
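A minimal sketch of the min-rule global-belief update (the recursive-Bayes local step is omitted, and the graph, beliefs, and class count below are illustrative, not from the paper):

```python
def min_rule_step(global_beliefs, local_beliefs, neighbors):
    """One distributed min-rule update: each agent takes, per class, the
    minimum of its own local belief and its neighbors' current global
    beliefs, then renormalizes. Sketch only; the full protocol of
    Yao et al. also updates the local beliefs via recursive Bayes."""
    K = len(local_beliefs[0])
    new = []
    for i in range(len(global_beliefs)):
        cand = [local_beliefs[i]] + [global_beliefs[j] for j in neighbors[i]]
        mins = [min(b[k] for b in cand) for k in range(K)]
        z = sum(mins)
        new.append([m / z for m in mins] if z > 0 else mins)
    return new

# Three agents on a line graph; class 0 is the truth. Agent 2's local
# classifier cannot distinguish classes 0 and 1.
local = [[0.8, 0.1, 0.1], [0.6, 0.2, 0.2], [0.45, 0.45, 0.1]]
beliefs = [row[:] for row in local]
neighbors = [[1], [0, 2], [1]]
for _ in range(5):
    beliefs = min_rule_step(beliefs, local, neighbors)
```

Even the undecided agent ends up ranking the true class highest, because the min-rule lets the strongest agent's rejection of false classes propagate through the graph.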
3. Modular Multi-Agent Debate in Multimodal Reasoning (WISE-MAD)
WISE has been extended to support robust reasoning in heterogeneous societies of LLMs and multimodal LLMs (MLLMs) for vision-and-language tasks (Cherian et al., 2 Dec 2025). The paradigm partitions experts into Solvers (who generate step-by-step solutions in each round) and Reflectors (who judge solutions, assign integer weights, and supply targeted natural-language critique), coordinated by an Orchestrator LLM. The debate proceeds in staged rounds: Solvers output answers with reasoning chains; Reflectors evaluate them, assigning weights together with natural-language explanations; and the Orchestrator packages this feedback into revised prompts for the Solvers. If all Solvers and Reflectors unanimously agree on correctness, the process terminates early; otherwise, the synthesized, targeted feedback drives the next round toward convergence.
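The staged control flow can be sketched with callable stubs standing in for the LLM agents (the stub Solver/Reflector behaviors and the feedback string below are hypothetical; real agents would be model calls):

```python
def debate(solvers, reflectors, question, max_rounds=3):
    """Control-flow sketch of the staged WISE debate. Solvers map
    (question, feedback) -> (answer, reasoning); Reflectors map the list
    of solutions -> (weights, critique, all_correct flag)."""
    feedback = ""
    for rnd in range(max_rounds):
        solutions = [s(question, feedback) for s in solvers]
        weights, critiques, unanimous = [], [], True
        for r in reflectors:
            w, c, ok = r(solutions)
            weights.append(w)
            critiques.append(c)
            unanimous = unanimous and ok
        if unanimous:
            break  # early termination on unanimous agreement
        # Orchestrator packages critiques into the next round's prompt.
        feedback = " | ".join(critiques)
    return solutions, weights, rnd + 1

# Toy stubs: the Solver answers wrongly until it receives the critique.
def solver(question, feedback):
    return ("4" if "use arithmetic" in feedback else "5", "reasoning")

def reflector(solutions):
    ok = all(a == "4" for a, _ in solutions)
    return ([1] * len(solutions), "use arithmetic", ok)

solutions, weights, rounds = debate([solver, solver], [reflector], "2+2?")
```

With these stubs the debate terminates after two rounds: the first round's critique is fed back, the Solvers revise, and the Reflector then certifies unanimity.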
For aggregation, the WISE-DS algorithm adapts the Dawid–Skene EM procedure to model confusion matrices for both Solvers (solution reliability) and Reflectors (judgement reliability). Latent true answer and label variables for each debated item are inferred, and the fused answer is reweighted by recent performance.
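Since WISE-DS builds on Dawid–Skene, a compact EM implementation of the vanilla procedure illustrates the mechanism (WISE-DS additionally models Reflector judgements and recency reweighting, which are omitted here; the toy labels are illustrative):

```python
def dawid_skene(labels, n_classes, n_iter=50):
    """Vanilla Dawid-Skene EM: labels[item][annot] is the reported class.
    Alternates estimating per-annotator confusion matrices (M-step) and
    posteriors over each item's latent true class (E-step)."""
    n_items, n_annot = len(labels), len(labels[0])
    # Initialize posteriors from per-item vote proportions.
    post = []
    for row in labels:
        counts = [row.count(k) for k in range(n_classes)]
        post.append([c / sum(counts) for c in counts])
    for _ in range(n_iter):
        # M-step: class priors and confusion matrices (smoothed).
        prior = [sum(p[k] for p in post) / n_items for k in range(n_classes)]
        conf = [[[1e-6] * n_classes for _ in range(n_classes)]
                for _ in range(n_annot)]
        for i, row in enumerate(labels):
            for a, lab in enumerate(row):
                for k in range(n_classes):
                    conf[a][k][lab] += post[i][k]
        for a in range(n_annot):
            for k in range(n_classes):
                z = sum(conf[a][k])
                conf[a][k] = [v / z for v in conf[a][k]]
        # E-step: posterior over each item's latent true class.
        for i, row in enumerate(labels):
            liks = []
            for k in range(n_classes):
                lik = prior[k]
                for a, lab in enumerate(row):
                    lik *= conf[a][k][lab]
                liks.append(lik)
            z = sum(liks)
            post[i] = [l / z for l in liks]
    return post

# Two reliable annotators plus one that always reports class 0.
labels = [[0, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]
post = dawid_skene(labels, n_classes=2)
```

EM learns that the third annotator's reports carry no information about the latent class, so its constant vote is discounted and the reliable annotators' labels dominate the fused answer.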
4. Theoretical Principles: Convergence and Incentive Compatibility
In the original WISE formulation, convergence is ensured by the strong ergodicity of the stochastic matrix sequence $\{W^{(t)}\}$: every $W^{(t)}$ has strictly positive entries due to $\epsilon > 0$, guaranteeing that the induced opinion Markov chain is strongly connected at each step. This structure ensures monotonic contraction of the opinion diameter, $\delta(P^{(t+1)}) \le \tau(W^{(t)})\,\delta(P^{(t)})$ with $\tau(W^{(t)}) < 1$, where $\delta(\cdot)$ measures the maximal spread between rows. Iterates thus converge to consensus.
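The contraction step can be checked numerically via the Dobrushin coefficient $\tau(W) = \tfrac{1}{2}\max_{i,j}\sum_k |W_{ik} - W_{jk}|$, which is strictly below 1 for any strictly positive stochastic matrix (a small self-contained check with an illustrative $W$, not from the paper):

```python
def dobrushin(W):
    """Dobrushin coefficient tau(W) = (1/2) max_{i,j} sum_k |W_ik - W_jk|;
    tau(W) < 1 whenever W is stochastic with strictly positive entries."""
    n = len(W)
    return 0.5 * max(sum(abs(W[i][k] - W[j][k]) for k in range(len(W[0])))
                     for i in range(n) for j in range(n))

def diameter(P):
    """Maximal spread between rows of P (the opinion diameter)."""
    return max(max(abs(a - b) for a, b in zip(p, q)) for p in P for q in P)

def matmul(W, P):
    return [[sum(W[i][k] * P[k][j] for k in range(len(P)))
             for j in range(len(P[0]))] for i in range(len(W))]

# A strictly positive stochastic weight matrix (as induced by eps > 0)
# contracts the opinion diameter by at least a factor tau(W) per step.
W = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
P = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]
tau = dobrushin(W)
assert tau < 1
assert diameter(matmul(W, P)) <= tau * diameter(P) + 1e-12
```

Iterating the bound gives $\delta(P^{(t)}) \le \delta(P^{(0)}) \prod_{s<t} \tau(W^{(s)})$, which shrinks to zero when the coefficients are bounded away from 1.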
Incentive compatibility under the quadratic scoring rule ensures honest reporting of Bayesian posteriors: an expert's expected score is maximized by reporting its true belief, and among peers' opinions, the one closest in RMS distance is strictly preferred (Carvalho et al., 2012).
In the distributed classification setting, the min-rule update, together with the Bayes step, admits rigorous convergence guarantees: almost-sure learning of the true class and an exponential rejection rate for false classes, with rates determined by the maximal source-agent discrimination or support-agent confusion scores. Global identifiability and connectivity are required for the guarantee (Yao et al., 30 Sep 2024).
5. Empirical Performance and Benchmarking
Extensive empirical evaluation has validated WISE’s robust aggregation capability:
- In probability forecasting, WISE (ε = 10⁻⁴) outperformed both the Barlow-McMahan-Sorensen (BMS) method and simple averaging on 267 NFL games, achieving 69.29% accuracy with a lower mean absolute error than either alternative. All improvements were statistically significant (p < 10⁻⁴, Wilcoxon test) (Carvalho et al., 2012).
- In distributed classification over CIFAR-10, WISE-style min-rule aggregation enabled networks of weak, partially informative agents to achieve rapid consensus on the true class (≈10 steps to reach μ(θ*) > 0.9), whereas naïve local or average-based methods failed in the “myopic” regime (Yao et al., 30 Sep 2024).
- For multimodal multi-agent debate, WISE delivered 2–7 percentage point improvements across SMART-840, VisualPuzzles, EvoChart-QA, and a newly created SMART-840++ dataset. On SMART-840, accuracy increased from 60.6% (best single model) and 62.0% (prior MAD) to 68.1% with WISE; similar gains were obtained for VisualPuzzles (+2.8 pp) and EvoChart-QA (from 39.1% to 75.4%) (Cherian et al., 2 Dec 2025).
| Benchmark | Best Single (%) | Prior MAD (%) | WISE (%) |
|---|---|---|---|
| SMART-840 | 60.6 | 62.0 | 68.1 |
| VisualPuzzles | 57.0 | — | 59.8 |
| EvoChart-QA | 39.1 | — | 75.4 |
Ablations confirm that mixed strong/weak agent configurations retain benefits, that role partitioning (Solvers, Reflectors, Orchestrator) is essential, and that WISE-DS aggregation delivers 2–4 percentage point gains over majoritarian or naïve Dawid–Skene aggregation (Cherian et al., 2 Dec 2025).
6. Algorithmic Complexity, Limitations, and Extensions
The original and min-rule variants of WISE are computationally lightweight. In the consensual pool, each expert's update costs $O(nm)$ time per iteration ($n$ experts, $m$ outcomes); in the min-rule, each agent's update costs $O(|\mathcal{N}_i| K)$ ($K$ classes, $|\mathcal{N}_i|$ neighbors). Per step, each agent communicates a single length-$m$ (respectively, length-$K$) belief vector to its neighbors.
Key limitations identified include the reliance on accurate belief/probability estimation (especially for partial experts), potential conservatism and reduced transient speed in densely connected min-rule networks, and global identifiability requirements for consensus. Extensions of WISE protocols include time-varying and directed network communication, Byzantine-resilient aggregation, semi-supervised reward shaping, and domain partitioning for multi-modal input pathways (Cherian et al., 2 Dec 2025, Yao et al., 30 Sep 2024).
7. Applications and Impact
WISE methodologies have established a broad impact in ensemble forecasting, crowdsourcing, large-scale LLM system fusion, multi-view sensor integration, and distributed learning. Their consensus and robustness properties provide state-of-the-art aggregation in the presence of model heterogeneity, adversarial outliers, and partial information. The recent formulation of WISE for multimodal LLM debate illustrates continued relevance for artificial intelligence research, specifically in orchestrating specialized agents or modules for complex, multi-modal reasoning tasks. Quantitative improvements across several vision-language benchmarks reinforce the value of weighted iterative aggregation under principled EM-calibrated post-processing (Cherian et al., 2 Dec 2025).