Multi-Agent Undercover Gaming Protocol

Updated 21 November 2025

MUG Protocol is a framework for multi-agent debate that formalizes agent interactions using dynamic role allocation, selective debate triggering, and sparse communication to enhance reasoning and efficiency.
It integrates advanced mechanisms such as the Truth Last strategy, the MADC algorithm, and the iMAD classifier to optimize debate order, token usage, and overall accuracy.
The protocol addresses practical challenges like error propagation, conformity bias, and computational overhead while paving the way for adaptive, multimodal, and robust multi-agent inference systems.

Multi-Agent Undercover Gaming (MUG) Protocol

The Multi-Agent Undercover Gaming (MUG) Protocol encompasses recent advances in multi-agent debate (MAD) frameworks for enhancing the reasoning, robustness, and efficiency of LLM inference. MUG formalizes agent interactions as structured, role-driven protocols that leverage diverse viewpoints, dynamic role allocation, sparsified communication, automated consistency estimation, and selective debate triggering.

1. Principles of Agent Role Allocation and Positional Power

Role allocation within MAD protocols determines the agents' speaking order and their influence over the consensus outcome. "Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?" introduces the "Truth Last" strategy, demonstrating that placing the agent whose initial chain-of-thought (CoT) answer matches the true answer as the final speaker maximizes overall debate accuracy (Zhang et al., 14 Nov 2025). Formally, let agents $A_1,\ldots,A_n$ produce initial CoT answers $V_{i,0}$, and let $T$ be the set of indices of truthful agents. Select a permutation $\sigma$ of the speaking order such that $\sigma(n)\in T$, giving:

$\max_{\sigma}\Pr(\text{Consensus}(\{V_{i,m}\}) = \text{TrueAnswer})\quad\text{s.t. }\sigma(n)\in T$

Empirical studies reveal up to 22% performance gains over random role allocation, with position bias allowing truth-seeking agents to steer debate consensus effectively.
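
The constraint $\sigma(n)\in T$ can be illustrated with a minimal Python sketch. It assumes oracle access to the true answer, which is what the Truth Last analysis presumes; the function name `truth_last_order` and the fallback to the default order when no agent is truthful are illustrative choices, not part of the cited work.

```python
from typing import Hashable, Sequence


def truth_last_order(initial_answers: Sequence[Hashable],
                     true_answer: Hashable) -> list[int]:
    """Return a speaking order (agent indices) ending with a truthful agent.

    Assumption: the true answer is known (oracle setting). If no agent's
    initial answer is correct, the default order 0..n-1 is returned.
    """
    n = len(initial_answers)
    truthful = [i for i, v in enumerate(initial_answers) if v == true_answer]
    if not truthful:
        return list(range(n))
    last = truthful[0]  # any truthful agent satisfies sigma(n) in T
    return [i for i in range(n) if i != last] + [last]


# Agent 1 initially holds the true answer, so it is moved to the final slot.
order = truth_last_order(["B", "A", "C"], true_answer="A")
print(order)
```

Only the final position is constrained here; the relative order of the remaining agents is left unchanged.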

2. Consistency-Driven Debate Ordering: MADC Algorithm

In practice, it is generally unknown which agents hold the true answer. The Multi-Agent Debate Consistency (MADC) algorithm operationalizes Truth Last without oracle access (Zhang et al., 14 Nov 2025). At each round $j$, agents' debate paths $P_i=(V_{i,1},\ldots,V_{i,m})$ are evaluated for path consistency, counting how many peers give the same answer as agent $i$ in that round:

$\mathrm{Consistency}(P_{i,j}) = \sum_{k\neq i}\mathbf{1}[V_{i,j}=V_{k,j}]$

The agent with maximum consistency (agreement with peers) is placed last in the debate order; final consensus is assigned to the answer with highest aggregate agreement over all rounds. The procedure optimizes debate workflow by simulating ideal positional influence:

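for j in 1...m-1:
    # Φ[i]: number of peers that agree with agent i at round j
    Φ = [Consistency(P_{i,j}) for i in 1...n]
    i_star = argmax(Φ)  # most consistent agent
    order = argsort(Φ excluding i_star) + [i_star]  # i_star speaks last
    debate in order, update paths P_i
return answer with highest aggregate agreement over all rounds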
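
The ordering loop above can be made concrete with a short Python sketch. Several pieces are assumptions for illustration only: `debate_round` is a hypothetical callback standing in for the actual LLM debate step, `toy_round` is a stand-in in which every agent adopts the last speaker's answer, and "highest aggregate agreement" is interpreted here as the answer that occurs most often across all recorded rounds.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Answer = Hashable


def consistency(answers: Sequence[Answer], i: int) -> int:
    """Number of peers whose current answer equals agent i's answer."""
    return sum(1 for k, v in enumerate(answers) if k != i and v == answers[i])


def madc_order(answers: Sequence[Answer]) -> list[int]:
    """Speaking order in which the most consistent agent goes last."""
    n = len(answers)
    scores = [consistency(answers, i) for i in range(n)]
    i_star = max(range(n), key=lambda i: scores[i])  # first maximum, like argmax
    rest = sorted((i for i in range(n) if i != i_star), key=lambda i: scores[i])
    return rest + [i_star]


def run_madc(initial: Sequence[Answer],
             debate_round: Callable[[list, list[int]], list],
             num_rounds: int) -> Answer:
    """Run the ordering loop and return the most frequent answer overall."""
    answers = list(initial)
    history = [list(answers)]
    for _ in range(num_rounds):
        order = madc_order(answers)
        answers = debate_round(answers, order)  # hypothetical debate step
        history.append(list(answers))
    votes = Counter(v for round_answers in history for v in round_answers)
    return votes.most_common(1)[0][0]


def toy_round(answers: list, order: list[int]) -> list:
    """Toy stand-in: every agent adopts the last speaker's answer."""
    return [answers[order[-1]]] * len(answers)


first_order = madc_order(["A", "A", "B"])
final = run_madc(["A", "A", "B"], toy_round, num_rounds=1)
print(first_order, final)
```

In a real system, `debate_round` would prompt each agent in the given order with the preceding speakers' arguments; the toy version only exaggerates the positional influence of the final speaker.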

3. Selective Debate Triggering and Token Efficiency: iMAD

While full MAD debates deliver improvements, indiscriminate triggering incurs large computational expense and can harm accuracy if agents overturn correct initial answers (Fan et al., 14 Nov 2025). iMAD introduces a debate-decision classifier that utilizes 41 interpretable features extracted from a single-agent self-critique (syntactic depth, contrast markers, hedge words, etc.), producing a calibrated debate-skip score $p$ via FocusCal loss:

$\mathcal{L}_{\mathrm{FocusCal}}(y,p,u) = L_{\mathrm{AF}}(y,p) + \lambda\,L_{\mathrm{CP}}(y,p,u) + \mu\,\mathrm{ECE}(\{y_i,p_i\})$

Debate is triggered only when internal hesitation cues suggest likely correction of a wrong answer. Empirical validation demonstrates up to $92\%$ token savings and $13.5\%$ accuracy gains relative to full MAD.
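
Of the three loss terms, only the expected calibration error (ECE) has a widely used standard form, so the sketch below is limited to it and to a simple gating rule. It uses the common equal-width binning estimate for binary labels, comparing the mean predicted score in each bin with the empirical frequency of positives. The `should_debate` helper, its `threshold` value, and the reading that a low debate-skip score triggers debate are assumptions made for illustration; they are not taken from the iMAD paper.

```python
from typing import Sequence


def expected_calibration_error(labels: Sequence[int],
                               probs: Sequence[float],
                               num_bins: int = 10) -> float:
    """Binned ECE for binary labels and predicted probabilities."""
    n = len(labels)
    bins: list[list[tuple[int, float]]] = [[] for _ in range(num_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * num_bins), num_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_label = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(mean_label - mean_prob)
    return ece


def should_debate(skip_score: float, threshold: float = 0.5) -> bool:
    """Hypothetical gate: debate only when the skip score is below threshold."""
    return skip_score < threshold


ece = expected_calibration_error([1, 0, 1, 1], [0.9, 0.1, 0.8, 0.7])
print(round(ece, 3), should_debate(0.2), should_debate(0.9))
```

A lower ECE means the score $p$ can be trusted more directly as a probability, which matters when a fixed threshold decides whether the more expensive debate is run at all.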

4. Sparse Communication and Dynamic Trust Graphs

Standard all-to-all agent communication in MAD rapidly inflates prompt and token consumption, often obscuring salient arguments and amplifying overconfident agents. CortexDebate replaces a dense topology with a sparse, dynamically-pruned debate graph governed by McKinsey-based Debate Matter (MDM) weights (Sun et al., 5 Jul 2025):

$W_{i\to j}^d = \frac{C_d \cdot R_d \cdot I_d}{S_d}$

Here $C_d$, $R_d$, $I_d$, and $S_d$ denote credibility, reliability, intimacy (cosine similarity between outputs), and self-orientation, respectively. Only edges whose weight exceeds the per-agent average are retained, so agents debate only with trusted, non-overlapping peers. This design reduces prompt size by up to $70.8\%$, increases the diversity of correct revisions, and mitigates error propagation from dominant, overconfident agents.
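
The weight formula and the pruning rule can be sketched in a few lines of Python. The factor values below are made-up numbers, the function names are hypothetical, and "per-agent average" is read here as the mean of each agent's own outgoing edge weights; the source does not specify how the factors are estimated, so they are simply passed in.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, usable as the intimacy factor between two outputs."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mdm_weight(credibility: float, reliability: float,
               intimacy: float, self_orientation: float) -> float:
    """W = (C * R * I) / S, following the formula above."""
    return credibility * reliability * intimacy / self_orientation


def prune_edges(weights: dict[int, dict[int, float]]) -> dict[int, list[int]]:
    """Keep edge i -> j only if its weight exceeds agent i's mean edge weight."""
    kept: dict[int, list[int]] = {}
    for i, out_edges in weights.items():
        mean_w = sum(out_edges.values()) / len(out_edges)
        kept[i] = [j for j, w in out_edges.items() if w > mean_w]
    return kept


# Illustrative weights for three agents (values are invented).
weights = {
    0: {1: 0.9, 2: 0.3},
    1: {0: 0.5, 2: 0.5},
    2: {0: 0.2, 1: 0.8},
}
graph = prune_edges(weights)
print(graph)
```

With a strict "above average" rule, an agent whose edges all carry equal weight keeps none of them, as agent 1 does here; a real implementation would need a policy for that case.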

5. Consensus Models, Anti-Conformity, and Fairness Mechanisms

The majority-voting consensus, common in classic MAD (e.g., Du et al., Multi-Agent Debate, 2023), can suffer from error propagation and randomness. Free-MAD abolishes round-wise consensus and majority voting, replacing them with a score-based mechanism that tracks changes and justifications in each agent’s answers (Cui et al., 14 Sep 2025):

$S(a) = \sum_{i=1}^N\left[w_1 f_0 \mathbf{1}(r_i^0 = a) + \sum_{k=1}^1\{ -w_2 f_k \mathbf{1}(r_i^{k-1}=a\neq r_i^k) + w_3 f_k \mathbf{1}(r_i^k = a \neq r_i^{k-1}) + w_4 f_k \mathbf{1}(r_i^k = a = r_i^{k-1}) \}\right]$

Agents are guided to only revise their answer upon justified error detection in peer reasoning, explicitly mitigating blind conformity. This single-round, anti-conformity protocol improves accuracy and robustness while halving token overhead.
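
The score $S(a)$ can be evaluated directly from each agent's answer trajectory. The following Python sketch mirrors the formula term by term; the weights `w` and round factors `f` are placeholder values chosen for the example, since their actual settings are not given here, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def free_mad_scores(trajectories: Sequence[Sequence[Hashable]],
                    w: tuple[float, float, float, float],
                    f: Sequence[float]) -> dict:
    """Score every answer from the agents' answer trajectories.

    trajectories[i] = [r_i^0, r_i^1, ...]; w = (w1, w2, w3, w4);
    f[k] is the round factor f_k.
    """
    w1, w2, w3, w4 = w
    scores: dict = defaultdict(float)
    for answers in trajectories:
        scores[answers[0]] += w1 * f[0]       # initial answer
        for k in range(1, len(answers)):
            prev, cur = answers[k - 1], answers[k]
            if prev != cur:
                scores[prev] -= w2 * f[k]     # answer abandoned at round k
                scores[cur] += w3 * f[k]      # answer newly adopted at round k
            else:
                scores[cur] += w4 * f[k]      # answer kept at round k
    return dict(scores)


# Three agents, one debate round; weights and factors are illustrative.
trajectories = [["A", "B"], ["B", "B"], ["A", "A"]]
scores = free_mad_scores(trajectories, w=(1.0, 0.5, 1.5, 1.0), f=[1.0, 1.0])
winner = max(scores, key=scores.get)
print(scores, winner)
```

In this toy run "A" and "B" are tied 2 to 2 on a raw count of answers across both rounds, yet "B" wins on score because the switch from "A" to "B" both penalizes the abandoned answer and rewards the adopted one.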

6. Underlying Game-Theoretic and Bayesian Structures

MUG protocols increasingly represent debate as weighted Bayesian or game-theoretic update processes. Competitive MAD models updates as a zero-sum game aiming for win-maximization, often degenerating to debate hacking and no net information gain (Chen et al., 23 Oct 2025). Collaborative MAD (ColMAD) reframes interaction with non-zero-sum utilities rewarding error coverage, agreement, and evidence-grounded accuracy. Theoretical analysis demonstrates stricter reduction of Bayes risk in cooperative equilibria when debate adds information over baseline model outputs.

Identity bias—sycophancy (over-weighting peer views) and self-bias—further corrupts MAD. Response anonymization equalizes self and peer weights, measured via the Identity Bias Coefficient (IBC), restoring belief-driven inference (Choi et al., 8 Oct 2025).

7. Limitations and Future Directions

MUG protocols face persistent bottlenecks, including token cost, the difficulty of estimating true agent reliability, error propagation in conformity-centric frameworks, and task-dependent differences in whether debate or majority voting performs better (Choi et al., 24 Aug 2025, Zhang et al., 12 Feb 2025). Role allocation and agent diversity (heterogeneous MAD) are critical scaling knobs; empirical evidence supports gains of up to 22% through optimal positional ordering and 3–6% via model heterogeneity (Zhang et al., 14 Nov 2025, Zhang et al., 12 Feb 2025). However, collaborative refinement can amplify both correctness and vulnerability, depending on initial agent dispersion and safety alignment (2505.22960).

Research directions emphasized include dynamic, RL-driven role and topology scheduling, adaptive sparsification, hierarchical debate, robust value alignment via vigilance and interval communication (GVIC) (Zou et al., 2024), and extension of content-driven debate to open-ended reasoning and multimodal settings.

Table: Core Innovations in Recent MUG/MAD Protocols

Protocol/Concept	Core Mechanism	Key Reported Gains
Truth Last (Zhang et al., 14 Nov 2025)	Oracle role-order, true answer last	+22% accuracy
MADC (Zhang et al., 14 Nov 2025)	Consistency-driven ordering	+1–10% accuracy, robust scaling
iMAD (Fan et al., 14 Nov 2025)	Feature-driven selective triggering	−92% tokens, +13.5% accuracy
CortexDebate (Sun et al., 5 Jul 2025)	Sparse trust-graph (MDM)	−70.8% tokens, +5–8% accuracy
Free-MAD (Cui et al., 14 Sep 2025)	Anti-conformity + score-based decision	+13–16.5% accuracy, ½ tokens
ConfMAD (Lin et al., 17 Sep 2025)	Explicit confidence expression/calibration	+3–5% accuracy, ↑ consensus
ColMAD (Chen et al., 23 Oct 2025)	Non-zero-sum collaborative utility	+19% F1 over competitive MAD

By integrating these approaches, MUG provides a rigorous computational and theoretical basis for multi-agent inference, addressing scaling, robustness, and fairness for advanced LLM systems.