- The paper demonstrates that architectural heterogeneity significantly decreases artificial consensus, fostering genuine dissent among LLM agents.
- It introduces a coherence validation layer that scores each agent's reasoning fidelity to its assigned perspective, exposing a tradeoff between fidelity and perspective diversity in normative deliberation.
- Empirical results across two policy scenarios underscore the importance of model diversity for capturing stakeholder value trade-offs.
Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation
Introduction
The paper "Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation" (2604.26561) addresses a failure mode of LLM-based multi-agent deliberation systems in normative settings: artificial consensus. In accuracy-oriented settings, convergence on an answer can signal that the answer is correct. In policy simulation and stakeholder modeling, by contrast, convergence among simulated agents hides the genuine value trade-offs that decision support and participatory frameworks exist to surface. The study introduces and empirically evaluates two architectural interventions, model heterogeneity and coherence validation, within a system called the "AI Council." The analysis spans 120 deliberation runs across two structurally distinct policy scenarios and reports several complementary metrics of convergence and reasoning fidelity.
Artificial Consensus in Multi-Agent LLM Deliberation
Artificial consensus describes the phenomenon whereby LLM-based agents, even when assigned distinct value perspectives, converge on policy recommendations as if consensus is intrinsic to the task. This is particularly pernicious in normative domains because such convergence suppresses legitimate dissent and clouds stakeholder-specific reasoning, making the system outputs less informative and less reflective of the conflicting values that shape real-world decisions. The study identifies two contributing mechanisms: (i) shared inductive bias due to the use of a single model across agents, and (ii) debate capture, wherein minority-perspective agents are persuaded by the majority's reasoning during structured debate, subverting intended diversity in viewpoints. These observations align with prior work on sycophancy and conformity in LLM-based social simulations.
Architectural Interventions
Heterogeneous Model Assignment
The primary intervention is architectural heterogeneity: each agent is instantiated with a different 7–9B-parameter LLM, with models chosen to maximize diversity in developing organization and training regime and profiled for alignment with their assigned value perspectives. Compared with a homogeneous (single-model) baseline, heterogeneity significantly reduces first-choice concentration, i.e., the proportion of agents aligning on the same top-ranked option, in both policy scenarios (child welfare: 70.9% → 46.1%; housing: 46.0% → 22.9%; both p < 0.001, with large effect sizes). This contrasts with prior debate systems in the accuracy-oriented literature, which found that model diversity did not reduce convergence. The paper therefore argues that architectural heterogeneity disrupts shared inductive bias and surfaces value disagreements more reliably when no objective ground truth exists.
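A minimal sketch of the first-choice concentration metric, assuming it is the share of agents whose top-ranked option equals the most common first choice, and that each agent's vote is a full ranking listed from most to least preferred. The function name and the rankings are hypothetical illustrations, not the paper's code or data.

```python
from collections import Counter


def first_choice_concentration(rankings):
    """Share of agents whose first choice equals the most common first choice.

    Assumes each ranking lists options from most to least preferred.
    """
    if not rankings:
        raise ValueError("need at least one ranking")
    first_choices = Counter(ranking[0] for ranking in rankings)
    _, top_count = first_choices.most_common(1)[0]
    return top_count / len(rankings)


# Hypothetical rankings from five agents over options A, B, C.
rankings = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["B", "A", "C"],
    ["A", "B", "C"],
    ["C", "B", "A"],
]
print(first_choice_concentration(rankings))  # 0.6
```

Under this definition, 1.0 means every agent shares a top choice, and lower values mean first choices are spread across more options.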
Coherence Validation
The coherence validation layer adds a post-hoc assessment of value fidelity: a single call to a frontier LLM (Claude Sonnet 4) scores how well each agent's reasoning aligns with its prescribed perspective. These scores serve as weights in a modified Borda count, downweighting incoherent reasoning rather than silencing dissenting votes. The study identifies a fidelity–diversity tradeoff: when high-coherence agents cluster on the same option, coherence weighting amplifies convergence (reducing diversity), but when a low-coherence majority is downweighted, disagreement is preserved or even enhanced. The reliability of the coherence scores is checked with test-retest and cross-model scoring, both of which show high inter-rater correlations.
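A minimal sketch of coherence-weighted Borda aggregation, assuming the "modified Borda count" amounts to multiplying each agent's standard Borda points by its coherence score. The exact modification used in the paper is not specified here; the function name, rankings, and coherence values are hypothetical.

```python
from collections import defaultdict


def weighted_borda(rankings, weights=None):
    """Borda scores in which each agent's points are scaled by a weight.

    rankings: full rankings over the same options, most preferred first.
    weights: one non-negative weight per agent (e.g. a coherence score);
             None gives every agent weight 1.0, i.e. a plain Borda count.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, option in enumerate(ranking):
            # Top option earns n - 1 points, the last option earns 0.
            scores[option] += weight * (n - 1 - position)
    return dict(scores)


# Hypothetical: two low-coherence agents favor A, one high-coherence agent favors B.
rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "C", "A"]]
coherence = [0.4, 0.5, 1.0]

plain = weighted_borda(rankings)
weighted = weighted_borda(rankings, coherence)
print(max(plain, key=plain.get))        # A
print(max(weighted, key=weighted.get))  # B
```

In this example, plain Borda ranks A first (4 vs. 3 vs. 2 points), while coherence weighting moves B to the top (2.4 vs. 1.8 vs. 1.5). The low-coherence agents still vote; they simply count for less.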
Experimental Design
The study compares three system states:
- State A (Homogeneous): All agents use an identical model.
- State B (Heterogeneous): Each agent uses a distinct, perspective-aligned model.
- State C (Het.+Delphi): Same as State B, with votes aggregated by a coherence-weighted Borda count.
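The three states differ along two switches: whether agents share one model, and whether votes are coherence-weighted. The sketch below encodes them as a small configuration. The class, the helper `assign_models`, and the perspective and model names are hypothetical; in particular, the helper pairs perspectives with models in list order, whereas the paper selects models for profiled alignment with each perspective, and the choice of the pool's first model for the homogeneous baseline is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouncilState:
    name: str
    heterogeneous: bool       # one distinct model per agent?
    coherence_weighted: bool  # aggregate with coherence-weighted Borda?


STATES = {
    "A": CouncilState("Homogeneous", heterogeneous=False, coherence_weighted=False),
    "B": CouncilState("Heterogeneous", heterogeneous=True, coherence_weighted=False),
    "C": CouncilState("Het.+Delphi", heterogeneous=True, coherence_weighted=True),
}


def assign_models(perspectives, model_pool, state):
    """Map each perspective to a model name under the given state (hypothetical helper).

    Simplification: pairs perspectives with models in list order.
    """
    if not model_pool:
        raise ValueError("model pool is empty")
    if not state.heterogeneous:
        # Homogeneous baseline: every agent shares the first model in the pool.
        return {p: model_pool[0] for p in perspectives}
    if len(model_pool) < len(perspectives):
        raise ValueError("heterogeneous states need one distinct model per agent")
    return dict(zip(perspectives, model_pool))


perspectives = ["perspective-1", "perspective-2", "perspective-3"]
pool = ["model-a", "model-b", "model-c"]
for key, state in STATES.items():
    print(key, state.name, assign_models(perspectives, pool, state))
```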
Two policy scenarios probe the system under different structural conditions: one with a dominant option (child welfare) and one with genuine three-way competition among options (urban housing crisis). Key metrics are first-choice concentration, Borda margin, effective perspectives (a Shannon-entropy measure), voice authenticity, and trustworthy tension rate.
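Two of these metrics can be sketched directly. The sketch assumes "effective perspectives" is exp(Shannon entropy) of the first-choice distribution (a common "effective number" construction) and that the Borda margin is the gap between the top two Borda scores; both definitions are assumptions, and the paper may normalize them differently.

```python
import math
from collections import Counter


def effective_perspectives(first_choices):
    """exp(Shannon entropy) of the first-choice distribution (assumed definition).

    Equals 1.0 when all agents agree and k when first choices split evenly over k options.
    """
    if not first_choices:
        raise ValueError("need at least one vote")
    n = len(first_choices)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(first_choices).values())
    return math.exp(entropy)


def borda_margin(scores):
    """Gap between the highest and second-highest Borda score (assumed definition)."""
    if len(scores) < 2:
        raise ValueError("need at least two options")
    top, second = sorted(scores.values(), reverse=True)[:2]
    return top - second


print(effective_perspectives(["A", "A", "A", "A"]))  # 1.0
print(effective_perspectives(["A", "B", "C"]))       # approximately 3.0
print(borda_margin({"A": 4, "B": 3, "C": 2}))        # 1
```

A larger effective-perspectives value and a smaller Borda margin both indicate less convergence.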
Empirical Findings
Effects of Heterogeneity
Across both scenarios, architectural heterogeneity reduces first-choice concentration and increases the entropy of the vote distribution (i.e., yields more effective perspectives), with large effect sizes. This is direct evidence that prompt engineering or single-model agent simulation alone is not sufficient to capture stakeholder dissensus in normative deliberation tasks.
Coherence Validation and the Fidelity–Diversity Tradeoff
The effect of coherence validation depends on the scenario. In the child welfare scenario, where one option dominates, coherence validation further reduces concentration by downweighting low-fidelity majority voices, which amplifies the remaining minority dissent. In the housing scenario, where the high-coherence perspectives happen to agree, the same mechanism increases convergence and reduces surface-level diversity. The system's emergent behavior is therefore shaped by the interplay between perspective–model fit and overall model competence, which suggests that any quality-weighted multi-agent system faces an inherent tradeoff between preserving perspective diversity and maximizing fidelity to assigned roles.
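The scenario dependence can be reproduced with a toy calculation. This sketch assumes coherence weights are applied multiplicatively to first-choice votes (a simplification of the weighted Borda aggregation) and uses invented coherence scores, not the paper's data. Case 1 mimics a dominant option backed by low-coherence agents; case 2 mimics high-coherence agents that happen to agree. Printing weighted and unweighted concentration side by side also reflects the practice of reporting both distributions.

```python
from collections import defaultdict


def first_choice_shares(first_choices, weights=None):
    """Fraction of (optionally weighted) vote mass held by each first choice."""
    if weights is None:
        weights = [1.0] * len(first_choices)
    totals = defaultdict(float)
    for option, weight in zip(first_choices, weights):
        totals[option] += weight
    mass = sum(weights)
    return {option: total / mass for option, total in totals.items()}


def concentration(shares):
    """Share held by the most popular first choice."""
    return max(shares.values())


# Hypothetical case 1: a low-coherence majority backs the dominant option A.
choices_1 = ["A", "A", "A", "B", "C"]
coherence_1 = [0.3, 0.3, 0.3, 0.9, 0.9]

# Hypothetical case 2: the highest-coherence agents happen to agree on A.
choices_2 = ["A", "A", "B", "C", "D"]
coherence_2 = [0.9, 0.9, 0.3, 0.3, 0.3]

for label, choices, coherence in [
    ("case 1", choices_1, coherence_1),
    ("case 2", choices_2, coherence_2),
]:
    unweighted = concentration(first_choice_shares(choices))
    weighted = concentration(first_choice_shares(choices, coherence))
    print(label, round(unweighted, 3), round(weighted, 3))
```

Weighting lowers concentration in case 1 (from 0.6 to about 0.33) and raises it in case 2 (from 0.4 to about 0.67): the same mechanism pushes in opposite directions depending on where coherence sits.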
Model Capabilities and Deliberation Calibration
The study also finds that small models (e.g., 8B parameters) respond to counter-argument exposure in a binary rather than graded way: agents either hold their position or capitulate, with no nuanced consideration or partial concession. This supports the design choice of isolated, single-pass evaluations. Trustworthy tension rates (the fraction of value-conflicted agent pairs that exhibit faithful reasoning) are approximately 50% in both scenarios, providing a quantitative baseline for small-model deliberation capabilities.
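A minimal sketch of a trustworthy tension rate, assuming a value-conflicted pair counts as "trustworthy" when both agents' coherence scores clear a threshold. The threshold of 0.7, the agent names, the scores, and the conflict list are all hypothetical; the paper's exact criterion for faithful reasoning may differ.

```python
def trustworthy_tension_rate(coherence, conflicting_pairs, threshold=0.7):
    """Fraction of value-conflicted agent pairs in which both agents reason faithfully.

    coherence: agent name -> coherence score in [0, 1].
    conflicting_pairs: (agent, agent) tuples whose assigned values conflict.
    threshold: hypothetical cutoff at or above which reasoning counts as faithful.
    """
    if not conflicting_pairs:
        raise ValueError("need at least one value-conflicted pair")
    faithful = sum(
        1
        for a, b in conflicting_pairs
        if coherence[a] >= threshold and coherence[b] >= threshold
    )
    return faithful / len(conflicting_pairs)


# Hypothetical agents, scores, and conflicts.
coherence = {"agent-1": 0.9, "agent-2": 0.8, "agent-3": 0.4, "agent-4": 0.75}
conflicting_pairs = [
    ("agent-1", "agent-2"),
    ("agent-1", "agent-3"),
    ("agent-2", "agent-4"),
    ("agent-3", "agent-4"),
]
print(trustworthy_tension_rate(coherence, conflicting_pairs))  # 0.5
```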
Theoretical and Practical Implications
This work establishes that, for multi-agent LLM simulations of normative domains, model architecture is a primary determinant of deliberative validity, more so than prompt or instruction design. Heterogeneous model pools should therefore be treated as a foundational design feature for policy analysis, stakeholder modeling, and participatory design systems. The results also caution against treating coherence validation as a panacea: it is indispensable, but it must be interpreted together with (not as a replacement for) raw perspective diversity. Reporting both weighted and unweighted vote distributions is recommended for honest system audits.
On the theoretical side, the study generalizes the fidelity–diversity tradeoff: absent a ground truth, reliability-weighted aggregation naturally favors perspectives that are better modeled, and may therefore suppress genuine minority disagreements when those perspectives are poorly aligned with their models.
Limitations and Directions for Future Work
The study's findings come from two scenarios and a single pool of seven models; generalizing them requires larger, more diverse model sets and additional, structurally varied scenarios. Model–perspective fit is only partially controlled, so further work is needed to separate overall model quality from role-specific reasoning fidelity. Critical next steps include testing full combinatorial model–role assignments and evaluating larger (or open frontier) models. In addition, because the study lacks a single-model, multi-perspective baseline on a frontier LLM, it remains open whether similar dissensus could be induced through repeated sampling alone. Finally, the current system partially under-represents certain value perspectives because of tied trait parameters, a technical artifact to be remedied in future versions.
Conclusion
This paper presents strong evidence that architectural heterogeneity in multi-agent LLM systems is the primary mechanism for preserving deliberative disagreement in policy simulation contexts, and that value priming or prompt engineering alone is insufficient. Coherence validation, as a post-hoc fidelity layer, adds interpretability but does not unconditionally preserve diversity; instead, it exposes a structural tradeoff that must be navigated deliberately. The AI Council framework offers a baseline for rigorous, value-grounded multi-agent deliberation, together with practical implementation guidance and theoretical implications for the design of value-sensitive AI systems. The work makes the case for multi-model architectures, fidelity–diversity diagnostics, and scenario-sensitive evaluation protocols in the continued development of LLM-based policy simulation tools.