Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation

Published 29 Apr 2026 in cs.MA and cs.AI | (2604.26561v1)

Abstract: Multi-agent deliberation systems using LLMs are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assigned value perspectives. We present the AI Council, a three-phase deliberation framework, and conduct 120 deliberations across two policy scenarios to test two interventions. First, architectural heterogeneity (assigning a different 7-9B parameter model to each value perspective) significantly reduces first-choice concentration compared to a homogeneous baseline (child welfare: 70.9% to 46.1%, p < 0.001, r = 0.58; housing: 46.0% to 22.9%, p < 0.001, r = 0.50). This contrasts with accuracy-oriented multi-agent debate, where heterogeneity does not reduce convergence, suggesting model diversity operates differently when no objectively correct answer exists. Second, coherence validation (using a frontier model to assess whether each evaluator's reasoning is grounded in its assigned values) reveals a fidelity-diversity tradeoff: on a scenario with a dominant option, it further reduces concentration (46.1% to 40.8%, p = 0.004), but on a scenario with genuinely competitive options, it increases concentration (22.9% to 26.6%, p = 0.96) by amplifying high-coherence evaluators who cluster on one option. This tradeoff may be a general property of multi-agent systems employing quality weighting. We report negative results from three failed Delphi designs, demonstrate that 8B models exhibit binary rather than graded responses to counter-arguments, and propose the trustworthy tension rate as a diagnostic measure of small-model deliberation capabilities.

Authors (1)

Summary

  • The paper shows that architectural heterogeneity (a different small model for each value perspective) significantly reduces artificial consensus and preserves dissent among LLM agents.
  • It introduces a coherence validation layer that scores each evaluator's fidelity to its assigned values, and finds a fidelity–diversity tradeoff: the layer reduces convergence in one scenario but increases it in another.
  • Results across two policy scenarios suggest that model diversity matters more than prompt-based role assignment for surfacing stakeholder value trade-offs.

Introduction

The paper "Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation" (2604.26561) addresses artificial consensus, a failure mode of normative multi-agent deliberation systems built on LLM agents. In accuracy-oriented settings, answer convergence can signal a correct solution; in policy simulation and stakeholder modeling, convergence among simulated agents instead hides genuine value trade-offs, which decision-support and participatory frameworks depend on. The study introduces and empirically evaluates two architectural interventions, model heterogeneity and coherence validation, within a three-phase deliberation system termed the "AI Council." The analysis covers 120 deliberation runs across two structurally distinct policy scenarios.

Artificial Consensus in Multi-Agent LLM Deliberation

Artificial consensus describes the phenomenon whereby LLM-based agents, even when assigned distinct value perspectives, converge on the same policy recommendation as if consensus were intrinsic to the task. This is particularly harmful in normative domains because such convergence suppresses legitimate dissent and obscures stakeholder-specific reasoning, making system outputs less informative and less reflective of the conflicting values that shape real-world decisions. The study identifies two contributing mechanisms: (i) shared inductive bias from using a single model for all agents, and (ii) debate capture, in which minority-perspective agents are persuaded by the majority's reasoning during structured debate, undermining the intended diversity of viewpoints. These observations align with prior work on sycophancy and conformity in LLM-based social simulations.

Architectural Interventions

Heterogeneous Model Assignment

The primary intervention is architectural heterogeneity: each value perspective is assigned a different 7–9B parameter LLM, with models chosen to maximize diversity in developer organization and training regime and matched to perspectives through profiling. Compared with a homogeneous (single-model) baseline, heterogeneity significantly reduces first-choice concentration, i.e., the proportion of agents selecting the same top option, in both policy scenarios (child welfare: 70.9% → 46.1%, r = 0.58; housing: 46.0% → 22.9%, r = 0.50; both p < 0.001). This contrasts with accuracy-oriented multi-agent debate, where prior work found that model diversity does not reduce convergence. The authors interpret this as evidence that heterogeneity disrupts shared inductive bias and surfaces value disagreements that a single model suppresses when no objective ground truth exists.
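To make the headline metric concrete, here is a minimal Python sketch of first-choice concentration for a single deliberation, assuming it is the share of evaluators whose top pick is the most popular option. The function name and the seven-evaluator example panels are hypothetical illustrations, not the paper's code or data, and the paper may aggregate this quantity across runs differently.

```python
from collections import Counter


def first_choice_concentration(first_choices: list[str]) -> float:
    """Share of evaluators whose top-ranked option is the modal top pick.

    Assumed definition: (count of the most common first choice) divided by
    (number of evaluators), computed for one deliberation.
    """
    if not first_choices:
        raise ValueError("need at least one evaluator")
    modal_count = max(Counter(first_choices).values())
    return modal_count / len(first_choices)


# Hypothetical panels of seven evaluators; option labels are placeholders.
homogeneous = ["A", "A", "A", "A", "A", "B", "A"]
heterogeneous = ["A", "B", "C", "A", "B", "A", "C"]

print(first_choice_concentration(homogeneous))    # 6/7, about 0.857
print(first_choice_concentration(heterogeneous))  # 3/7, about 0.429
```

A lower value means the panel's first choices are spread over more options, which is the direction the heterogeneous condition moves in the paper.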

Coherence Validation

The coherence validation layer adds a post-hoc value-fidelity assessment: a single call to a frontier LLM (Claude Sonnet 4) scores how well each agent's reasoning is grounded in its assigned perspective. These scores serve as weights in a modified Borda count, downweighting incoherent reasoning rather than discarding dissenting votes. The study identifies a fidelity–diversity tradeoff: when high-coherence evaluators cluster on the same option, coherence weighting amplifies convergence (reducing diversity), but when a low-coherence majority is downweighted, disagreement is preserved or even enhanced. The reliability of the coherence scores is checked with test-retest and cross-model scoring, both of which show high inter-rater correlations.
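The aggregation step can be sketched as a Borda count in which each evaluator's points are scaled by its coherence score. This is a minimal illustration assuming complete rankings and weights in [0, 1]; the paper describes a modified Borda count whose exact modification is not given here, and the perspective names, rankings, and scores below are hypothetical.

```python
from collections import defaultdict


def weighted_borda(rankings, weights=None):
    """Borda totals with each evaluator's points scaled by a coherence weight.

    rankings: dict evaluator -> list of options, best first (complete rankings).
    weights:  dict evaluator -> coherence score in [0, 1]; None means unweighted.
    An option at position i in an n-option ranking earns n - 1 - i points.
    """
    totals = defaultdict(float)
    for evaluator, ranking in rankings.items():
        w = 1.0 if weights is None else weights[evaluator]
        n = len(ranking)
        for position, option in enumerate(ranking):
            totals[option] += w * (n - 1 - position)
    # Return options ordered from highest to lowest total.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical evaluators, rankings, and coherence scores.
rankings = {
    "equity": ["B", "A", "C"],
    "efficiency": ["A", "C", "B"],
    "autonomy": ["A", "B", "C"],
}
coherence = {"equity": 0.9, "efficiency": 0.3, "autonomy": 0.4}

print(weighted_borda(rankings))             # A: 5, B: 3, C: 1
print(weighted_borda(rankings, coherence))  # A about 2.3, B about 2.2, C about 0.3
```

In this toy case the two low-coherence evaluators backing A are downweighted, so A's lead over B shrinks from 2 points to about 0.1; the dissenting vote is not silenced, it simply counts for more relative to less faithful ones.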

Experimental Design

The experiments compare three system states:

  • State A (Homogeneous): All agents use an identical model.
  • State B (Heterogeneous): Each agent uses a distinct, perspective-aligned model.
  • State C (Het.+Delphi): As State B, with additional coherence-weighted Borda counts.
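One way to picture the three states is as configurations over the same set of perspectives. The sketch below is hypothetical: the helper name, perspective labels, and model identifiers are placeholders, and pairing models with perspectives by list order stands in for the paper's profiling-based assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouncilConfig:
    state: str
    model_for: dict            # perspective -> model identifier
    coherence_weighting: bool  # True only for State C


def build_config(state, perspectives, model_pool, baseline_model):
    """Map an experimental state (A, B, or C) to a hypothetical configuration."""
    if state == "A":
        # Homogeneous: every perspective runs on the same model.
        mapping = {p: baseline_model for p in perspectives}
        return CouncilConfig("A", mapping, coherence_weighting=False)
    if state in ("B", "C"):
        # Heterogeneous: one distinct model per perspective. Assumes model_pool
        # is already ordered so that model_pool[i] suits perspectives[i].
        chosen = model_pool[: len(perspectives)]
        if len(set(chosen)) < len(perspectives):
            raise ValueError("need a distinct model for every perspective")
        mapping = dict(zip(perspectives, chosen))
        return CouncilConfig(state, mapping, coherence_weighting=(state == "C"))
    raise ValueError(f"unknown state {state!r}")


perspectives = ["equity", "efficiency", "autonomy"]
pool = ["model-1", "model-2", "model-3"]  # placeholder identifiers
for s in ("A", "B", "C"):
    print(build_config(s, perspectives, pool, baseline_model="model-1"))
```

Keeping B and C identical apart from the weighting flag mirrors the protocol's intent: any difference between them can be attributed to coherence-weighted aggregation rather than to the model assignment.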

Two policy scenarios probe the system under different structural conditions: one with a dominant option (child welfare) and one with genuine three-way competition (urban housing crisis). Key metrics are first-choice concentration, Borda margin, effective perspectives (derived from Shannon entropy), voice authenticity, and trustworthy tension rate.
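Two of these metrics can be sketched compactly. The definitions below are assumptions for illustration: effective perspectives is taken as the exponential of the Shannon entropy (natural log) of the first-choice distribution, which equals k when votes split evenly over k options, and Borda margin as the raw gap between the top two Borda totals. The paper may normalize either quantity differently.

```python
import math
from collections import Counter


def effective_perspectives(first_choices):
    """exp(Shannon entropy) of the first-choice distribution (assumed definition)."""
    n = len(first_choices)
    if n == 0:
        raise ValueError("need at least one evaluator")
    probs = [c / n for c in Counter(first_choices).values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def borda_margin(borda_totals):
    """Gap between the highest and second-highest Borda totals (assumed definition)."""
    if len(borda_totals) < 2:
        raise ValueError("need at least two options")
    first, second = sorted(borda_totals.values(), reverse=True)[:2]
    return first - second


print(effective_perspectives(["A", "A", "A", "A"]))            # 1.0: full consensus
print(effective_perspectives(["A", "B", "C", "A", "B", "C"]))  # about 3.0
print(borda_margin({"A": 5.0, "B": 3.0, "C": 1.0}))            # 2.0
```

The exponentiated form is convenient because it reads as a count of "equally supported options", so a drop in concentration and a rise in effective perspectives describe the same shift from two angles.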

Empirical Findings

Effects of Heterogeneity

Across both scenarios, architectural heterogeneity consistently reduces first-choice concentration and increases the entropy of voting distributions (i.e., yields more effective perspectives), with large effect sizes. This is strong evidence that assigning perspectives by prompt to a single small model is insufficient for capturing stakeholder dissensus in normative deliberation tasks.

Coherence Validation and the Fidelity–Diversity Tradeoff

The effect of coherence validation depends on the scenario. In the child welfare scenario, where one option is dominant, coherence validation further reduces concentration (46.1% → 40.8%, p = 0.004) by downweighting low-fidelity majority voices, which amplifies residual minority dissent. In the housing scenario, where the high-coherence evaluators happen to cluster on one option, the same mechanism increases concentration (22.9% → 26.6%), reducing surface-level diversity. The interplay between perspective–model fit and overall model competence therefore shapes the system's emergent behavior, and the authors suggest that any quality-weighted multi-agent system may face an inherent tradeoff between preserving perspective diversity and maximizing fidelity to assigned roles.
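The scenario dependence can be reproduced with toy numbers. This sketch assumes weighted concentration is the share of total coherence weight held by the option with the most weight; the function names and the two six-evaluator panels are hypothetical, invented to exaggerate the two patterns, and are not the paper's data.

```python
from collections import defaultdict


def weighted_concentration(first_choices, weights):
    """Share of total weight held by the most-weighted first choice (assumed metric)."""
    totals = defaultdict(float)
    for choice, w in zip(first_choices, weights):
        totals[choice] += w
    return max(totals.values()) / sum(weights)


def compare(first_choices, weights):
    """Return (unweighted, coherence-weighted) concentration for one panel."""
    unweighted = weighted_concentration(first_choices, [1.0] * len(first_choices))
    return unweighted, weighted_concentration(first_choices, weights)


# Toy panel 1: a low-coherence majority backs A; a coherent minority splits B/C.
dominant = (["A", "A", "A", "A", "B", "C"], [0.4, 0.4, 0.5, 0.5, 0.9, 0.9])
# Toy panel 2: votes split evenly, but the most coherent evaluators pick B.
competitive = (["A", "A", "B", "B", "C", "C"], [0.4, 0.5, 0.9, 0.9, 0.5, 0.4])

for name, (choices, weights) in [("dominant", dominant), ("competitive", competitive)]:
    before, after = compare(choices, weights)
    print(name, round(before, 3), "->", round(after, 3))
# dominant 0.667 -> 0.5
# competitive 0.333 -> 0.5
```

The same weighting rule lowers concentration in one panel and raises it in the other; only the placement of the high-coherence evaluators differs. This matches the sign of the paper's child welfare and housing results, though not their magnitude.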

Model Capabilities and Deliberation Calibration

The experiments show that small models (around 8B parameters) respond to counter-arguments in a binary rather than graded way: an agent either maintains its position or capitulates entirely, with no nuanced consideration or partial concession. This supports the design choice of isolated, single-pass evaluations. The trustworthy tension rate, the fraction of value-conflicted pairs that demonstrate faithful reasoning, is approximately 50% in both scenarios, providing a quantitative baseline for small-model deliberation capabilities.
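A minimal sketch of the trustworthy tension rate, assuming a value-conflicted pair counts as trustworthy when both evaluators' coherence scores clear a threshold. The 0.7 cutoff, the pair list, and the scores are hypothetical, and the paper's definition may additionally require the two evaluators to disagree in their recommendations.

```python
def trustworthy_tension_rate(conflict_pairs, coherence, threshold=0.7):
    """Fraction of value-conflicted evaluator pairs where both sides reason faithfully.

    conflict_pairs: iterable of (evaluator_a, evaluator_b) tuples whose assigned
        values are in tension (assumed to come from the scenario design).
    coherence: dict evaluator -> coherence score in [0, 1].
    threshold: hypothetical cutoff for counting reasoning as faithful.
    """
    pairs = list(conflict_pairs)
    if not pairs:
        raise ValueError("no value-conflicted pairs supplied")
    faithful = sum(
        1
        for a, b in pairs
        if coherence[a] >= threshold and coherence[b] >= threshold
    )
    return faithful / len(pairs)


# Hypothetical coherence scores and conflict structure.
coherence = {"equity": 0.9, "efficiency": 0.8, "autonomy": 0.5, "security": 0.75}
pairs = [
    ("equity", "efficiency"),
    ("equity", "autonomy"),
    ("autonomy", "security"),
    ("efficiency", "security"),
]
print(trustworthy_tension_rate(pairs, coherence))  # 0.5: 2 of 4 pairs pass
```

Because the rate depends on the threshold, reporting the cutoff alongside the rate would make comparisons across model pools meaningful.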

Theoretical and Practical Implications

This work suggests that, for multi-agent LLM simulations of normative domains, model architecture is a primary determinant of deliberative validity, more so than prompt or instruction design. Heterogeneous model pools should therefore be treated as a foundational design feature for policy analysis, stakeholder modeling, and participatory design systems. The results also show that coherence validation is not a panacea: it is valuable for auditing fidelity but must be interpreted alongside, rather than as a replacement for, raw perspective diversity. The authors recommend reporting both weighted and unweighted vote distributions for honest system audits.

On the theoretical side, the study proposes that the fidelity–diversity tradeoff may generalize: in the absence of ground truth, reliability-weighted aggregation will favor perspectives that are better modeled, and may suppress genuine minority disagreement when those perspectives are poorly served by their assigned models.

Limitations and Directions for Future Work

The study's findings are drawn from two scenarios and one 7-model pool; generalization requires larger, more diverse model sets and additional, structurally varied scenarios. Model–perspective fit is only partially controlled, so further work is needed to separate overall model quality from role-specific reasoning fidelity. Running full combinatorial model–role assignments and evaluating larger (or open frontier) models are critical next steps. Further, the absence of a single-model, multi-perspective baseline on a frontier LLM leaves open whether similar dissensus can be induced purely via repeated sampling. Lastly, the current system partially under-represents certain value perspectives because of tied trait parameters, a technical artifact to be remedied in future versions.

Conclusion

This paper presents strong evidence that architectural heterogeneity in multi-agent LLM systems is a primary mechanism for preserving deliberative disagreement in policy simulation, and that prompt-based value priming of a single small model is insufficient. Coherence validation, as a post-hoc fidelity layer, adds interpretability but does not unconditionally preserve diversity; instead it exposes a structural tradeoff that must be navigated deliberately. The AI Council framework offers a baseline for value-grounded multi-agent deliberation, with practical implementation guidance and theoretical implications for the design of value-sensitive AI systems. The work argues for multi-model architectures, fidelity–diversity diagnostics, and scenario-sensitive evaluation protocols in the continued development of LLM-based policy simulation tools.
