- The paper establishes value diversity as a collective property by introducing two novel metrics, Pairwise Diversity and Structural Diversity, with empirical validation.
- The quantitative analysis across 19 cultures and 18 backbones reveals that high per-agent alignment does not ensure overall system plurality, indicating an orthogonal relationship.
- The study shows that while mixed-backbone configurations and social interactions modestly improve diversity, persistent homogenization limits authentic cultural representation.
System-Level Value Diversity in Multicultural Agent Systems
Motivation and Conceptual Framework
The proliferation of multicultural multi-agent systems (MAS) built on LLMs requires nuanced measures of collective behavior beyond per-agent evaluation. Historically, research has focused on value alignment: quantifying how closely each agent's responses mirror a target culture, typically through social-value surveys such as the World Values Survey (WVS) [align1, align2, align3, align4, align5, align6]. However, this approach overlooks whether a system embedding multiple culturally conditioned agents maintains authentic cultural plurality. Addressing this gap, the paper establishes value diversity as a distinct collective property of multicultural agent societies and proposes system-level metrics to quantify it.
Value diversity is operationalized as the intra-system dissimilarity between culturally conditioned agents' responses on shared value surveys. Crucially, alignment and diversity are shown to be largely orthogonal: a system can exhibit high fidelity to individual cultures while collapsing toward inter-agent homogeneity, undermining pluralistic representation.
Quantitative Framework and Metrics
The authors introduce two metrics for system-level value diversity. Pairwise Diversity ($\text{Diversity}_P$) averages the normalized Euclidean distances between agent response vectors across all pairs. Structural Diversity ($\text{Diversity}_S$) computes the mean distance along the minimum spanning tree (MST) connecting all agents in the response space, discounting redundant geometric relations.
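The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `scale` parameter, and the normalization (dividing by `scale * sqrt(d)`, the largest distance possible when each of the `d` coordinates spans `scale`) are assumptions, since the summary does not state how distances are normalized.

```python
import math
from itertools import combinations

def normalized_distance(u, v, scale=1.0):
    """Euclidean distance divided by scale * sqrt(d).
    The normalization constant is an assumption, not taken from the paper."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return d / (scale * math.sqrt(len(u)))

def pairwise_diversity(responses, scale=1.0):
    """Mean normalized distance over all unordered agent pairs."""
    pairs = list(combinations(responses, 2))
    return sum(normalized_distance(u, v, scale) for u, v in pairs) / len(pairs)

def structural_diversity(responses, scale=1.0):
    """Mean edge length of the minimum spanning tree (Prim's algorithm)."""
    n = len(responses)
    best = [math.inf] * n      # cheapest known edge from the tree to each agent
    best[0] = 0.0              # start the tree at agent 0
    in_tree = [False] * n
    total = 0.0
    for _ in range(n):
        # Add the agent that is closest to the current tree.
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[i] = True
        total += best[i]
        for j in range(n):
            if not in_tree[j]:
                dist = normalized_distance(responses[i], responses[j], scale)
                best[j] = min(best[j], dist)
    return total / (n - 1)     # an MST over n agents has n - 1 edges

# Hypothetical responses: two identical agents and one distant agent.
agents = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
d_p = pairwise_diversity(agents)
d_s = structural_diversity(agents)
```

In this toy case the pairwise score counts the distant agent twice (once against each of the two identical agents), whereas the MST needs only one long edge to reach it, so the structural score comes out lower. That is the sense in which the MST-based metric discounts redundant relations.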
Alignment follows prior art, evaluating each agent's similarity to its cultural majority-vote vector from WVS. System alignment averages agent-level alignment scores.
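A rough sketch of this alignment computation follows. It assumes that similarity is defined as one minus a normalized Euclidean distance and that answers are numeric codes; neither detail is confirmed by the summary, and all function names and the `scale` parameter are hypothetical.

```python
import math
from collections import Counter

def majority_vote_vector(survey_rows):
    """Per-question most common answer among one culture's respondents.
    Each row is one respondent; each column is one survey question."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*survey_rows)]

def agent_alignment(agent_answers, reference, scale=1.0):
    """1 - normalized distance to the reference vector (assumed similarity)."""
    d = math.sqrt(sum((a - r) ** 2 for a, r in zip(agent_answers, reference)))
    return 1.0 - d / (scale * math.sqrt(len(reference)))

def system_alignment(agents, references, scale=1.0):
    """Mean of the agent-level scores, one reference vector per agent."""
    scores = [agent_alignment(a, r, scale) for a, r in zip(agents, references)]
    return sum(scores) / len(scores)

# Hypothetical respondents from one culture answering two questions.
rows = [[1, 3], [1, 2], [2, 3]]
reference = majority_vote_vector(rows)
perfect = agent_alignment([1, 3], reference, scale=3)
```

Because the system score is a plain average over agents, it carries no information about how far the agents are from one another, which is why a separate diversity metric is needed.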
Empirical Landscape: Diversity versus Alignment
The empirical assessment spans 19 cultures and 18 backbone LLMs across millions of configurations. In static MAS where all agents share a backbone, no LLM system achieves human-level value diversity. Systems such as gemini-2.5-pro attain a maximum of $\text{Diversity}_P = 36.12$ (human: $44.07$) and $\text{Diversity}_S = 29.60$ (human: $39.37$). Backbone strength (i.e., model size or recency) does not consistently correlate with diversity scores.
The diversity-alignment landscape reveals negligible correlation ($r = -0.12$), indicating that per-agent cultural fidelity does not reflect system-level plurality (Figure 1).
Figure 1: Landscape of system-level value diversity and value alignment for single-backbone MAS; left: joint distribution colored by model family, right: per-question $(D_q, A_q)$ distributions for contrasting systems.
Case analysis shows that high alignment systems can be highly homogeneous (e.g., grok-3), while high diversity configurations may lack optimal cultural fidelity (e.g., gemini-2.5-pro). This demonstrates that diversity exposes intra-system properties unobservable through alignment.
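The correlation reported above is presumably a Pearson coefficient computed over per-system (diversity, alignment) pairs; the summary does not say so explicitly. Under that assumption, a minimal sketch looks like this, with made-up scores for four hypothetical systems chosen only to show what an uncorrelated landscape looks like numerically.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative values only, not data from the paper.
diversity = [30.0, 34.0, 30.0, 34.0]
alignment = [70.0, 70.0, 74.0, 74.0]
r = pearson_r(diversity, alignment)
```

Here high and low alignment occur equally often at each diversity level, so knowing one score says nothing about the other.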
Mixed-Backbone Systems and Configuration Space
Real-world deployments often involve MAS participants operating on different backbones. Exhaustive evaluation over $18^5 \approx 1.89$M mixed-backbone assignments reveals that mixed-backbone Pareto frontiers strictly dominate those of single-backbone systems in both diversity and alignment (Figure 2). Relative to the corresponding single-backbone references, mixed architectures yield measurable increases: $\Delta D = +1.65$ in diversity, together with a gain in alignment.
Figure 2: Diversity-alignment landscape for all evaluated backbone configurations; mixed-backbone systems dominate single-backbone frontiers.
Despite these gains, the diversity gap with human societies persists across scale and composition.
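To make the frontier comparison concrete, the sketch below counts the configuration space and extracts a Pareto front from (diversity, alignment) points. It assumes five agent slots, as the exponent in $18^5$ suggests, and treats higher as better on both axes; the function name and the sample points are hypothetical.

```python
def pareto_front(points):
    """Return the points that no other point dominates.
    q dominates p when q is at least as good on both axes and differs from p."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

# 18 backbones assigned independently to 5 agent slots (assumed reading of 18^5).
n_assignments = 18 ** 5

# Illustrative (diversity, alignment) scores, not data from the paper.
systems = [(30.0, 70.0), (32.0, 69.0), (31.0, 71.0), (29.0, 68.0)]
front = pareto_front(systems)
```

One set of configurations dominates another when every point on the second set's front is dominated by some point in the first, which is the sense in which the mixed-backbone frontier is said to lie beyond the single-backbone one.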
Cultural Composition, Agent Count, and Scaling Effects
Analysis of cultural subset selection demonstrates that system-level value diversity is only modestly sensitive to which cultures are represented (Figure 3). Even optimal five-culture subsets deliver diversity scores well below human reference values. Scaling agent count amplifies the gap: as more agents are added, systems exhibit increasingly pronounced homogenization relative to human populations.
Figure 3: Effects of culture selection and agent count on system-level value diversity; (a) diversity sorted across all five-culture subsets, (b) system-human diversity gap by agent count.
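The subset analysis amounts to scoring every five-culture combination drawn from the 19 cultures and sorting the results. A minimal sketch of that search follows; the `best_subset` helper, the one-dimensional culture positions, and the toy scoring function are all hypothetical stand-ins for a real diversity metric.

```python
import math
from itertools import combinations

def best_subset(cultures, k, score):
    """Exhaustively score every k-culture subset and return the best one."""
    return max(combinations(cultures, k), key=score)

# Number of five-culture subsets of 19 cultures.
n_subsets = math.comb(19, 5)

# Toy example: cultures placed on a single hypothetical value axis.
position = {"A": 0.0, "B": 0.2, "C": 0.9, "D": 1.0}

def toy_score(subset):
    """Mean absolute difference over all pairs in the subset."""
    pairs = list(combinations(subset, 2))
    return sum(abs(position[a] - position[b]) for a, b in pairs) / len(pairs)

best = best_subset(sorted(position), 2, toy_score)
```

With 11,628 subsets the search is small enough to run exhaustively, so the reported gap to the human reference cannot be attributed to a poorly chosen subset.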
Dynamic Interaction and Social Exposure
The authors emulate dynamic MAS by implementing multi-round social exposure. Contrary to Social Identity Theory, interaction consistently reduces system value diversity while producing only marginal gains in per-agent alignment (Figure 4). Repeated exposure over five rounds further entrenches homogenization: diversity never recovers to static pre-interaction levels (Figure 5).
Figure 4: Effect of one-round social exposure: diversity decreases for all systems, while alignment increases only marginally.
Figure 5: System diversity over five rounds of interaction for six representative backbones; no recovery toward initial diversity.
System alignment fluctuates minimally under repeated exposure, underscoring the need for system-level diversity metrics to detect emergent consensus phenomena (see Appendix Figure 6).
Collective Decision-Making: Participatory Budgeting Case Study
The practical consequences of value diversity manifest in collective societal prioritization behaviors. In a participatory budgeting scenario, agents sample projects corresponding to WVS value dimensions. High-diversity systems allocate approvals across broader societal domains, while low-diversity systems converge on narrow priorities, reducing plurality in resource allocation (Figures 7, 8, and 9).
Figure 7: Collective decision-making outcomes with claude-opus-4.7 backbone; high-diversity system covers more societal dimensions.
Figure 8: Collective decision-making outcomes with gpt-5.4 backbone; broader coverage for high-diversity system.
Figure 9: Collective decision-making outcomes with gemini-3.1-flash-lite-preview backbone; broader coverage for high-diversity system.
This establishes value diversity not only as a theoretical property but as a determinant of downstream MAS behavior in democratic contexts.
Implications and Theoretical Outlook
The research delineates value diversity as a foundational axis for evaluating multicultural MAS. Homogenization is pervasive in LLM-based societies irrespective of backbone, cultural composition, scaling, or interaction. Mixed-backbone systems modestly mitigate but do not eliminate this effect.
Key theoretical implications include:
- System-level evaluation: Divergence from human societies in diversity warrants pluralistic alignment frameworks and foregrounds the limitations of agent-level assessment.
- Consensus dynamics: The tendency toward inter-agent homogenization is structurally entrenched; interaction exacerbates consensus drift and collapses plurality.
- Cultural representation limits: Current LLMs, despite advancing individual cultural alignment, fail to encode authentic intra-system heterogeneity.
On the practical frontier, system-level diversity influences collective reasoning, social simulation fidelity, and societal prioritization in MAS-driven platforms, with direct implications for agent-native social networks such as MoltBook [molt5, molt4, molt3, molt2, molt1].
Future lines of inquiry should focus on:
- Developing backbone training and aggregation paradigms that preserve system plurality.
- Extending diversity measures to dialogue, norm reasoning, and emergent social behaviors.
- Evaluating pluralistic alignment through social choice theory and novel aggregation methods [plural1, plural2, plural3].
Conclusion
This study establishes value diversity as a critical, currently unmet evaluation axis for multicultural LLM-based agent societies. Diversity and alignment measure complementary system properties; alignment alone is insufficient to guarantee authentic cultural plurality. The persistent homogenization effect, only partially alleviated by mixed-backbone systems, is exacerbated by dynamic social interaction. Diversity deficits have direct behavioral consequences in collective decision-making. These findings substantiate the necessity for system-level evaluation strategies and invite new directions in MAS theory and LLM training aimed at pluralistic societal representation (2606.05985).