Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

Published 4 Jun 2026 in cs.CL and cs.CY | (2606.05985v1)

Abstract: Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches a target culture. Yet alignment is a per-agent property and cannot reveal whether a system, taken as a whole, preserves the cultural plurality it is meant to represent. We propose value diversity as a system-level evaluation axis for multicultural agent systems, defined through the dissimilarity between culturally conditioned agents' responses on a shared value survey. Using the World Values Survey, we evaluate 19 cultures and 18 backbone models across a wide range of system configurations. We find that diversity is largely uncorrelated with alignment, indicating that the two capture complementary system properties, and that current multicultural agent systems fall substantially below human societies in value diversity. Mixed-backbone systems narrow this gap but do not close it, and the gap persists across culture compositions and agent scales. Social interaction further erodes diversity by driving agents toward consensus, and a participatory budgeting case study shows that this homogenization narrows the breadth of collective decision-making. Together, our results establish value diversity as a distinct evaluation axis for multicultural multi-agent systems and reveal a persistent homogenization tendency in current LLM-based societies. Our code and data are publicly available at https://github.com/iNLP-Lab/MultiAgent-Diversity.


Authors (5)

Summary

The paper establishes value diversity as a collective property by introducing two metrics, Pairwise Diversity and Structural Diversity, and validating them empirically.
The quantitative analysis across 19 cultures and 18 backbones shows that high per-agent alignment does not ensure system-level plurality: the two quantities are largely uncorrelated.
The study shows that mixed-backbone configurations modestly improve diversity, whereas social interaction erodes it, and that persistent homogenization limits authentic cultural representation.

System-Level Value Diversity in Multicultural Agent Systems

Motivation and Conceptual Framework

The proliferation of multicultural multi-agent systems (MAS) built on LLMs requires nuanced measures of collective behavior beyond per-agent evaluation. Historically, research has focused on value alignment—quantifying how closely each agent's responses mirror a target culture, typically through social-value surveys such as the World Values Survey (WVS) [align1, align2, align3, align4, align5, align6]. However, this approach overlooks whether a system embedding multiple culturally conditioned agents maintains authentic cultural plurality. Addressing this critical gap, the paper establishes value diversity as a distinct collective property of multicultural agent societies and proposes system-level metrics to quantify it.

Value diversity is operationalized as the intra-system dissimilarity between culturally conditioned agents' responses on shared value surveys. Crucially, alignment and diversity are shown to be largely orthogonal: a system can exhibit high fidelity to individual cultures while collapsing toward inter-agent homogeneity, undermining pluralistic representation.

Quantitative Framework and Metrics

The authors introduce two metrics for system-level value diversity. Pairwise Diversity ( $\mathrm{Diversity}_P$ ) averages the normalized Euclidean distances between agent response vectors across all pairs. Structural Diversity ( $\mathrm{Diversity}_S$ ) computes the mean distance along the minimum spanning tree (MST) connecting all agents in the response space, discounting redundant geometric relations.
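The two metrics described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes that responses are rescaled to $[0, 1]$ per question, that distances are normalized by $\sqrt{Q}$ for $Q$ questions, that scores are reported on a 0 to 100 scale, and that "mean distance along the MST" means the mean MST edge length. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(responses: np.ndarray) -> np.ndarray:
    """Normalized Euclidean distances between agent response vectors.

    responses: (n_agents, n_questions), entries assumed rescaled to [0, 1].
    Dividing by sqrt(n_questions) keeps every distance in [0, 1]
    (an assumed normalization, not confirmed by the source).
    """
    diff = responses[:, None, :] - responses[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1) / responses.shape[1])

def diversity_pairwise(responses: np.ndarray) -> float:
    """Mean normalized distance over all unordered agent pairs, scaled to 0-100."""
    d = pairwise_distances(responses)
    upper = np.triu_indices(len(responses), k=1)
    return 100.0 * float(d[upper].mean())

def diversity_structural(responses: np.ndarray) -> float:
    """Mean edge length of the minimum spanning tree (Prim's algorithm), scaled to 0-100."""
    d = pairwise_distances(responses)
    n = len(d)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()  # cheapest known edge from the tree to each agent
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))  # closest agent not yet in the tree
        edges.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, d[j])
    return 100.0 * float(np.mean(edges))

# Toy system with three agents answering two questions (synthetic values).
toy = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
d_p = diversity_pairwise(toy)
d_s = diversity_structural(toy)
```

Both functions need at least two agents. Because the MST keeps only the $n - 1$ shortest connecting edges, the structural score can never exceed the largest pairwise distance, which is one way to read "discounting redundant geometric relations".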

Alignment follows prior art, evaluating each agent's similarity to its cultural majority-vote vector from WVS. System alignment averages agent-level alignment scores.
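As a rough sketch of the alignment side, the snippet below builds a majority-vote vector from human answers and scores an agent against it. The source says only that alignment measures similarity to the cultural majority-vote vector. The specific similarity used here (100 times one minus the normalized Euclidean distance), the rescaling of answers to $[0, 1]$, and all function names are assumptions for illustration.

```python
import numpy as np

def majority_vote(human_answers: np.ndarray) -> np.ndarray:
    """Most frequent answer per question.

    human_answers: (n_respondents, n_questions), answers rescaled to [0, 1].
    Ties resolve to the smallest answer value (an arbitrary choice).
    """
    votes = []
    for column in human_answers.T:
        values, counts = np.unique(column, return_counts=True)
        votes.append(values[np.argmax(counts)])
    return np.array(votes)

def agent_alignment(agent_answers: np.ndarray, target: np.ndarray) -> float:
    """Assumed similarity: 100 * (1 - normalized Euclidean distance to the target)."""
    distance = np.sqrt(np.mean((agent_answers - target) ** 2))
    return 100.0 * (1.0 - float(distance))

def system_alignment(agents: np.ndarray, targets: np.ndarray) -> float:
    """Average of agent-level alignment scores, one target vector per agent."""
    return float(np.mean([agent_alignment(a, t) for a, t in zip(agents, targets)]))

# Synthetic example: three respondents, two questions.
humans = np.array([[0.0, 1.0], [0.0, 0.5], [1.0, 1.0]])
target = majority_vote(humans)
score = system_alignment(np.array([[0.0, 1.0], [1.0, 0.0]]),
                         np.array([target, target]))
```

Note that this score depends only on each agent and its own target, so it cannot register whether the agents in a system resemble one another.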

Empirical Landscape: Diversity versus Alignment

The empirical assessment spans 19 cultures and 18 backbone LLMs across millions of configurations. In static MAS where all agents share a backbone, no LLM system achieves human-level value diversity. The highest reported values, attained by gemini-2.5-pro, are $\mathrm{Diversity}_P = 36.12$ (human: $44.07$) and $\mathrm{Diversity}_S = 29.60$ (human: $39.37$). Backbone strength (i.e., model size or recency) does not consistently correlate with diversity scores.

The diversity-alignment landscape reveals negligible correlation ( $r = -0.12$ ), indicating that per-agent cultural fidelity does not reflect system-level plurality (Figure 1).
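To make the size of this correlation concrete, assume $r$ is the Pearson coefficient computed over $N$ systems, with $D_i$ and $A_i$ denoting the diversity and alignment of system $i$ (the source does not name the coefficient, so this is an assumption):

$$
r = \frac{\sum_{i=1}^{N} (D_i - \bar{D})(A_i - \bar{A})}{\sqrt{\sum_{i=1}^{N} (D_i - \bar{D})^2}\,\sqrt{\sum_{i=1}^{N} (A_i - \bar{A})^2}}, \qquad r^2 = (-0.12)^2 = 0.0144 .
$$

Under that reading, a linear fit of diversity on alignment would account for roughly 1.4% of the variance in diversity.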

Figure 1: Landscape of system-level value diversity and value alignment for single-backbone MAS; left: joint distribution colored by model family, right: per-question $(D_q, A_q)$ distributions for contrasting systems.

Case analysis shows that high-alignment systems can be highly homogeneous (e.g., grok-3), while high-diversity configurations may fall short of the best cultural fidelity (e.g., gemini-2.5-pro). Diversity therefore exposes intra-system properties that alignment cannot observe.

Mixed-Backbone Systems and Configuration Space

Real-world deployments often involve MAS participants operating on different backbones. Exhaustive evaluation over $18^5 \approx 1.89$M mixed-backbone assignments reveals that mixed-backbone Pareto frontiers strictly dominate those of single-backbone systems in both diversity and alignment (Figure 2). Relative to the corresponding single-backbone references, mixed architectures yield measurable increases: $\Delta D = +1.65$ in diversity, accompanied by a gain in alignment.

Figure 2: Diversity–alignment landscape for all evaluated backbone configurations; mixed-backbone systems dominate single-backbone frontiers.

Despite these gains, the diversity gap with human societies persists across scale and composition.
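The size of the mixed-backbone search space and the idea of a Pareto frontier can be illustrated with a short sketch. It assumes five agents, each independently assigned one of 18 backbones (which is what $18^5$ implies), and treats each configuration as a (diversity, alignment) point where higher is better on both axes. The helper name and the sample points are hypothetical.

```python
def pareto_frontier(points):
    """Return the (diversity, alignment) points not dominated by any other point.

    A point is dominated if another point is at least as good on both axes
    and differs from it. Higher is better on both axes.
    """
    frontier = []
    best_alignment = float("-inf")
    # Scan from highest diversity down; keep a point only if its alignment
    # beats every point with equal or higher diversity seen so far.
    for diversity, alignment in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if alignment > best_alignment:
            frontier.append((diversity, alignment))
            best_alignment = alignment
    return frontier

n_backbones, n_agents = 18, 5
n_configs = n_backbones ** n_agents   # ordered assignments of backbones to agents
n_single = n_backbones                # assignments where all agents share a backbone

# Synthetic points for illustration only, not results from the paper.
sample = [(1, 5), (2, 4), (3, 1), (2, 2), (0, 0)]
front = pareto_frontier(sample)
```

Single-backbone systems are 18 of the 1,889,568 assignments, so the mixed-backbone frontier is drawn from a far larger candidate set.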

Cultural Composition, Agent Count, and Scaling Effects

Analysis of cultural subset selection demonstrates that system-level value diversity is only modestly sensitive to which cultures are represented (Figure 3). Even optimal five-culture subsets deliver diversity scores well below human reference values. Scaling agent count amplifies the gap: as more agents are added, systems exhibit increasingly pronounced homogenization relative to human populations.

Figure 3: Effects of culture selection and agent count on system-level value diversity; (a) diversity sorted across all five-culture subsets, (b) system–human diversity gap by agent count.
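The subset analysis above amounts to scoring every five-culture combination drawn from the 19 cultures. The sketch below enumerates those combinations and ranks them with an assumed pairwise-diversity function (normalized Euclidean distance on a 0 to 100 scale). The response matrix is random synthetic data, and the question count of 40 is a placeholder, so the resulting scores say nothing about the paper's findings.

```python
import math
from itertools import combinations

import numpy as np

def diversity_pairwise(responses: np.ndarray) -> float:
    """Assumed metric: mean normalized Euclidean distance over agent pairs, 0-100."""
    n, q = responses.shape
    diff = responses[:, None, :] - responses[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) / q)
    return 100.0 * float(dist[np.triu_indices(n, k=1)].mean())

rng = np.random.default_rng(0)
n_cultures, n_questions = 19, 40                     # 40 is a placeholder
responses = rng.random((n_cultures, n_questions))    # synthetic stand-in data

# One agent per culture; score every five-culture system.
subsets = list(combinations(range(n_cultures), 5))
scores = sorted(diversity_pairwise(responses[list(s)]) for s in subsets)

n_subsets = len(subsets)
spread = scores[-1] - scores[0]   # gap between the most and least diverse subsets
```

Comparing `spread` against the system-to-human gap is one way to express "modestly sensitive": if the best subset still sits well below the human reference, culture selection alone cannot close the gap.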

The authors emulate dynamic MAS by implementing multi-round social exposure. Contrary to what Social Identity Theory would predict, interaction consistently reduces system-level value diversity while producing only marginal gains in per-agent alignment (Figure 4). Repeated exposure over five rounds further entrenches homogenization: diversity never recovers to static pre-interaction levels (Figure 5).

Figure 4: Effect of one-round social exposure; diversity decreases for all systems, while alignment increases only marginally.

Figure 5: System diversity over five rounds of interaction for six representative backbones; no recovery toward initial diversity.

System alignment fluctuates minimally under repeated exposure, underscoring the need for system-level diversity metrics to detect emergent consensus phenomena (see Appendix Figure 6).

Collective Decision-Making: Participatory Budgeting Case Study

The practical consequences of value diversity show up in how a system collectively prioritizes societal goals. In a participatory budgeting scenario, agents sample projects corresponding to WVS value dimensions. High-diversity systems spread approvals across a broader range of societal domains, while low-diversity systems converge on narrow priorities, reducing plurality in resource allocation (Figures 7, 8, and 9).

Figure 7: Collective decision-making outcomes with claude-opus-4.7 backbone; high-diversity system covers more societal dimensions.

Figure 8: Collective decision-making outcomes with gpt-5.4 backbone; broader coverage for high-diversity system.

Figure 9: Collective decision-making outcomes with gemini-3.1-flash-lite-preview backbone; broader coverage for high-diversity system.

This establishes value diversity not only as a theoretical property but as a determinant of downstream MAS behavior in democratic contexts.

Implications and Theoretical Outlook

The research delineates value diversity as a foundational axis for evaluating multicultural MAS. Homogenization is pervasive in LLM-based societies irrespective of backbone, cultural composition, scaling, or interaction. Mixed-backbone systems modestly mitigate but do not eliminate this effect.

Key theoretical implications include:

System-level evaluation: Divergence from human societies in diversity warrants pluralistic alignment frameworks and foregrounds the limitations of agent-level assessment.
Consensus dynamics: The tendency toward inter-agent homogenization is structurally entrenched; interaction exacerbates consensus drift and collapses plurality.
Cultural representation limits: Current LLMs, despite advancing individual cultural alignment, fail to encode authentic intra-system heterogeneity.

On the practical frontier, system-level diversity influences collective reasoning, social simulation fidelity, and societal prioritization in MAS-driven platforms, with direct implications for agent-native social networks such as MoltBook [molt5, molt4, molt3, molt2, molt1].

Future lines of inquiry should focus on:

Developing backbone training and aggregation paradigms that preserve system plurality.
Extending diversity measures to dialogue, norm reasoning, and emergent social behaviors.
Evaluating pluralistic alignment through social choice theory and novel aggregation methods [plural1, plural2, plural3].

Conclusion

This study establishes value diversity as a critical, currently unmet evaluation axis for multicultural LLM-based agent societies. Diversity and alignment measure complementary system properties; alignment alone is insufficient to guarantee authentic cultural plurality. The persistent homogenization effect, only partially alleviated by mixed-backbone systems, is exacerbated by dynamic social interaction. Diversity deficits have direct behavioral consequences in collective decision-making. These findings substantiate the necessity for system-level evaluation strategies and invite new directions in MAS theory and LLM training aimed at pluralistic societal representation (2606.05985).
