Determine the critical consensus size for GPT-4 Turbo LLM societies

Determine the critical consensus size N_c for GPT-4 Turbo-based agent societies under the paper’s voter-like opinion dynamics framework. N_c is defined as the minimal group size N at which the estimated majority force β(N) falls below the Curie–Weiss critical value β = 1; for groups larger than N_c, consensus becomes exponentially unlikely. Establish whether such an N_c exists at practical scales and, if so, identify or tightly bound its value, given that simulations up to N = 1000 yielded neither β(N) < 1 nor a failure to reach consensus.

Background

The paper models consensus formation in groups of LLM agents via a binary opinion update process in which each agent observes all other agents’ opinions and then decides its next opinion. The adoption probability is well fit by P(m) = 0.5 (tanh(βm) + 1), where m is the group's current collective opinion (the analogue of the magnetization, ranging from -1 to 1). This fit enables a mapping to the Curie–Weiss model with majority force parameter β. In this mapping, β = 1 is the critical point: for β < 1 the system is disordered (no consensus), and for β > 1 it is ordered (consensus).
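
The role of β = 1 can be illustrated with a minimal Python sketch. It assumes that P(m) is the probability of adopting opinion +1, so that the expected next opinion is 2 P(m) - 1 = tanh(βm), and it iterates this deterministic mean-field map rather than simulating individual agents. The function names are hypothetical and are not taken from the paper.

```python
import math


def adoption_probability(m: float, beta: float) -> float:
    """Fitted adoption curve P(m) = 0.5 * (tanh(beta * m) + 1)."""
    return 0.5 * (math.tanh(beta * m) + 1.0)


def mean_field_opinion(beta: float, m0: float = 0.5, steps: int = 2000) -> float:
    """Iterate the mean-field map m <- 2 * P(m) - 1 = tanh(beta * m).

    Assumption: P(m) is the probability of adopting opinion +1, so the
    expected opinion after an update is 2 * P(m) - 1.
    """
    m = m0
    for _ in range(steps):
        m = 2.0 * adoption_probability(m, beta) - 1.0
    return m


if __name__ == "__main__":
    # Below the critical point the opinion decays to zero (no consensus);
    # above it the map settles on a nonzero value (a preferred opinion).
    for beta in (0.5, 2.0):
        print(f"beta = {beta}: m = {mean_field_opinion(beta):.4f}")
```

For β < 1 the only fixed point of m = tanh(βm) is m = 0, while for β > 1 two additional nonzero fixed points appear. This is why β = 1 is the natural threshold for defining a critical consensus size.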

The authors define a critical consensus size N_c for each model as the smallest group size N at which β(N) drops to the Curie–Weiss critical value β = 1; for larger groups, consensus becomes exponentially unlikely. For GPT-4 Turbo, their experiments found neither β(N) < 1 nor a failure to reach consensus up to N = 1000, so they establish only a lower bound on N_c and leave its exact value unresolved.
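
The distinction between a bracketed N_c and a bare lower bound can be made concrete with a short sketch. It assumes that β(N) decreases with N, so the first tested size with β(N) < 1 caps N_c from above. The helper name and all numbers in the usage lines are hypothetical placeholders, not measurements from the paper.

```python
from typing import Optional, Sequence, Tuple


def bound_critical_size(
    sizes: Sequence[int],
    betas: Sequence[float],
    beta_c: float = 1.0,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (lower, upper) such that lower < N_c <= upper.

    lower is the largest tested size before beta first drops below beta_c;
    upper is the first tested size with beta < beta_c. Either entry is None
    when the data do not constrain that side.
    Assumption: beta(N) is monotonically decreasing in N.
    """
    lower: Optional[int] = None
    for n, beta in sorted(zip(sizes, betas)):
        if beta < beta_c:
            return lower, n
        lower = n
    return lower, None


if __name__ == "__main__":
    # Hypothetical estimates in which beta crosses 1 between N = 50 and N = 100.
    print(bound_critical_size([10, 50, 100], [3.0, 1.2, 0.8]))
    # Hypothetical estimates that never cross 1: only a lower bound results.
    print(bound_critical_size([10, 100, 1000], [4.0, 3.0, 2.5]))
```

The second call mirrors the GPT-4 Turbo situation: when every tested size has β(N) ≥ 1, the upper entry is None and the largest tested size is all that can be reported.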

References

For humans we report the Dunbar's number, while for GPT-4 Turbo we can only report a lower bound, since we were unable to find a group size that did not reach consensus with this model.

AI agents can coordinate beyond human scale (2409.02822 - Marzo et al., 2024), Figure 3 caption, Section "Critical Consensus Size"