Causes of model-specific conformity threshold differences (Gemini 1.5 Flash vs. ChatGPT-4o-mini)

Determine whether the divergence in conformity thresholds observed in LLM-mediated multi-agent opinion-update simulations is driven primarily by differences in training data, model architecture, or fine-tuning/alignment procedures. In those simulations, Google’s Gemini 1.5 Flash flips its opinion only when more than approximately 70% of its peers disagree, whereas OpenAI’s ChatGPT-4o-mini flips at roughly 40–50% disagreement.

Background

The paper audits socio-cognitive behaviors of LLM agents embedded in social networks where agents update binary opinions based on peer distributions via natural-language prompts. It finds a robust persuasion asymmetry and a dual cognitive hierarchy across topics and framings.

A targeted replication showed that ChatGPT-4o-mini exhibited lower conformity thresholds (~40–50% peer disagreement) than Gemini 1.5 Flash (high resistance; >70% dissent needed), suggesting stronger minority influence. The authors state explicitly that it is not clear whether these differences stem from training data, architecture, or fine-tuning, which leaves a concrete model-comparison question open.
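
To make the notion of a conformity threshold concrete, the following is a minimal sketch of how one might estimate it by sweeping the fraction of disagreeing peers and recording the smallest fraction at which an agent flips in a majority of trials. It is not the paper's procedure. The names `estimate_flip_threshold` and `make_stub_agent` are hypothetical, the stub agents are deterministic stand-ins for real model calls, and the stub thresholds of 0.70 and 0.45 are illustrative values chosen only to mirror the reported ranges, not measured data.

```python
from typing import Callable, Optional, Sequence


def estimate_flip_threshold(
    flips: Callable[[float], bool],
    fractions: Sequence[float],
    trials: int = 20,
) -> Optional[float]:
    """Return the smallest disagreement fraction at which the agent
    flips its binary opinion in more than half of the trials, or None
    if it never does on the given grid."""
    for fraction in sorted(fractions):
        flip_count = sum(1 for _ in range(trials) if flips(fraction))
        if flip_count / trials > 0.5:
            return fraction
    return None


def make_stub_agent(threshold: float) -> Callable[[float], bool]:
    """Hypothetical stand-in for an LLM agent: flips whenever the share
    of disagreeing peers strictly exceeds `threshold`. A real study would
    replace this with a prompt-based model call."""
    return lambda fraction: fraction > threshold


# Sweep peer disagreement from 0% to 100% in 10-point steps.
grid = [i / 10 for i in range(11)]

# Illustrative thresholds only (assumed, not measured).
high_resistance = estimate_flip_threshold(make_stub_agent(0.70), grid)
low_resistance = estimate_flip_threshold(make_stub_agent(0.45), grid)

print(high_resistance, low_resistance)
```

Because the estimator only sees flip decisions, it can compare models by their thresholds but cannot say why those thresholds differ. Attributing the gap to training data, architecture, or fine-tuning would require varying one of those factors while holding the others fixed.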

References

At this point it is not clear what causes these differences: the underlying training dataset or any architectural and fine-tuning differences of the LLMs?

— When Your AI Agent Succumbs to Peer-Pressure: Studying Opinion-Change Dynamics of LLMs (2510.19107 - Mehdizadeh et al., 21 Oct 2025) in Conclusions
