Causes of model-specific conformity threshold differences (Gemini 1.5 Flash vs. ChatGPT-4o-mini)
In LLM-mediated multi-agent opinion-update simulations, Google's Gemini 1.5 Flash flips its opinion only under a supermajority of peer disagreement (more than approximately 70%), whereas OpenAI's ChatGPT-4o-mini flips at roughly 40–50% disagreement. Determine whether this divergence in conformity thresholds is driven primarily by differences in training data, model architecture, or fine-tuning/alignment procedures.
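To make the notion of a conformity threshold concrete, here is a minimal Python sketch of one way such a threshold could be estimated from trial outcomes. It is not the authors' procedure. It groups trials by the fraction of disagreeing peers and reports the smallest fraction at which the flip rate reaches a cutoff. The `Trial` record, the function names, the 0.5 cutoff, and the stub agent in the usage lines are hypothetical assumptions, and the printed value is synthetic, not a result from the paper.

```python
from dataclasses import dataclass


# Hypothetical trial record: fraction of peers disagreeing with the agent's
# current opinion, and whether the agent changed its opinion in response.
@dataclass
class Trial:
    disagreement: float  # in [0, 1]
    flipped: bool


def flip_rates(trials):
    """Group trials by disagreement level and return {level: flip rate}, sorted by level."""
    counts = {}
    for t in trials:
        flips, total = counts.get(t.disagreement, (0, 0))
        counts[t.disagreement] = (flips + int(t.flipped), total + 1)
    return {level: f / n for level, (f, n) in sorted(counts.items())}


def conformity_threshold(trials, cutoff=0.5):
    """Smallest disagreement level whose flip rate reaches `cutoff`, or None if none does."""
    for level, rate in flip_rates(trials).items():
        if rate >= cutoff:
            return level
    return None


# Synthetic illustration only: a stub agent with 10 peers that flips
# whenever strictly more than 70% of them disagree.
trials = [Trial(k / 10, k / 10 > 0.7) for k in range(11)]
print(conformity_threshold(trials))  # 0.8
```

With real LLM agents the outcome at a given disagreement level is stochastic, which is why the sketch aggregates repeated trials into a flip rate rather than reading off a single run. A smoother alternative would be to fit a logistic curve to the flip outcomes and report its midpoint.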
References
At this point it is not clear what causes these differences: the underlying training dataset or any architectural and fine-tuning differences of the LLMs?
— When Your AI Agent Succumbs to Peer-Pressure: Studying Opinion-Change Dynamics of LLMs
(arXiv:2510.19107, Mehdizadeh et al., 21 Oct 2025), Conclusions section