- The paper demonstrates that a Bayesian trust-based model predicts inevitable lock-in when the spectral radius of the trust matrix exceeds one.
- Agent-based GPT simulations show that repeated human-LLM interactions lead to a significant drop in conceptual diversity over time.
- Empirical analysis using a regression kink design indicates that deployments of newly trained GPT versions coincide with abrupt decreases in conceptual diversity.
The paper "The Lock-in Hypothesis: Stagnation by Algorithm" (2506.06166) investigates the potential negative consequences of feedback loops between LLMs and human users. The authors propose that this dynamic interaction, where models learn from human data, influence human opinions through their output, and then reabsorb those influenced beliefs, can lead to a state of "lock-in."
The Lock-in Hypothesis, as formalized in the paper, states that this feedback loop will eventually cause a population to converge on specific beliefs, potentially false ones. Once formed, these beliefs become resistant to change, reinforced by the feedback loop itself and by human trust in the AI.
To formalize this, the authors develop a Bayesian model involving a group of N agents (which can represent humans or AI) estimating an unknown quantity. Each agent maintains a private belief and an aggregate belief informed by their own observations and the beliefs of other agents they trust. The interactions are modeled by a trust matrix W, where wᵢⱼ indicates the degree to which agent i trusts agent j's belief. The paper shows that if the spectral radius of the trust matrix satisfies ρ(W) > 1, collective lock-in to a false belief is inevitable. This condition implies that the feedback loop is self-amplifying due to sufficient mutual trust. They apply this to a specific human-LLM dynamic with one AI agent and N−1 human agents, showing lock-in occurs if (N−1)·λ₁·λ₂ > 1, where λ₁ is the AI's trust in humans (preference learning strength) and λ₂ is humans' trust in the AI. This condition suggests that even moderate mutual trust can lead to lock-in in a sufficiently large group.
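The lock-in condition can be checked numerically. Below is a minimal sketch, assuming a simple star-shaped trust matrix in which the AI (index 0) trusts each of the N−1 humans with weight λ₁ and each human trusts the AI with weight λ₂; for this structure the spectral radius equals √((N−1)·λ₁·λ₂), so the two criteria coincide. The matrix shape and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def lock_in_check(n_humans: int, lam1: float, lam2: float):
    """Compare the spectral-radius criterion rho(W) > 1 with the closed-form
    condition (N-1) * lam1 * lam2 > 1 for a star-shaped trust matrix with
    one AI agent (index 0) and n_humans human agents."""
    N = n_humans + 1
    W = np.zeros((N, N))
    W[0, 1:] = lam1                        # AI's trust in each human (preference-learning strength)
    W[1:, 0] = lam2                        # each human's trust in the AI
    rho = max(abs(np.linalg.eigvals(W)))   # spectral radius of W
    return rho > 1, n_humans * lam1 * lam2 > 1

# Even modest mutual trust satisfies the condition once the group is large enough.
print(lock_in_check(n_humans=99, lam1=0.15, lam2=0.15))  # (True, True)
```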
The paper complements this theoretical model with agent-based LLM simulations. Using GPT-4.1-Nano (for agents and authority), Setup C simulates 100 agents holding natural language beliefs on a given topic (from r/ChangeMyView) and interacting with a centralized LLM authority. The authority aggregates group beliefs and provides a summarized belief, which agents then use to update their own, based on a pre-assigned trust level. The simulations demonstrate that as interactions progress, agents' beliefs converge, leading to a significant drop in conceptual diversity. This "belief shift" can result in convergence on extreme views or, in some cases, hedged stances, depending on the topic and LLM behavior. The simulations support the idea that diversity loss is an observable metric of lock-in.
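The simulation loop can be sketched schematically. The snippet below only illustrates the centralized-authority dynamic described above: the `llm()` helper, the prompts, and the trust-weighted update are hypothetical stand-ins, not the authors' implementation or the actual GPT-4.1-Nano prompting setup.

```python
import random

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to the underlying model
    (e.g. GPT-4.1-Nano); replace with a real API client."""
    raise NotImplementedError

def simulate(topic: str, initial_beliefs: list[str], n_rounds: int = 10) -> list[str]:
    """Schematic centralized-authority loop: the authority summarizes the
    group's beliefs, then each agent revises its own belief toward the
    summary in proportion to a pre-assigned trust level."""
    beliefs = list(initial_beliefs)
    trust = [random.uniform(0.2, 0.9) for _ in beliefs]  # per-agent trust in the authority
    for _ in range(n_rounds):
        summary = llm(f"Summarize the group's stance on '{topic}':\n" + "\n".join(beliefs))
        beliefs = [
            llm(f"You believe: '{b}'. An authority you trust at level {t:.2f} says: "
                f"'{summary}'. State your updated belief, weighting the authority "
                f"in proportion to your trust.")
            for b, t in zip(beliefs, trust)
        ]
    return beliefs
```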
Empirical evidence is drawn from the WildChat-1M dataset (Zhao et al., 2 May 2024), which contains logs of human interactions with a ChatGPT mirror site. The authors analyze the conceptual diversity of human messages over time, using a constructed concept hierarchy and a novel "lineage diversity" metric that accounts for hierarchical structure (a toy sketch of such a measure follows the two hypotheses below). They test two main hypotheses:
- Collective Diversity Loss: Conceptual diversity in the corpus of human messages decreases over time.
- Iterative Training Leads to Loss: Diversity decreases discontinuously when new GPT iterations (trained on new human data) are deployed.
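To make the idea of a hierarchy-aware diversity measure concrete, here is a toy sketch, not the authors' definition, assuming the concept hierarchy is given as a child-to-parent map: diversity is the number of distinct hierarchy edges touched in a window, so concepts from nearby branches contribute less than concepts from distant ones.

```python
def lineage(concept: str, parent: dict[str, str]) -> tuple[str, ...]:
    """Root-to-concept path in the hierarchy (parent maps child -> parent)."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return tuple(reversed(path))

def lineage_diversity(concepts: list[str], parent: dict[str, str]) -> int:
    """Toy hierarchy-aware diversity: the number of distinct hierarchy edges
    covered by the concepts in a window. Closely related concepts share
    ancestors and add little; concepts from distant branches add more."""
    edges = set()
    for c in concepts:
        path = lineage(c, parent)
        edges.update(zip(path, path[1:]))
    return len(edges)

# Toy hierarchy: ethics -> {fairness -> algorithmic_bias, privacy}
parent = {"fairness": "ethics", "privacy": "ethics", "algorithmic_bias": "fairness"}
print(lineage_diversity(["algorithmic_bias", "fairness"], parent))  # 2 (same branch)
print(lineage_diversity(["algorithmic_bias", "privacy"], parent))   # 3 (distant branches)
```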
Results show ambiguous support for Hypothesis 1, with a downward trend in diversity for GPT-4 interactions but an upward trend for GPT-3.5-turbo interactions on value-laden concepts. However, Hypothesis 2 receives stronger support. Using a regression kink design (RKD), the authors detect significant discontinuous downward shifts in conceptual diversity following the release dates of new GPT versions (GPT-4-0125-preview, GPT-3.5-turbo-0613, GPT-3.5-turbo-0125). Per-user regression analysis on high-engagement users also tentatively supports Hypothesis 2, suggesting the impact is sustained and not merely a temporary phenomenon at release dates.
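A regression kink design of this kind fits a piecewise-linear trend with a slope change at a known threshold (here, a model release date) and tests whether the post-release slope is significantly more negative. Below is a minimal sketch, assuming a daily diversity series and using statsmodels; it is not the authors' exact specification or set of controls.

```python
import numpy as np
import statsmodels.api as sm

def regression_kink(days: np.ndarray, diversity: np.ndarray, release_day: float):
    """Fit diversity ~ b0 + b1*(t - c) + b2*max(t - c, 0), where c is a model
    release date. A significantly negative b2 indicates a downward kink
    (steeper decline in diversity) after the release."""
    centered = days - release_day
    kink = np.maximum(centered, 0.0)                  # slope-change term, active only post-release
    X = sm.add_constant(np.column_stack([centered, kink]))
    fit = sm.OLS(diversity, X).fit()
    return fit.params[2], fit.pvalues[2]              # kink coefficient and its p-value
```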
The authors acknowledge limitations, including potential confounding factors in the observational WildChat data and the simplified nature of their simulations compared to real-world interactions. They propose future work, including randomized controlled trials (RCTs) with human subjects, more realistic simulations incorporating external evidence and diverse sources, and the development of systematic evaluation and mitigation strategies for lock-in effects.
The paper concludes that the findings provide early-stage evidence supporting the lock-in hypothesis, particularly concerning the impact of iterative model training on collective conceptual diversity. It highlights the importance of further research into the dynamics of human-LLM interaction and the potential need for technical, algorithmic, or policy interventions to mitigate adverse consequences.