LLM as a Broken Telephone: Iterative Generation Distorts Information (2502.20258v1)

Published 27 Feb 2025 in cs.CL and cs.AI

Abstract: As LLMs are increasingly responsible for online content, concerns arise about the impact of repeatedly processing their own outputs. Inspired by the "broken telephone" effect in chained human communication, this study investigates whether LLMs similarly distort information through iterative generation. Through translation-based experiments, we find that distortion accumulates over time, influenced by language choice and chain complexity. While degradation is inevitable, it can be mitigated through strategic prompting techniques. These findings contribute to discussions on the long-term effects of AI-mediated information propagation, raising important questions about the reliability of LLM-generated content in iterative workflows.

Summary

This paper examines cumulative information distortion in iterative LLM generation through controlled machine translation and rephrasing experiments.

  • Iterated translation chains employing diverse intermediate languages and model configurations result in measurable declines in both textual relevance and factuality, quantified by metrics such as BLEU, ROUGE, CHR-F, METEOR, BERTScore, and FActScore, with gradients reaching as low as –0.040 (±0.025) for non-Latin scripts.
  • Experimental setups—including bilingual self-loop, bilingual two-player, and multilingual multiplayer chains—demonstrate that chain complexity and the linguistic representativeness of bridge languages critically influence degradation rates.
  • Ablation studies reveal that lowering decoding temperatures and enforcing constrained prompting can mitigate cumulative distortion, while collaborative multi-model chains exhibit variable fidelity depending on individual model strengths and training corpus composition.
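The chained-degradation setup described above can be illustrated with a toy simulation (this is not the paper's code, and all names and noise parameters here are hypothetical): each step stands in for one LLM translation or rephrasing pass that randomly drops or corrupts tokens, and a crude unigram-F1 score, a simplified stand-in for metrics like BLEU or CHR-F, is tracked against the original text at every step.

```python
import random

def noisy_rewrite(tokens, noise=0.05, rng=random):
    """Stand-in for one LLM translation/rephrasing pass: each token is
    independently dropped or corrupted with total probability `noise`."""
    out = []
    for t in tokens:
        r = rng.random()
        if r < noise / 2:
            continue                    # token lost entirely
        elif r < noise:
            out.append("<distorted>")   # token garbled
        else:
            out.append(t)
    return out

def unigram_f1(ref, hyp):
    """Crude set-based fidelity metric (simplified stand-in for BLEU/CHR-F)."""
    ref_set, hyp_set = set(ref), set(hyp)
    if not hyp_set:
        return 0.0
    overlap = len(ref_set & hyp_set)
    p, r = overlap / len(hyp_set), overlap / len(ref_set)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def run_chain(text, steps=30, noise=0.05, seed=0):
    """Iterate the noisy pass, recording fidelity to the original each step."""
    rng = random.Random(seed)
    ref = text.split()
    cur, scores = ref, []
    for _ in range(steps):
        cur = noisy_rewrite(cur, noise, rng)
        scores.append(unigram_f1(ref, cur))
    return scores

scores = run_chain("the quick brown fox jumps over the lazy dog " * 5)
# Errors compound across the chain, so fidelity generally declines.
```

Because each pass consumes the previous pass's output, distortion is cumulative rather than independent, which is the "broken telephone" dynamic the paper measures; lowering `noise` plays a role analogous to the paper's finding that lower decoding temperatures slow degradation.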
