
Wikipedia in the Era of LLMs: Evolution and Risks (2503.02879v1)

Published 4 Mar 2025 in cs.CL, cs.AI, cs.CY, and cs.LG

Abstract: In this paper, we present a thorough analysis of the impact of LLMs on Wikipedia, examining the evolution of Wikipedia through existing data and using simulations to explore potential risks. We begin by analyzing page views and article content to study Wikipedia's recent changes and assess the impact of LLMs. Subsequently, we evaluate how LLMs affect various NLP tasks related to Wikipedia, including machine translation and retrieval-augmented generation (RAG). Our findings and simulation results reveal that Wikipedia articles have been influenced by LLMs, with an impact of approximately 1%-2% in certain categories. If the machine translation benchmark based on Wikipedia is influenced by LLMs, the scores of the models may become inflated, and the comparative results among models might shift as well. Moreover, the effectiveness of RAG might decrease if the knowledge base becomes polluted by LLM-generated content. While LLMs have not yet fully changed Wikipedia's language and knowledge structures, we believe that our empirical findings signal the need for careful consideration of potential future risks.

Summary

  • The paper quantifies LLM-induced changes on Wikipedia, showing a measurable content shift and altered linguistic styles evident since 2020.
  • It documents specific linguistic changes, such as fewer auxiliary verbs, greater lexical diversity, and more complex sentences, that align with LLM writing patterns.
  • The study demonstrates that LLM-modified content can negatively impact downstream NLP tasks, inflating machine translation scores and degrading retrieval-augmented generation performance.
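The stylistic signals named in the summary can be illustrated with a few simple text statistics. The following is a minimal sketch, not the paper's measurement pipeline: the function name `style_metrics`, the `TO_BE` word list, the regex tokenizer, and the use of type-token ratio as the lexical diversity measure are all assumptions made for illustration.

```python
import re

# Assumed list of "to be" verb forms; the paper's exact word list is not given here.
TO_BE = {"am", "is", "are", "was", "were", "be", "been", "being"}


def style_metrics(text):
    """Return rough stylistic statistics for a piece of English text."""
    # Naive sentence split on terminal punctuation (an assumption, not a real tokenizer).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    return {
        # Share of tokens that are "to be" verb forms.
        "to_be_rate": sum(w in TO_BE for w in words) / len(words),
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Mean words per sentence, a crude proxy for sentence complexity.
        "mean_sentence_length": len(words) / len(sentences),
    }


metrics = style_metrics("The cat is here. The dog was there.")
```

Comparing these numbers across yearly snapshots of the same articles would show the direction of any drift. Type-token ratio is sensitive to text length, so snapshots of similar size compare more fairly.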

This paper presents a comprehensive empirical investigation into how LLMs are directly and indirectly impacting Wikipedia and downstream NLP tasks.

  • It quantifies a measurable LLM-induced content shift on Wikipedia: simulations based on word-frequency changes estimate an impact of approximately 1%–2% in certain categories.
  • It documents systematic changes in linguistic style (reduced auxiliary and “to be” verb usage, greater lexical diversity, and increased sentence complexity) that align with LLM preferences and are evident in trends from 2020 to 2025.
  • It demonstrates that LLM-modified content can inflate machine translation evaluation scores, sometimes reversing model rankings, and can degrade retrieval-augmented generation performance by lowering accuracy on information retrieval tasks.
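One way to read an impact estimate based on word frequencies is as a mixing fraction: observed frequencies are modeled as a blend of human-written and LLM-written frequencies. The sketch below assumes this linear mixture model and solves for the fraction by least squares. It is an illustration under stated assumptions, not necessarily the estimator used in the paper. The function name `estimate_llm_fraction` and all frequency values are hypothetical.

```python
def estimate_llm_fraction(f_human, f_llm, f_observed):
    """Least-squares estimate of eta in f_observed = (1 - eta) * f_human + eta * f_llm.

    Each argument maps a word to its relative frequency in the
    corresponding corpus. All three dicts must share the keys of f_human.
    """
    num = 0.0
    den = 0.0
    for word in f_human:
        # How far LLM text moves this word's frequency away from the human baseline.
        shift = f_llm[word] - f_human[word]
        num += (f_observed[word] - f_human[word]) * shift
        den += shift * shift
    if den == 0.0:
        raise ValueError("human and LLM frequencies are identical; eta is undefined")
    # Clamp to [0, 1] so the result can be read as a fraction.
    return min(1.0, max(0.0, num / den))


# Made-up frequencies for two illustrative words (not data from the paper).
f_human = {"crucial": 0.001, "additionally": 0.002}
f_llm = {"crucial": 0.003, "additionally": 0.006}
# Observed frequencies constructed to correspond to a 2% LLM share.
f_observed = {"crucial": 0.00104, "additionally": 0.00208}

eta = estimate_llm_fraction(f_human, f_llm, f_observed)
```

With real data, the estimate depends heavily on which words are included and on how the human and LLM baselines are built, so it is best treated as a rough indicator.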

Tweets

This paper has been mentioned in 3 posts and received 19 likes.
