
Exploring the Impact of ChatGPT on Wikipedia Engagement (2405.10205v3)

Published 16 May 2024 in cs.HC

Abstract: Wikipedia is one of the most popular websites in the world, serving as a major source of information and learning resource for millions of users worldwide. While motivations for its usage vary, prior research suggests shallow information gathering -- looking up facts and information or answering questions -- dominates over more in-depth usage. On the 22nd of November 2022, ChatGPT was released to the public and has quickly become a popular source of information, serving as an effective question-answering and knowledge gathering resource. Early indications have suggested that it may be drawing users away from traditional question answering services such as Stack Overflow, raising the question of how it may have impacted Wikipedia. In this paper, we explore Wikipedia user metrics across four areas: page views, unique visitor numbers, edit counts and editor numbers within twelve language instances of Wikipedia. We perform pairwise comparisons of these metrics before and after the release of ChatGPT and implement a panel regression model to observe and quantify longer-term trends. We find no evidence of a fall in engagement across any of the four metrics, instead observing that page views and visitor numbers increased in the period following ChatGPT's launch. However, we observe a lower increase in languages where ChatGPT was available than in languages where it was not, which may suggest ChatGPT's availability limited growth in those languages. Our results contribute to the understanding of how emerging generative AI tools are disrupting the Web ecosystem.


Summary

  • The paper finds that page views rose by roughly 10-18% after ChatGPT's release in Wikipedia language editions where ChatGPT was available, with larger rises in editions where it was not.
  • It uses paired statistical tests and panel regression models to compare the periods before and after ChatGPT's release.
  • The study highlights that while passive consumption on Wikipedia increased, collaborative editing activities remained largely unaffected.

The Impact of ChatGPT on Wikipedia Engagement: A Summary of Findings

The introduction of ChatGPT has sparked substantial discussions regarding its potential impact on traditional knowledge-sharing platforms. In their paper, "The Death of Wikipedia? - Exploring the Impact of ChatGPT on Wikipedia Engagement," Neal Reeves, Wenjie Yin, Elena Simperl, and Miriam Redi from King's College London explore whether ChatGPT's public release has impacted Wikipedia user metrics. Their analysis focused on page views, unique visitors, edit counts, and editor numbers across twelve language editions of Wikipedia, examining activity trends before and after ChatGPT's launch. This essay summarizes their methodology, findings, and the potential implications for the wider ecosystem of AI and collaborative intelligence platforms.

Methodology and Data Collection

The researchers drew data from the Wikipedia API for a period spanning from January 1, 2021, to January 1, 2024, covering both pre- and post-ChatGPT release phases. They selected twelve languages to ensure geographic diversity and different degrees of representation in ChatGPT's training data. This selection aimed to contrast highly resourced languages, which have a significant Wikipedia presence and large Common Crawl sizes, against less represented languages with smaller Wikipedia footprints. Additionally, six of these languages are spoken in countries where ChatGPT was unavailable at the time of the study.
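
As a rough illustration of this kind of data collection, the sketch below builds a request URL for the public Wikimedia Pageviews REST API. Whether the authors used this exact endpoint is an assumption; the summary only says "the Wikipedia API". The helper name pageviews_url, the example language code, and the choice of the all-access and user filters are illustrative and not taken from the paper.

```python
from datetime import date

# Aggregate page-view endpoint of the Wikimedia REST API.
# Assumption: the paper's data could be retrieved this way; the summary
# does not name the exact endpoint the authors queried.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def pageviews_url(lang: str, start: date, end: date,
                  granularity: str = "daily") -> str:
    """Build the URL for page views of one language edition.

    'all-access' covers desktop and mobile; 'user' excludes traffic
    that Wikimedia classifies as automated. Both are illustrative choices.
    """
    project = f"{lang}.wikipedia.org"
    # The API expects timestamps in YYYYMMDDHH form.
    return (f"{BASE}/{project}/all-access/user/{granularity}/"
            f"{start:%Y%m%d}00/{end:%Y%m%d}00")


# Study window as described in the summary above.
url = pageviews_url("en", date(2021, 1, 1), date(2024, 1, 1))
print(url)
```

The function only constructs the URL; fetching it and parsing the JSON response is left out so that the sketch runs without network access.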

Analytical Approach

The paper's methodological approach involved paired statistical tests for aggregated statistics and a panel regression model to quantify longitudinal trends. The panel regression with fixed effects made it possible to control for day- and week-specific effects, giving a more precise framework for identifying fluctuations potentially attributable to ChatGPT's introduction.
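
The two techniques named above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' model: the paired test is shown as a paired t statistic (the summary does not say which paired test was used), the fixed effects are per-language only, and all numbers are synthetic. The names paired_t and within_estimator are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic for paired samples, computed on (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def within_estimator(rows):
    """One-regressor fixed-effects (within) estimator.

    rows: iterable of (entity, x, y). x and y are demeaned within each
    entity, which removes entity-specific baselines, and the OLS slope
    is computed on the demeaned data.
    """
    groups = {}
    for entity, x, y in rows:
        groups.setdefault(entity, []).append((x, y))
    num = den = 0.0
    for obs in groups.values():
        mx = mean(x for x, _ in obs)
        my = mean(y for _, y in obs)
        for x, y in obs:
            num += (x - mx) * (y - my)
            den += (x - mx) ** 2
    return num / den


# Synthetic example: x = 1 after the release, y = log page views.
# Two made-up editions with different baselines and the same shift.
rows = [
    ("aa", 0, 10.0), ("aa", 0, 10.0), ("aa", 1, 10.1), ("aa", 1, 10.1),
    ("bb", 0, 5.0), ("bb", 0, 5.0), ("bb", 1, 5.1), ("bb", 1, 5.1),
]
print(within_estimator(rows))  # shift shared by both editions
print(paired_t([100, 200, 300], [110, 215, 320]))
```

Demeaning within each edition is what lets editions of very different sizes share one coefficient. Day- or week-specific effects of the kind the paper controls for could be handled analogously by also demeaning on the time key, or by using a dedicated panel regression library.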

Key Findings

Page Views and Visitor Numbers

A central finding of the paper is the rise in page views and visitor numbers across most languages following ChatGPT's release. For languages where ChatGPT was available, page views increased by approximately 10-18%, with Arabic showing the most significant increase. Even larger rises were observed in languages from countries where ChatGPT was unavailable, suggesting that ChatGPT's availability may have moderated growth in Wikipedia engagement rather than reversed it. Visitor numbers followed the same pattern, which is consistent with this moderated-growth interpretation.
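
The comparison behind this interpretation is simple arithmetic: compute the percentage change for each group, then take the gap between them. The sketch below uses made-up page-view totals purely to show the calculation; none of the numbers come from the paper, and pct_change is a hypothetical helper.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from 'before' to 'after'."""
    return (after - before) / before * 100.0


# Synthetic mean daily page views (not data from the paper).
available = pct_change(1000.0, 1150.0)    # group where ChatGPT was available
unavailable = pct_change(1000.0, 1300.0)  # group where it was not

# A positive gap means the 'unavailable' group grew faster, which is the
# pattern the summary describes as moderated growth.
gap = unavailable - available
print(round(available, 2), round(unavailable, 2), round(gap, 2))
```

A gap of this kind is descriptive only. Attributing it to ChatGPT requires the regression controls described in the Analytical Approach section.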

Edit Counts and Editor Numbers

In contrast, changes in editing behavior were less pronounced. Arabic and Urdu exhibited significant changes, with edits rising in Arabic and falling in Urdu, but most languages did not show statistically significant trends in edits or editor numbers. This muted impact might be attributed to the ingrained collaborative and social aspects of Wikipedia editing, which are less likely to be influenced by new AI tools than passive consumption behaviors like page viewing.

Implications and Future Work

Reeves et al.'s findings suggest that while ChatGPT might compete with Wikipedia for user attention, any perceived competition does not equate to a significant reduction in Wikipedia's role as a vital information resource. Instead, ChatGPT might be supplementing Wikipedia, especially in languages with smaller Wikipedia communities. This interplay between Wikipedia and ChatGPT underscores the complex interdependencies within the ecosystem of knowledge-sharing platforms.

The implications of this research are manifold. Practically, these findings can inform strategies for managing Wikipedia's volunteer base and engagement tactics in an era where AI tools continue to evolve. Theoretically, the paper contributes to understanding how AI-driven tools interact with established collaborative intelligence systems, offering a nuanced view of coexistence rather than direct competition.

Future Research Directions

  1. Longitudinal Studies: Extending the timeline of analysis to capture longer-term trends and more comprehensive post-ChatGPT data can provide deeper insights into sustained behavioral changes.
  2. Granular Analysis: Investigating the specific categories of Wikipedia articles that might be more or less impacted by AI-driven information tools.
  3. Cross-Platform Comparisons: Expanding research to include other collaborative platforms like Reddit, Quora, or specialized forums to understand broader shifts in information-seeking behaviors.
  4. Community Dynamics: Exploring the social dynamics within Wikipedia that might buffer against impacts from AI tools on editing behavior.

Conclusion

This paper by Reeves et al. finds no evidence that ChatGPT has led to a significant reduction in Wikipedia engagement. Instead, it suggests nuanced dynamics where new AI tools might supplement traditional knowledge platforms or moderate their engagement growth. As the landscape of online information-seeking continues to evolve with AI advancements, understanding these interrelations will become increasingly critical for both academic researchers and platform administrators looking to sustain vibrant, user-driven ecosystems.