Empirical evidence of Large Language Model's influence on human spoken communication (2409.01754v1)

Published 3 Sep 2024 in cs.CY, cs.AI, cs.CL, and cs.HC

Abstract: AI agents now interact with billions of humans in natural language, thanks to advances in LLMs like ChatGPT. This raises the question of whether AI has the potential to shape a fundamental aspect of human culture: the way we speak. Recent analyses revealed that scientific publications already exhibit evidence of AI-specific language. But this evidence is inconclusive, since scientists may simply be using AI to copy-edit their writing. To explore whether AI has influenced human spoken communication, we transcribed and analyzed about 280,000 English-language videos of presentations, talks, and speeches from more than 20,000 YouTube channels of academic institutions. We find a significant shift in the trend of word usage specific to words distinctively associated with ChatGPT following its release. These findings provide the first empirical evidence that humans increasingly imitate LLMs in their spoken language. Our results raise societal and policy-relevant concerns about the potential of AI to unintentionally reduce linguistic diversity, or to be deliberately misused for mass manipulation. They also highlight the need for further investigation into the feedback loops between machine behavior and human culture.

Empirical Evidence of LLMs' Influence on Human Spoken Communication

The paper "Empirical evidence of Large Language Model's influence on human spoken communication" by Yakura, Lopez-Lopez, Brinkmann, et al. examines whether large language models (LLMs) such as ChatGPT are shaping human language, specifically spoken academic discourse. The question is timely given the growing role of AI in everyday communication and its possible consequences for linguistic and cultural evolution.

Introduction

The paper begins by framing language as a dynamic social phenomenon that evolves through perception, internalization, and reproduction. The introduction cites prior work showing that new technologies have historically influenced how language is transmitted, and places LLMs in that lineage, noting the widespread use of applications like ChatGPT for writing tasks in academic settings. Since texts edited by ChatGPT already show an observable shift in linguistic patterns, the paper asks whether these models similarly affect spoken academic communication.

Methods

The researchers analyze a corpus of approximately 280,000 transcripts of English-language videos from more than 20,000 academic YouTube channels. The observation window runs from 36 months before the release of ChatGPT to 18 months after it. This window gives the authors a long pre-release baseline against which to compare post-release usage of words distinctively associated with ChatGPT.

A continuous piecewise linear regression model captures how each word's frequency evolves over time, with a single change point at the release of ChatGPT in November 2022. The model tests the hypothesis that the usage trend of words characteristic of LLM output changed in human spoken language after that release.
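To make the model concrete, the following is a minimal sketch of a continuous piecewise linear fit with one known change point. It is not the authors' code: the function names `solve_linear_system` and `fit_hinge`, the ordinary least-squares estimator, and the synthetic frequency series are all assumptions made for illustration. The model is freq = b0 + b1*t + b2*max(0, t - c), where b2 is the change in slope after the change point c, and the hinge term keeps the two line segments joined at c.

```python
def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_hinge(months, freqs, change_point=0.0):
    """Least-squares fit of freq = b0 + b1*t + b2*max(0, t - change_point)."""
    rows = [[1.0, float(t), max(0.0, t - change_point)] for t in months]
    k = 3
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, freqs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Months relative to the release (t = 0), matching the 36-before / 18-after window.
months = list(range(-36, 19))
# Synthetic, noise-free series (illustrative only, not data from the paper):
# baseline slope 0.05 per month, extra slope 0.4 per month after the release.
freqs = [10.0 + 0.05 * t + 0.4 * max(0, t) for t in months]

b0, b1, b2 = fit_hinge(months, freqs)
print(f"intercept={b0:.3f} pre-release slope={b1:.3f} slope change={b2:.3f}")
```

In this parameterization a positive `b2` indicates that usage grew faster after the change point than the pre-release trend would predict, which is the kind of trend shift the paper reports.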

Results

The results section reveals statistically significant increases in the frequency of specific words distinctly associated with ChatGPT-edited texts. Words such as "delve," "realm," "meticulous," and "adept" saw increases of 48%, 35%, 40%, and 51%, respectively, over the observed period following ChatGPT’s release. These findings were corroborated by comparisons with alternative change points, which did not display comparable trend changes, underscoring the specificity of the observed trends to the post-ChatGPT period.
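The comparison with alternative change points can be illustrated with a simple placebo check. The sketch below is an assumption-laden simplification rather than the paper's procedure: it fits separate straight lines in a fixed window before and after each candidate date (the helper names `ols_slope` and `slope_change`, the 12-month half window, and the synthetic series are all hypothetical) and reports the difference in slopes.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_change(series, candidate, half_window=12):
    """Slope after the candidate month minus slope before it, within a fixed window."""
    before = [(t, y) for t, y in series if candidate - half_window <= t <= candidate]
    after = [(t, y) for t, y in series if candidate <= t <= candidate + half_window]
    slope_before = ols_slope([t for t, _ in before], [y for _, y in before])
    slope_after = ols_slope([t for t, _ in after], [y for _, y in after])
    return slope_after - slope_before


# Synthetic monthly series with a true kink at t = 0 (illustrative only).
series = [(t, 10.0 + 0.05 * t + 0.4 * max(0, t)) for t in range(-36, 19)]

# Candidate change points: two placebo dates and the actual release (t = 0).
changes = {c: slope_change(series, c) for c in (-24, -12, 0)}
for candidate, delta in changes.items():
    print(f"candidate month {candidate:+d}: slope change {delta:.3f}")
```

On this toy series only the candidate at the true kink shows a slope change, which mirrors the logic of the paper's check: if placebo dates produced similar shifts, the trend change could not be attributed to the release itself.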

Further analysis establishes a strong correlation between the distinctiveness of words in ChatGPT-generated texts and their accelerated adoption in human spoken language. Notably, this accelerated adoption was prominent for the top 20 words most peculiar to ChatGPT, suggesting that highly distinctive LLM-generated language features are more likely to be assimilated into human speech.
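The relationship described here can be sketched as a correlation between two per-word quantities: a score for how distinctive the word is of ChatGPT text, and the estimated change in its usage slope after the release. The example below assumes Pearson correlation and uses made-up placeholder values; the paper's exact statistic, scores, and words are not reproduced, and the names `pearson_r`, `distinctiveness`, and `slope_changes` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Placeholder values for five hypothetical words (not taken from the paper).
distinctiveness = [1.0, 2.0, 3.0, 4.0, 5.0]   # higher = more ChatGPT-specific
slope_changes = [0.1, 0.2, 0.2, 0.4, 0.5]     # post-release change in usage slope

r = pearson_r(distinctiveness, slope_changes)
print(f"Pearson r = {r:.3f}")
```

A rank-based statistic such as Spearman's correlation would be a reasonable alternative if the distinctiveness scores are heavily skewed.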

Discussion

The discussion section contextualizes these findings within the broader discourse of AI's influence on human behavior and culture. The authors present a well-reasoned argument that LLMs are not merely passive tools but active agents influencing human linguistic patterns. This mirrors findings in other domains, such as strategic games, where humans have adopted machine-derived strategies.

The paper also contemplates the broader implications of these findings. It suggests potential risks such as the reduction of linguistic diversity and the possibility of LLMs being exploited for mass manipulation. Furthermore, it highlights the necessity for continuous monitoring and examination of the bidirectional influence between humans and AI.

Implications and Future Directions

The research presents critical insights with practical and theoretical implications. Practically, it underscores the importance of developing policies to manage the influence of AI on human communication. Theoretically, it opens avenues for further research into the feedback loops between AI systems and human culture. Future research could explore the mechanisms driving the accelerated adoption of certain words and evaluate the generalizability of these findings across different communication contexts.

Conclusion

This paper contributes empirical evidence on how LLMs affect human language. It documents a shift in spoken academic communication following the introduction of ChatGPT, based on a large transcript corpus and a change-point regression analysis, and argues that AI is increasingly shaping human linguistic patterns. Future work should weigh both the benefits and the risks of this evolving relationship between humans and AI in shared cultural environments.

Methods Appendix

The methods appendix details how the dataset was constructed and how the videos were transcribed. The authors also fit Bayesian Gaussian regression models, and a sensitivity analysis supports the claim that the observed trend changes are specific to the period after ChatGPT's release.

By examining a broad spectrum of words and their adoption post-ChatGPT, the paper effectively addresses the initial research questions and sets a foundational framework for future explorations into AI’s influence on human language and culture.

Authors (6)
  1. Hiromu Yakura
  2. Ezequiel Lopez-Lopez
  3. Levin Brinkmann
  4. Ignacio Serna
  5. Prateek Gupta
  6. Iyad Rahwan