Empirical evidence of Large Language Model's influence on human spoken communication

Published 3 Sep 2024 in cs.CY, cs.AI, cs.CL, and cs.HC | (2409.01754v1)

Abstract: AI agents now interact with billions of humans in natural language, thanks to advances in LLMs like ChatGPT. This raises the question of whether AI has the potential to shape a fundamental aspect of human culture: the way we speak. Recent analyses revealed that scientific publications already exhibit evidence of AI-specific language. But this evidence is inconclusive, since scientists may simply be using AI to copy-edit their writing. To explore whether AI has influenced human spoken communication, we transcribed and analyzed about 280,000 English-language videos of presentations, talks, and speeches from more than 20,000 YouTube channels of academic institutions. We find a significant shift in the trend of word usage specific to words distinctively associated with ChatGPT following its release. These findings provide the first empirical evidence that humans increasingly imitate LLMs in their spoken language. Our results raise societal and policy-relevant concerns about the potential of AI to unintentionally reduce linguistic diversity, or to be deliberately misused for mass manipulation. They also highlight the need for further investigation into the feedback loops between machine behavior and human culture.

Summary

The paper reports evidence that LLMs influence spoken academic language, visible as shifts in the frequency of words distinctively associated with ChatGPT.
A continuous piecewise linear regression on about 280,000 transcripts shows post-ChatGPT increases of 35% to 51% for words such as "delve," "realm," "meticulous," and "adept."
Results underscore the potential risks of diminished linguistic diversity and highlight the need for proactive policy measures.

Empirical Evidence of LLMs' Influence on Human Spoken Communication

The paper "Empirical evidence of Large Language Model's influence on human spoken communication" by Yakura, Lopez-Lopez, Brinkmann, et al. examines whether LLMs such as ChatGPT shape human language, specifically spoken academic discourse. The study is timely given the growing role of AI in everyday communication and its potential consequences for linguistic and cultural evolution.

Introduction

The study begins by situating language as a dynamic social phenomenon that evolves through processes of perception, internalization, and reproduction. The introduction succinctly establishes the foundation for the research by referencing prior work indicating that emergent technologies historically influence language transmission. The authors position LLMs within this historical context, noting the extensive use of applications like ChatGPT for various writing tasks in academic settings. Highlighting the observable shift in linguistic patterns in texts edited by ChatGPT, the study aims to ascertain whether these models similarly affect spoken academic communication.

Methods

The researchers focus on a corpus of approximately 280,000 transcriptions of English-language videos from over 20,000 academic YouTube channels. The temporal framework incorporates data from 36 months before the release of ChatGPT to 18 months after. This dataset lets the authors track shifts in the usage frequency of words distinctively associated with ChatGPT.
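To make "word usage frequency" concrete, here is a minimal sketch of one way to turn dated transcripts into a monthly frequency series. The function name `monthly_frequency`, the `(month, text)` input format, the simple tokenizer, and the per-million normalization are all assumptions for illustration; the summary does not describe the authors' actual preprocessing.

```python
import re
from collections import Counter

def monthly_frequency(transcripts, target):
    """Return {month: occurrences of `target` per million tokens}.

    `transcripts` is an iterable of (month, text) pairs such as
    ("2023-03", "..."). This input format is a hypothetical choice.
    """
    hits = Counter()
    totals = Counter()
    for month, text in transcripts:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[month] += len(tokens)
        hits[month] += sum(1 for tok in tokens if tok == target)
    return {m: 1e6 * hits[m] / totals[m] for m in sorted(totals) if totals[m]}

# Toy data, not taken from the paper.
sample = [
    ("2022-10", "we look into the data"),
    ("2023-03", "let us delve into the realm of data"),
    ("2023-03", "we delve deeper"),
]
freq = monthly_frequency(sample, "delve")
```

The sketch counts exact matches only, so inflected forms such as "delves" or "delving" would need lemmatization or an explicit list of variants.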

A continuous piecewise linear regression model is used to capture the temporal evolution of word frequency, with a change point at the release of ChatGPT in November 2022. This framework tests the hypothesis that linguistic patterns characteristic of LLMs were adopted in human spoken language after ChatGPT's release.
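A continuous piecewise linear model with one known change point can be written as y = b0 + b1*t + b2*max(0, t - t0), where b2 is the change in slope after t0. The sketch below fits that form by ordinary least squares in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the synthetic series, and the use of ordinary least squares are choices made here.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_piecewise(t, y, t0):
    """Least-squares fit of y = b0 + b1*t + b2*max(0, t - t0).

    The hinge term max(0, t - t0) is zero at t0, so the fitted line
    stays continuous there; b2 is the slope change after t0.
    """
    rows = [[1.0, float(ti), max(0.0, float(ti) - t0)] for ti in t]
    k = 3
    # Normal equations: (X^T X) beta = X^T y.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(XtX, Xty)

# Synthetic, noise-free series: 36 months before and 18 months after t0 = 0.
t = list(range(-36, 18))
y = [10.0 + 0.05 * ti + 0.4 * max(0, ti) for ti in t]
b0, b1, b2 = fit_piecewise(t, y, t0=0)
```

On this synthetic series the fit recovers the generating coefficients (10, 0.05, 0.4) up to floating-point error. With real data, a slope change b2 clearly above zero at the ChatGPT release date is the pattern the hypothesis predicts.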

Results

The results section reveals statistically significant increases in the frequency of specific words distinctly associated with ChatGPT-edited texts. Words such as "delve," "realm," "meticulous," and "adept" saw increases of 48%, 35%, 40%, and 51%, respectively, over the observed period following ChatGPT’s release. These findings were corroborated by comparisons with alternative change points, which did not display comparable trend changes, underscoring the specificity of the observed trends to the post-ChatGPT period.
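One plausible way to read percentage increases like these is as the gap between the fitted post-release trend and the pre-release trend extrapolated forward. The summary does not say how the authors computed the figures, so the sketch below is only an assumed interpretation; the function `relative_increase` and the coefficients passed to it are hypothetical.

```python
def relative_increase(b0, b1, b2, months_after):
    """Relative gap between fitted and counterfactual frequency.

    Assumes the model y = b0 + b1*t + b2*max(0, t) with t measured in
    months from the change point. The counterfactual keeps the
    pre-release slope b1 and drops the slope change b2.
    """
    counterfactual = b0 + b1 * months_after
    if counterfactual <= 0:
        raise ValueError("counterfactual frequency must be positive")
    fitted = counterfactual + b2 * months_after
    return (fitted - counterfactual) / counterfactual

# Hypothetical coefficients, not estimates from the paper.
increase = relative_increase(b0=10.0, b1=0.05, b2=0.4, months_after=18)
print(f"{increase:.1%}")
```

Under this reading the figure depends on the evaluation horizon, so a reported percentage is only meaningful together with the number of months after release at which it is measured.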

Further analysis establishes a strong correlation between the distinctiveness of words in ChatGPT-generated texts and their accelerated adoption in human spoken language. Notably, this accelerated adoption was prominent for the top 20 words most peculiar to ChatGPT, suggesting that highly distinctive LLM-generated language features are more likely to be assimilated into human speech.
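The relationship described above can be checked with a rank correlation between a per-word distinctiveness score and the per-word slope change. The sketch below uses Spearman's coefficient as an assumed choice, since the summary does not name the statistic the authors used, and the scores shown are made-up toy values.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy values for four hypothetical words, not data from the paper.
distinctiveness = [0.9, 0.7, 0.4, 0.1]
slope_change = [0.40, 0.25, 0.30, 0.02]
rho = spearman(distinctiveness, slope_change)
```

A rank-based measure is used here because it does not require the relationship between distinctiveness and adoption to be linear, only monotonic.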

Discussion

The discussion section contextualizes these findings within the broader discourse of AI's influence on human behavior and culture. The authors present a well-reasoned argument that LLMs are not merely passive tools but active agents influencing human linguistic patterns. This mirrors findings in other domains, such as strategic games, where humans have adopted machine-derived strategies.

The paper also contemplates the broader implications of these findings. It suggests potential risks such as the reduction of linguistic diversity and the possibility of LLMs being exploited for mass manipulation. Furthermore, it highlights the necessity for continuous monitoring and examination of the bidirectional influence between humans and AI.

Implications and Future Directions

The research presents critical insights with practical and theoretical implications. Practically, it underscores the importance of developing policies to manage the influence of AI on human communication. Theoretically, it opens avenues for further research into the feedback loops between AI systems and human culture. Future research could explore the mechanisms driving the accelerated adoption of certain words and evaluate the generalizability of these findings across different communication contexts.

Conclusion

This paper makes significant contributions to understanding the impact of LLMs on human language. The empirical evidence provided illustrates a notable shift in spoken academic communication following the introduction of ChatGPT. By rigorously analyzing a vast dataset and employing a robust statistical approach, the authors present compelling evidence that AI is increasingly shaping human linguistic patterns. Moving forward, it is imperative to consider both the promising and challenging aspects of this evolving relationship between humans and AI in shared cultural environments.

Methods Appendix

The dataset construction and transcription methodology employed by the authors are comprehensive, ensuring the reliability of the data used in the study. Additionally, the use of Bayesian Gaussian regression models enhances the robustness of the findings. The sensitivity analysis further validates the specificity of the observed trends to the influence of ChatGPT, enhancing the credibility of the study’s conclusions.

By examining a broad spectrum of words and their adoption post-ChatGPT, the study effectively addresses the initial research questions and sets a foundational framework for future explorations into AI’s influence on human language and culture.
