
A Linguistic Comparison between Human and ChatGPT-Generated Conversations (2401.16587v3)

Published 29 Jan 2024 in cs.CL, cs.AI, and cs.CY

Abstract: This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone, reinforcing recent findings of LLMs being "more human than human." However, no significant difference was found in positive or negative affect between ChatGPT and human dialogues. Classifier analysis of dialogue embeddings indicates implicit coding of the valence of affect despite no explicit mention of affect in the conversations. The research also contributes a novel companion dataset of ChatGPT-generated conversations between two independent chatbots, designed to replicate a corpus of human conversations that is available for open access and widely used in AI research on language modeling. Our findings enhance understanding of ChatGPT's linguistic capabilities and inform ongoing efforts to distinguish between human and LLM-generated text, which is critical in detecting AI-generated fakes, misinformation, and disinformation.

Authors (4)
  1. Morgan Sandler
  2. Hyesun Choung
  3. Arun Ross
  4. Prabu David