Large Language Models Produce Responses Perceived to be Empathic (2403.18148v1)
Abstract: Large language models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N = 192, 202), we showed human raters a variety of responses written by several models (GPT-4 Turbo, Llama 2, and Mistral) and asked them to rate how empathic each response seemed. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable "styles" in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.
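The abstract mentions linguistic analyses of punctuation, emoji, and word use that distinguish model-written from human-written responses. As a rough, hypothetical illustration only (not the authors' actual LIWC-based pipeline), the Python sketch below counts a few such surface features in a response; the example texts, feature set, and emoji heuristic are invented for demonstration.

```python
import re
from collections import Counter

# Rough emoji range; a heuristic, not an exhaustive Unicode emoji property check.
EMOJI_RE = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def style_features(text: str) -> dict:
    """Count simple surface features of a response: word count,
    exclamation marks, commas, emojis, and the most frequent words."""
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "n_words": len(words),
        "n_exclamations": text.count("!"),
        "n_commas": text.count(","),
        "n_emojis": len(EMOJI_RE.findall(text)),
        "top_words": Counter(w.lower() for w in words).most_common(5),
    }

if __name__ == "__main__":
    # Invented example responses, for illustration only.
    human_reply = "That sounds really hard. I'm here if you want to talk."
    llm_reply = "I'm so sorry you're going through this! It's completely understandable to feel overwhelmed."
    print("human:", style_features(human_reply))
    print("llm:  ", style_features(llm_reply))
```

Comparing such feature counts across many human- and LLM-written responses is one simple way the stylistic regularities described in the abstract (heavier use of certain punctuation, emojis, and words) could be surfaced.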
- Yoon Kyung Lee
- Jina Suh
- Hongli Zhan
- Junyi Jessy Li
- Desmond C. Ong