
Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation (2402.13211v2)

Published 20 Feb 2024 in cs.CL

Abstract: Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given the task's inherent complexity and non-intuitive nature, the ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Despite the remarkable conversational ability of LLMs, previous studies have suggested that they often struggle to provide useful emotional support. This work therefore first analyzes the results of LLMs on ESConv, revealing difficulty in selecting the correct strategy and a notable preference for a specific strategy. Motivated by these observations, we explore how the inherent preference in LLMs affects emotional support, and we observe that a high preference for specific strategies hinders effective emotional support and reduces robustness in predicting the appropriate strategy. Moreover, we conduct a methodological study to offer insights into the approaches LLMs need in order to serve as proficient emotional supporters. Our findings emphasize that (1) high preference for specific strategies hinders the progress of emotional support, (2) external assistance helps reduce preference bias, and (3) existing LLMs alone cannot become good emotional supporters. These insights suggest promising avenues for future research to enhance the emotional intelligence of LLMs.

Authors (8)
  1. Dongjin Kang
  2. Sunghwan Kim
  3. Taeyoon Kwon
  4. Seungjun Moon
  5. Hyunsouk Cho
  6. Youngjae Yu
  7. Dongha Lee
  8. Jinyoung Yeo