
Towards Multimodal Emotional Support Conversation Systems (2408.03650v2)

Published 7 Aug 2024 in cs.MM

Abstract: The integration of conversational AI into mental health care promises a new horizon for therapist-client interactions, aiming to closely emulate the depth and nuance of human conversations. Despite this potential, the current landscape of conversational AI is markedly limited by its reliance on single-modal data, constraining the systems' ability to empathize and provide effective emotional support. This limitation stems from a paucity of resources that capture the multimodal nature of human communication essential for therapeutic counseling. To address this gap, we introduce the Multimodal Emotional Support Conversation (MESC) dataset, a first-of-its-kind resource enriched with comprehensive annotations across text, audio, and video modalities. This dataset captures the intricate interplay of user emotions, system strategies, system emotions, and system responses, setting a new precedent in the field. Leveraging the MESC dataset, we propose a general Sequential Multimodal Emotional Support framework (SMES) grounded in Therapeutic Skills Theory. Tailored for multimodal dialogue systems, the SMES framework incorporates an LLM-based reasoning model that sequentially generates user emotion recognition, system strategy prediction, system emotion prediction, and response generation. Our rigorous evaluations demonstrate that this framework significantly enhances the capability of AI systems to mimic therapist behaviors with heightened empathy and strategic responsiveness. By integrating multimodal data in this manner, we bridge the critical gap between emotion recognition and emotional support, marking a significant advancement in conversational AI for mental health support.
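The sequential generation described in the abstract (user emotion → system strategy → system emotion → response, with each stage conditioning the next) can be sketched as a simple chained pipeline. This is a hypothetical illustration of the data flow only, not the authors' implementation: the `llm` callable, stage prompts, and `toy_llm` stand-in are all assumptions.

```python
# Hypothetical sketch of an SMES-style sequential reasoning chain: each
# stage's output is appended to the running context so later stages can
# condition on it. Stage names follow the abstract; everything else
# (the llm callable, the context format) is an illustrative assumption.

STAGES = [
    "user_emotion",      # recognize the user's emotion
    "system_strategy",   # predict the support strategy
    "system_emotion",    # predict the emotion the system should convey
    "response",          # generate the final supportive response
]

def smes_pipeline(llm, dialogue_context: str) -> dict:
    """Run the four stages in order, feeding earlier outputs forward."""
    outputs = {}
    context = dialogue_context
    for stage in STAGES:
        prediction = llm(stage, context)
        outputs[stage] = prediction
        context += f"\n[{stage}] {prediction}"  # accumulate for next stage
    return outputs

def toy_llm(stage: str, context: str) -> str:
    """Toy stand-in for an LLM, just to show the control flow."""
    canned = {
        "user_emotion": "anxious",
        "system_strategy": "reflection of feelings",
        "system_emotion": "calm",
        "response": "It sounds like this has been weighing on you.",
    }
    return canned[stage]

result = smes_pipeline(toy_llm, "User: I can't stop worrying about my exams.")
```

Chaining the stages this way is what distinguishes the framework from predicting emotion, strategy, and response independently: the response stage sees all three earlier predictions in its context.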

Authors (5)
  1. Yuqi Chu
  2. Lizi Liao
  3. Zhiyuan Zhou
  4. Chong-Wah Ngo
  5. Richang Hong