
InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models

Published 21 Sep 2023 in cs.CL (arXiv:2309.11911v6)

Abstract: The field of emotion recognition in conversation (ERC) has focused on separating sentence feature encoding from context modeling, leaving generative paradigms based on unified designs underexplored. In this study, we propose a novel approach, InstructERC, which reformulates the ERC task from a discriminative framework to a generative framework based on LLMs. InstructERC makes three significant contributions: (1) it introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information; (2) we introduce two additional emotion alignment tasks, speaker identification and emotion prediction, to implicitly model dialogue role relationships and future emotional tendencies in conversations; (3) we pioneer unifying emotion labels across benchmarks through the feeling wheel to fit real application scenarios, and InstructERC still performs impressively on this unified dataset. Our LLM-based plugin framework significantly outperforms all previous models and achieves comprehensive SOTA on three commonly used ERC datasets. Extensive analysis of parameter-efficient and data-scaling experiments provides empirical guidance for applying it in practical scenarios.


Summary

  • The paper presents a novel generative framework using retrieval templates to integrate multi-granularity dialogue features for emotion recognition.
  • It incorporates auxiliary tasks like speaker identification and emotion prediction to model interpersonal dynamics and forecast emotional trajectories.
  • Experimental results on IEMOCAP, MELD, and EmoryNLP show that InstructERC outperforms traditional methods while remaining effective under parameter-efficient fine-tuning and varying data scales.

InstructERC: Reformulating Emotion Recognition in Conversations with LLMs

Emotion recognition in conversation (ERC) is pivotal for advancing human-computer interaction by enabling machines to understand nuanced emotional expressions in dialogue. Traditional ERC approaches have mainly relied on separating sentence feature encoding from context modeling, and have rarely explored unified designs that address both jointly. This study introduces InstructERC, a framework that transforms the ERC task from a discriminative methodology into a generative one by leveraging the capabilities of LLMs.

Key Contributions and Methodological Advances

  1. Generative Framework with Retrieval Templates: InstructERC introduces a retrieval template module that integrates multi-granularity dialogue supervision information, recasting both input and output in a sequence-to-sequence (Seq2Seq) paradigm, a significant departure from previous discriminative frameworks (a sketch of such a prompt appears after this list).
  2. Emotion Alignment Tasks: The researchers introduce two auxiliary tasks, speaker identification and emotion prediction, which enrich the modeling of speaker roles and anticipate future emotional states in dialogues. These tasks provide implicit cues about interpersonal dynamics and emotional trajectories within conversations.
  3. Unified Emotion Labeling: InstructERC proposes a unified approach to emotion labeling across different benchmarks using the concept of the "feeling wheel." This alignment aims to increase the applicability and consistency of the model across various datasets.

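To make the generative reformulation concrete, below is a minimal sketch of how such a retrieval-augmented instruction prompt could be assembled. The template wording, the `UNIFIED_LABELS` list, and the `build_prompt` helper are illustrative assumptions for exposition, not the paper's exact implementation.

```python
# Illustrative sketch of a retrieval-augmented ERC instruction prompt.
# Template wording, label set, and demonstration format are assumptions,
# not the paper's exact implementation.
from typing import List, Tuple

# Hypothetical unified label set; the paper merges benchmark-specific
# labels via the feeling wheel, but this particular list is a guess.
UNIFIED_LABELS = ["neutral", "happy", "sad", "angry", "excited", "frustrated"]

def build_prompt(history: List[Tuple[str, str]],
                 target_speaker: str,
                 target_utterance: str,
                 retrieved_demos: List[str]) -> str:
    """Assemble instruction + retrieved demonstrations + dialogue
    history + the utterance whose emotion the LLM should generate."""
    context = "\n".join(f"{spk}: {utt}" for spk, utt in history)
    demos = "\n".join(retrieved_demos)
    return (
        "You are an expert in sentiment and emotional analysis. "
        f"Select the emotional label of <{target_speaker}: "
        f"{target_utterance}> from {UNIFIED_LABELS}.\n\n"
        f"Similar labeled examples:\n{demos}\n\n"
        f"Dialogue history:\n{context}\n\n"
        "Answer:"
    )

# Example usage with toy data.
prompt = build_prompt(
    history=[("Joey", "You okay?"),
             ("Rachel", "I can't believe he said that!")],
    target_speaker="Rachel",
    target_utterance="I can't believe he said that!",
    retrieved_demos=["<Monica: That's wonderful!> -> happy"],
)
```

At inference, the LLM decodes the label as free text, which is then matched against the unified label set; the auxiliary speaker-identification and emotion-prediction tasks would use analogous templates with different instructions.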
Experimental Results

The paper reports that InstructERC significantly outperforms previous state-of-the-art models on three well-known ERC datasets: IEMOCAP, MELD, and EmoryNLP. The authors attribute this to the generative capabilities of LLMs, which provide a more robust foundation for language understanding and emotional nuance. The evaluations also cover parameter-efficient fine-tuning and data scaling, suggesting the framework is viable for practical deployment; a sketch of a typical parameter-efficient setup follows.
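The parameter-efficient experiments suggest an adapter-style setup in the spirit of LoRA. The following is a minimal sketch of how such fine-tuning is commonly wired up with Hugging Face transformers and peft; the backbone name and every hyperparameter value are placeholders, not the paper's reported configuration.

```python
# Minimal LoRA fine-tuning sketch (Hugging Face transformers + peft).
# Backbone and hyperparameters are placeholders, not the paper's values.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed backbone for illustration
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=16,                                 # low-rank dimension
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapters train
```

Because only the adapter weights are updated, the frozen backbone can be shared across tasks, which is what makes the "plugin" framing practical for deployment.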

Implications and Future Directions

The practical implications of InstructERC are substantial, particularly in the development of AI systems capable of empathetic interactions, which are increasingly important in domains such as virtual assistants, mental health diagnostics, and customer service. Theoretically, the shift from discriminative to generative frameworks in ERC tasks represents a significant step forward in how emotional intelligence is conceptualized within AI systems.

Future work could explore the integration of multi-modal data sources, such as visual and auditory signals, to further enhance the model's contextual awareness and emotional accuracy. Additionally, expanding InstructERC to support real-time and cross-cultural emotional interpretations will likely improve its applicability in global settings.

In conclusion, InstructERC not only reforms the methodologies typically applied in ERC tasks but also sets a new standard for how emotional recognition systems can be developed using advanced LLMs. The research provides a strong foundation for future advancements in the emotional intelligence of AI systems.
