InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models
Abstract: The field of emotion recognition in conversation (ERC) has focused on separating sentence feature encoding from context modeling, with little exploration of generative paradigms built on unified designs. In this study, we propose a novel approach, InstructERC, which reformulates the ERC task from a discriminative framework into a generative framework based on LLMs. InstructERC makes three significant contributions: (1) it introduces a simple yet effective retrieval template module, which helps the model explicitly integrate multi-granularity dialogue supervision information; (2) we introduce two additional emotion alignment tasks, namely speaker identification and emotion prediction, to implicitly model the dialogue role relationships and future emotional tendencies in conversations; (3) as a pioneering step, we unify emotion labels across benchmarks through the feeling wheel to fit real application scenarios, and InstructERC still performs impressively on this unified dataset. Our LLM-based plug-in framework significantly outperforms all previous models and achieves comprehensive SOTA on three commonly used ERC datasets. Extensive analysis of parameter-efficient and data-scaling experiments provides empirical guidance for applying it in practical scenarios.
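To make the generative reformulation concrete, below is a minimal sketch, in Python, of how an ERC example might be assembled into an instruction-style prompt with retrieved demonstrations and auxiliary alignment instructions. It is illustrative only: the function names, label set, and template wording are assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of instruction-template construction for generative ERC.
# All names (build_prompt, LABELS, the template text) are illustrative assumptions.

from typing import List, Dict

# Example label set (roughly the IEMOCAP emotion inventory); the unified setting
# would instead map benchmark labels onto a shared set via the feeling wheel.
LABELS = ["neutral", "happy", "sad", "angry", "excited", "frustrated"]

def format_history(history: List[Dict[str, str]]) -> str:
    """Render the dialogue context as 'Speaker: utterance' lines."""
    return "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in history)

def build_prompt(history: List[Dict[str, str]],
                 target: Dict[str, str],
                 demonstrations: List[str]) -> str:
    """Assemble an instruction prompt: task instruction, retrieved labeled
    demonstrations, dialogue context, and the query utterance to classify."""
    instruction = (
        "You are an expert in emotion recognition in conversation. "
        f"Choose the emotion of the last utterance from: {', '.join(LABELS)}."
    )
    demo_block = "\n\n".join(demonstrations)  # semantically similar labeled examples
    context = format_history(history)
    query = f"{target['speaker']}: {target['text']}"
    return (f"{instruction}\n\n### Demonstrations:\n{demo_block}\n\n"
            f"### Dialogue:\n{context}\n{query}\n### Emotion:")

# The auxiliary alignment tasks can reuse the same template with a different
# instruction, e.g. "Identify the speaker of the last utterance." (speaker
# identification) or "Predict the emotion of the next utterance." (emotion
# prediction); the LLM is then fine-tuned (e.g. with LoRA) to generate the
# answer string for all tasks in a single unified format.
```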