
Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations (2404.16905v1)

Published 25 Apr 2024 in cs.CL, cs.SD, and eess.AS

Abstract: In human-computer interaction, it is crucial for agents to respond to humans by understanding their emotions, and unraveling the causes of those emotions is even more challenging. The new task of Multimodal Emotion-Cause Pair Extraction in Conversations requires recognizing emotions and identifying their causal expressions. In this study, we propose a multi-stage framework that recognizes emotions and then extracts the emotion-cause pairs given the target emotion. In the first stage, a Llama-2-based InstructERC model extracts the emotion category of each utterance in a conversation. After emotion recognition, a two-stream attention model extracts the emotion-cause pairs given the target emotion for Subtask 2, while MuTEC extracts the causal spans for Subtask 1. Our approach achieved first place in both subtasks of the competition.
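The abstract describes a pipeline of three components composed in sequence. The sketch below shows only the data flow between stages, with trivial rule-based stubs standing in for the actual neural models (InstructERC, the two-stream attention model, and MuTEC are large trained systems; everything here is a hypothetical illustration, not the authors' implementation):

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    idx: int
    speaker: str
    text: str

# Stage 1 (stub): per-utterance emotion recognition.
# In the paper this is a Llama-2-based InstructERC model; a keyword
# rule stands in so the pipeline runs end to end.
def recognize_emotion(utt):
    return "joy" if "great" in utt.text.lower() else "neutral"

# Stage 2 (stub): given each non-neutral (target) emotion, propose
# candidate cause utterances. The paper uses a two-stream attention
# model (TSAM); this stub links each emotion to the preceding turn.
def extract_cause_pairs(conv, emotions):
    pairs = []
    for utt, emo in zip(conv, emotions):
        if emo != "neutral" and utt.idx > 0:
            pairs.append((utt.idx, utt.idx - 1, emo))
    return pairs

# Stage 3 (stub, Subtask 1 only): extract the causal span from the
# cause utterance. The paper uses MuTEC; this stub returns the
# whole utterance as the span.
def extract_span(cause_utt):
    return cause_utt.text

def run_pipeline(conv):
    emotions = [recognize_emotion(u) for u in conv]
    pairs = extract_cause_pairs(conv, emotions)
    return [(e_idx, c_idx, emo, extract_span(conv[c_idx]))
            for e_idx, c_idx, emo in pairs]

conv = [
    Utterance(0, "A", "I passed the exam."),
    Utterance(1, "B", "That is great news!"),
]
print(run_pipeline(conv))  # [(1, 0, 'joy', 'I passed the exam.')]
```

The staged design matters because Subtask 2 conditions pair extraction on the already-predicted target emotion, so errors in Stage 1 propagate downstream; the stubs make that dependency explicit.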

References (57)
  1. Ashwani Bhat and Ashutosh Modi. 2023. Multi-task learning framework for extracting emotion cause span and entailment in conversations. In Transfer Learning for Natural Language Processing Workshop, pages 33–51.
  2. IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation, 42:335–359.
  3. Multimodal emotion recognition from expressive faces, body gestures and speech. In Artificial Intelligence and Innovations 2007: from Theory to Applications, pages 375–388, Boston, MA. Springer US.
  4. MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices. In Biometric Recognition: 13th Chinese Conference, CCBR 2018, Urumqi, China, August 11-12, 2018, Proceedings 13, pages 428–438. Springer.
  5. M2FNet: Multi-modal fusion network for emotion recognition in conversation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4652–4661.
  6. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  7. ECPE-2D: Emotion-cause pair extraction based on joint two-dimensional representation, interaction and prediction. In Association for Computational Linguistics (ACL), pages 3161–3170.
  8. Timothy Dozat and Christopher D Manning. 2016. Deep biaffine attention for neural dependency parsing. arXiv preprint arXiv:1611.01734.
  9. The geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing. IEEE Transactions on Affective Computing, 7(2):190–202.
  10. openSMILE: The Munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM International Conference on Multimedia, MM ’10, pages 1459–1462, New York, NY, USA. Association for Computing Machinery.
  11. Transition-based directed graph construction for emotion-cause pair extraction. In Association for Computational Linguistics (ACL), pages 3707–3717.
  12. KI-Net: Ai-based optimization in industrial manufacturing—a project overview. In International Conference on Computer Aided Systems Theory, pages 554–561. Springer.
  13. COSMIC: Commonsense knowledge for emotion identification in conversations. arXiv preprint arXiv:2010.02795.
  14. DialogueGCN: A graph convolutional neural network for emotion recognition in conversation. arXiv preprint arXiv:1908.11540.
  15. ICON: Interactive conversational memory network for multimodal emotion detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2594–2604.
  16. Supervised adversarial contrastive learning for emotion recognition in conversations. arXiv preprint arXiv:2306.01505.
  17. UniMSE: Towards unified multimodal sentiment analysis and emotion recognition. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7837–7851.
  18. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77.
  19. Taewoon Kim and Piek Vossen. 2021. EmoBERTa: Speaker-aware emotion recognition in conversation with roberta.
  20. InstructERC: Reforming emotion recognition in conversation with a retrieval multi-task LLMs framework. arXiv preprint arXiv:2309.11911.
  22. Watch the speakers: A hybrid continuous attribution network for emotion recognition in conversation with emotion disentanglement. In 2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI), pages 881–888.
  23. GraphCFC: A directed graph based cross-modal feature complementation approach for multimodal conversational emotion recognition. IEEE Transactions on Multimedia, 26:77–89.
  24. Neutral utterances are also causes: Enhancing conversational causal emotion entailment with social commonsense knowledge. arXiv preprint arXiv:2205.00759.
  25. Multi-task learning with auxiliary speaker identification for conversational emotion recognition. arXiv preprint arXiv:2003.01478.
  26. Dice loss for data-imbalanced NLP tasks. arXiv preprint arXiv:1911.02855.
  27. DailyDialog: A manually labelled multi-turn dialogue dataset. arXiv preprint, arXiv:1710.03957.
  28. Visual instruction tuning.
  29. Hierarchical dialogue understanding with special tokens and turn-level attention. arXiv preprint arXiv:2305.00262.
  30. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  31. DialogueRNN: An attentive RNN for emotion detection in conversations. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):6818–6825.
  32. Context-dependent sentiment analysis in user-generated videos. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 873–883.
  33. MELD: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint, arXiv:1810.02508.
  34. Recognizing emotion cause in conversations. Cognitive Computation, 13:1317–1332.
  35. Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084.
  36. Bjorn Schuller and Anton Batliner. 2013. Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing, 1st edition. Wiley Publishing.
  37. DialogXL: All-in-one XLNet for multi-party conversation emotion recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 35(15):13789–13797.
  38. Directed acyclic graph network for conversational emotion recognition. arXiv preprint arXiv:2105.12907.
  39. Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations.
  40. Supervised prototypical contrastive learning for emotion recognition in conversation. arXiv preprint arXiv:2210.08713.
  41. Relation-aware graph attention networks with relational position encodings for emotion recognition in conversations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7360–7370. Association for Computational Linguistics.
  42. Multimodal emotion-cause pair extraction in conversations. IEEE Transactions on Affective Computing, 14(3):1832–1844.
  43. SemEval-2024 Task 3: Multimodal emotion cause analysis in conversations. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 2022–2033, Mexico City, Mexico. Association for Computational Linguistics.
  44. Generative emotion cause triplet extraction in conversations with commonsense knowledge. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3952–3963.
  45. Contextualized emotion recognition in conversation as sequence tagging. In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 186–195.
  46. Chain of thought prompting elicits reasoning in large language models. ArXiv, abs/2201.11903.
  47. Effective inter-clause modeling for end-to-end emotion-cause pair extraction. In Association for Computational Linguistics (ACL), pages 3171–3181.
  48. Rui Xia and Zixiang Ding. 2019. Emotion-cause pair extraction: A new task to emotion analysis in texts. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1003–1012.
  49. Sayyed M. Zahiri and Jinho D. Choi. 2017. Emotion detection on tv show transcripts with sequence-based convolutional neural networks. arXiv preprint, arXiv:1708.04299.
  50. Modeling both context-and speaker-sensitive dependence for emotion detection in multi-speaker conversations. In IJCAI, pages 5415–5421.
  51. TSAM: A two-stream attention model for causal emotion entailment. arXiv preprint arXiv:2203.00819.
  52. Video-LLaMA: An instruction-tuned audio-visual language model for video understanding. arXiv preprint arXiv:2306.02858.
  53. Samsung Research China-Beijing at SemEval-2023 Task 2: An AL-R model for multilingual complex named entity recognition. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 114–120.
  54. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE signal processing letters, 23(10):1499–1503.
  55. Knowledge-bridged causal interaction network for causal emotion entailment. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11):14020–14028.
  56. Knowledge-enriched transformer for emotion detection in textual conversations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 165–176.
  57. Topic-driven and knowledge-aware transformer for dialogue emotion detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1571–1582.
Authors (6)
  1. Shen Zhang (48 papers)
  2. Haojie Zhang (21 papers)
  3. Jing Zhang (730 papers)
  4. Xudong Zhang (42 papers)
  5. Yimeng Zhuang (2 papers)
  6. Jinting Wu (4 papers)
Citations (1)