JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models (2403.04798v2)
Abstract: This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations". Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video. However, the complexities of these diverse modalities pose challenges for developing an efficient multimodal emotion cause analysis (ECA) system. Our proposed approach addresses these challenges through a two-step framework, for which we adopt two different implementations. In Approach 1, we employ instruction-tuning with two separate Llama 2 models for emotion and cause prediction. In Approach 2, we use GPT-4V to generate conversation-level video descriptions and employ in-context learning with annotated conversations using GPT-3.5. Our system achieved rank 4, and ablation experiments demonstrate that our proposed solutions yield significant performance gains. All experimental code is available on GitHub.
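The in-context learning step of Approach 2 can be sketched as a retrieval-plus-prompting loop: embed the target conversation, retrieve the most similar annotated conversation as a demonstration, and assemble a prompt for the language model. The snippet below is a minimal illustration of that idea, not the paper's actual implementation; the embedding vectors are random stand-ins (a real system would use a learned embedding model and a library such as FAISS for nearest-neighbour search), and all function and field names are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_demo(query_vec, annotated):
    """Pick the annotated conversation most similar to the query."""
    return max(annotated, key=lambda ex: cosine_sim(query_vec, ex["vec"]))

def build_prompt(demo, target_conv):
    """Assemble a prompt: task instruction, one retrieved demo, the target."""
    return (
        "For each utterance, identify its emotion and the cause utterance.\n\n"
        f"Example conversation:\n{demo['text']}\n"
        f"Annotations:\n{demo['labels']}\n\n"
        f"Now annotate:\n{target_conv}\n"
    )

# Toy annotated pool with random stand-in embeddings (hypothetical data).
rng = np.random.default_rng(0)
annotated = [
    {"text": "A: I lost my keys. B: Oh no!",
     "labels": "B: surprise, cause=utt1", "vec": rng.normal(size=8)},
    {"text": "A: We won the game! B: Amazing!",
     "labels": "B: joy, cause=utt1", "vec": rng.normal(size=8)},
]

# A query embedded close to the second example, so it is retrieved as the demo.
query_vec = annotated[1]["vec"] + 0.01 * rng.normal(size=8)
prompt = build_prompt(retrieve_demo(query_vec, annotated), "A: I got the job!")
```

The resulting `prompt` string would then be sent to the language model (GPT-3.5 in the paper's setup) to obtain emotion-cause annotations for the target conversation.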
Authors: Arefa, Mohammed Abbas Ansari, Chandni Saxena, Tanvir Ahmad