Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition (2402.17269v2)

Published 27 Feb 2024 in cs.LG

Abstract: Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing. This paper proposes MultiDAG+CL, a novel approach for multimodal ERC that employs a Directed Acyclic Graph (DAG) to integrate textual, acoustic, and visual features within a unified framework. The model is further enhanced with Curriculum Learning (CL), which gradually presents training samples in a meaningful order and thereby improves the model's handling of emotional shifts and data imbalance. Experimental results on the IEMOCAP and MELD datasets demonstrate that the MultiDAG+CL models outperform baseline models. The code for MultiDAG+CL and the experiments is released at: https://github.com/vanntc711/MultiDAG-CL
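
The abstract describes two components: a DAG over the conversation that lets each utterance aggregate information from earlier utterances across the textual, acoustic, and visual modalities, and a curriculum schedule that presents training conversations in a meaningful order. The sketch below is a minimal, illustrative reading of those two ideas, not the released MultiDAG+CL implementation; the names (FusedDAGLayer, emotion_shift_ratio, curriculum_order), the GRU-style node update, the mean-pooled predecessor aggregation, and the emotion-shift difficulty measure are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class FusedDAGLayer(nn.Module):
    """Fuse per-utterance text/audio/visual features, then propagate
    information along the conversation DAG (edges from earlier to later
    utterances) with a mean-pooled message and a GRU-style update."""

    def __init__(self, d_text, d_audio, d_visual, d_hidden):
        super().__init__()
        self.fuse = nn.Linear(d_text + d_audio + d_visual, d_hidden)
        self.update = nn.GRUCell(d_hidden, d_hidden)

    def forward(self, text, audio, visual, predecessors):
        # text/audio/visual: (num_utterances, d_modality) for one conversation.
        # predecessors[i]: indices j < i of utterances with an edge j -> i.
        h = torch.tanh(self.fuse(torch.cat([text, audio, visual], dim=-1)))
        states = []
        for i in range(h.size(0)):
            if predecessors[i]:
                msg = torch.stack([states[j] for j in predecessors[i]]).mean(dim=0)
            else:
                msg = torch.zeros_like(h[i])  # no incoming edges: empty message
            states.append(self.update(h[i].unsqueeze(0), msg.unsqueeze(0)).squeeze(0))
        return torch.stack(states)  # (num_utterances, d_hidden)


def emotion_shift_ratio(labels):
    """Assumed difficulty score: fraction of consecutive utterance pairs
    whose emotion label changes (an 'emotional shift')."""
    if len(labels) < 2:
        return 0.0
    shifts = sum(a != b for a, b in zip(labels, labels[1:]))
    return shifts / (len(labels) - 1)


def curriculum_order(conversations):
    """Order conversations from easier (few emotion shifts) to harder,
    so a trainer can introduce them gradually."""
    return sorted(conversations, key=lambda c: emotion_shift_ratio(c["labels"]))
```

As a usage note, calling curriculum_order on a list of conversations, each represented here as a dict with a "labels" sequence, yields the order in which a curriculum-style trainer might visit them, starting with conversations whose emotion labels change least between consecutive utterances.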

Authors (4)
  1. Cam-Van Thi Nguyen (10 papers)
  2. Cao-Bach Nguyen (1 paper)
  3. Quang-Thuy Ha (5 papers)
  4. Duc-Trong Le (10 papers)
