Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations (2312.06337v1)

Published 11 Dec 2023 in cs.SD, cs.CL, and eess.AS

Abstract: The main task of Multimodal Emotion Recognition in Conversations (MERC) is to identify emotions across modalities, e.g., text, audio, image, and video, which is a significant direction for realizing machine intelligence. However, much of the data in MERC naturally exhibits an imbalanced distribution of emotion categories, and prior work largely ignores the negative impact of imbalanced data on emotion recognition. To tackle this problem, we systematically analyze it from three aspects: data augmentation, loss sensitivity, and sampling strategy, and propose the Class Boundary Enhanced Representation Learning (CBERL) model. Concretely, we first design a multimodal generative adversarial network to address the imbalanced distribution of emotion categories in the raw data. Second, a deep joint variational autoencoder is proposed to fuse complementary semantic information across modalities and obtain discriminative feature representations. Finally, we implement a multi-task graph neural network with mask reconstruction and classification optimization to address overfitting and underfitting in class boundary learning and achieve cross-modal emotion recognition. Extensive experiments on the IEMOCAP and MELD benchmark datasets show that CBERL achieves a clear improvement in emotion recognition performance; in particular, on the minority-class fear and disgust labels, our model improves accuracy and F1 score by 10% to 20%.
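
Two of the imbalance remedies the abstract names, sampling strategy and loss sensitivity, correspond to standard techniques: re-weighted sampling of minority classes and a loss that emphasizes hard, rare examples. The minimal PyTorch sketch below illustrates both; it is an illustration only, not the authors' CBERL implementation, and the label values, class counts, and the focal_loss helper are hypothetical.

import torch
import torch.nn.functional as F
from torch.utils.data import WeightedRandomSampler

def focal_loss(logits, targets, gamma=2.0):
    # Focal loss (Lin et al., 2017): scales cross-entropy by (1 - p_t)^gamma so that
    # well-classified majority-class utterances contribute little gradient, while rare
    # classes such as fear and disgust are emphasized.
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, reduction="none")
    p_t = log_probs.exp().gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_t) ** gamma * ce).mean()

# Hypothetical, heavily skewed per-utterance emotion labels (0 = neutral, 1 = joy, 2 = fear).
labels = torch.tensor([0, 0, 0, 0, 0, 1, 1, 2])
class_counts = torch.bincount(labels).float()
sample_weights = (1.0 / class_counts)[labels]              # rarer class -> higher draw probability
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels))

logits = torch.randn(len(labels), 3, requires_grad=True)   # stand-in for model outputs
loss = focal_loss(logits, labels, gamma=2.0)
loss.backward()
print(f"focal loss: {loss.item():.4f}")

In practice the sampler would be passed to a DataLoader over the dialogue utterances and the focal loss would replace plain cross-entropy during training; CBERL itself goes further, synthesizing minority-class samples with a multimodal GAN rather than only re-weighting existing ones.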

Authors (5)
  1. Tao Meng (48 papers)
  2. Yuntao Shou (28 papers)
  3. Wei Ai (48 papers)
  4. Nan Yin (33 papers)
  5. Keqin Li (61 papers)
Citations (30)