
ITEACH-Net: Inverted Teacher-studEnt seArCH Network for Emotion Recognition in Conversation (2312.15583v3)

Published 25 Dec 2023 in cs.MM

Abstract: Two critical challenges hinder the development of emotion recognition in conversation (ERC). First, there has been little exploration into mining deeper insights from the data itself for conversational emotion tasks. Second, existing systems are vulnerable to randomly missing modality features, a common occurrence in realistic settings. Focusing on these two challenges, we propose a novel framework for incomplete multimodal learning in ERC, called the Inverted Teacher-studEnt seArCH Network (ITEACH-Net). ITEACH-Net comprises two novel components: the Emotion Context Changing Encoder (ECCE) and the Inverted Teacher-Student (ITS) framework. Leveraging the tendency of emotional states to remain locally stable within conversational contexts, ECCE captures these patterns and further tracks their evolution over time. Recognizing that incomplete and complete data pose different challenges, ITS employs a teacher-student framework to decouple the respective computations. Through neural architecture search, the student model then acquires stronger capabilities for handling incomplete data than the teacher. For testing, we design a novel evaluation protocol that measures the model's performance under different missing-rate conditions without altering its weights. Experiments on three benchmark ERC datasets show that ITEACH-Net outperforms existing methods on incomplete multimodal ERC. We believe ITEACH-Net can inspire research on the intrinsic nature of emotions in conversation scenarios and pave a more robust route for incomplete-learning techniques. Code will be made available.
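As a rough illustration of the missing-rate evaluation protocol described in the abstract, the sketch below masks per-utterance modality features at a configurable missing rate and evaluates a frozen model at several rates. The dictionary layout, the modality names, and the model(masked, masks) interface are illustrative assumptions, not the paper's actual implementation.

import torch

def mask_modalities(features, missing_rate, generator=None):
    # features: dict mapping a modality name ("text", "audio", "video") to a
    # tensor of shape (num_utterances, feat_dim).
    # Each modality feature is independently dropped (zeroed) with probability
    # missing_rate; the binary mask is returned so a model can condition on
    # which modalities are present (hypothetical interface).
    masked, masks = {}, {}
    for name, feat in features.items():
        keep = (torch.rand(feat.shape[0], 1, generator=generator) >= missing_rate).float()
        masked[name] = feat * keep   # zero out "missing" modality features
        masks[name] = keep           # 1 = present, 0 = missing
    return masked, masks

def evaluate_under_missing_rates(model, features, labels, rates=(0.0, 0.2, 0.4, 0.6)):
    # Evaluate a trained model at several missing rates without changing its weights.
    model.eval()
    results = {}
    with torch.no_grad():
        for rate in rates:
            masked, masks = mask_modalities(features, rate)
            preds = model(masked, masks).argmax(dim=-1)   # assumed forward signature
            results[rate] = (preds == labels).float().mean().item()
    return results

In this sketch the same checkpoint is reused across all rates, mirroring the paper's claim of testing under different missing-rate conditions without altering the model weights.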

Authors (7)
  1. Haiyang Sun (45 papers)
  2. Zheng Lian (51 papers)
  3. Licai Sun (19 papers)
  4. Bin Liu (441 papers)
  5. Jianhua Tao (139 papers)
  6. Chenglong Wang (80 papers)
  7. Kang Chen (61 papers)
