SSLCL: An Efficient Model-Agnostic Supervised Contrastive Learning Framework for Emotion Recognition in Conversations (2310.16676v3)

Published 25 Oct 2023 in cs.CL

Abstract: Emotion recognition in conversations (ERC) is a rapidly evolving task within the natural language processing community, which aims to detect the emotions expressed by speakers during a conversation. Recently, a growing number of ERC methods have focused on leveraging supervised contrastive learning (SCL) to enhance the robustness and generalizability of learned features. However, current SCL-based approaches in ERC are impeded by the constraint of large batch sizes and the lack of compatibility with most existing ERC models. To address these challenges, we propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL), which eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions. Specifically, we introduce a novel perspective on utilizing label representations by projecting discrete labels into dense embeddings through a shallow multilayer perceptron, and formulate the training objective to maximize the similarity between sample features and their corresponding ground-truth label embeddings, while minimizing the similarity between sample features and label embeddings of disparate classes. Moreover, we innovatively adopt the Soft-HGR maximal correlation as a measure of similarity between sample features and label embeddings, leading to significant performance improvements over conventional similarity measures. Additionally, multimodal cues of utterances are effectively leveraged by SSLCL as data augmentations to boost model performances. Extensive experiments on two ERC benchmark datasets, IEMOCAP and MELD, demonstrate the compatibility and superiority of our proposed SSLCL framework compared to existing state-of-the-art SCL methods. Our code is available at \url{https://github.com/TaoShi1998/SSLCL}.
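The abstract's key ingredient is using the Soft-HGR maximal correlation, rather than cosine similarity or a dot product, to score the agreement between sample features and label embeddings. As a rough illustration of what that objective looks like (a minimal NumPy sketch of the standard Soft-HGR formulation, not the authors' implementation; the function name and batch-statistics details are assumptions):

```python
import numpy as np

def soft_hgr(f, g):
    """Batch estimate of the Soft-HGR maximal correlation between two feature sets.

    f: (N, d) sample features; g: (N, d) matched label embeddings.
    Returns E[f^T g] - 0.5 * tr(Cov(f) @ Cov(g)), with both sets zero-centered.
    Higher values indicate stronger dependence between the two representations.
    """
    # Zero-center each feature set, as Soft-HGR assumes zero-mean transforms.
    f = f - f.mean(axis=0, keepdims=True)
    g = g - g.mean(axis=0, keepdims=True)
    n = f.shape[0]
    # Expected inner product between paired features.
    inner = (f * g).sum(axis=1).mean()
    # Trace regularizer over the two empirical covariance matrices.
    cov_f = f.T @ f / (n - 1)
    cov_g = g.T @ g / (n - 1)
    return inner - 0.5 * np.trace(cov_f @ cov_g)
```

In a training loop along the lines the abstract describes, one would maximize this score between sample features and their ground-truth label embeddings (and minimize it against embeddings of other classes); matched feature pairs score higher than independent ones.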

Authors (5)
  1. Tao Shi
  2. Xiao Liang
  3. Yaoyuan Liang
  4. Xinyi Tong
  5. Shao-Lun Huang
