
Supervised Prototypical Contrastive Learning for Emotion Recognition in Conversation (2210.08713v2)

Published 17 Oct 2022 in cs.AI

Abstract: Capturing emotions within a conversation plays an essential role in modern dialogue systems. However, the weak correlation between emotions and semantics brings many challenges to emotion recognition in conversation (ERC). Even for semantically similar utterances, the emotion may vary drastically depending on context or speaker. In this paper, we propose a Supervised Prototypical Contrastive Learning (SPCL) loss for the ERC task. Leveraging the Prototypical Network, SPCL targets the imbalanced classification problem through contrastive learning and does not require a large batch size. Meanwhile, we design a difficulty measure function based on the distance between classes and introduce curriculum learning to alleviate the impact of extreme samples. We achieve state-of-the-art results on three widely used benchmarks. Further, we conduct analytical experiments to demonstrate the effectiveness of our proposed SPCL and curriculum learning strategy. We release the code at https://github.com/caskcsg/SPCL.

Authors (4)
  1. Xiaohui Song (33 papers)
  2. Longtao Huang (27 papers)
  3. Hui Xue (109 papers)
  4. Songlin Hu (80 papers)
Citations (74)

Summary


Emotion Recognition in Conversation (ERC) is an important component of dialogue systems, with applications ranging from chatbots to opinion mining in social media. Identifying emotions is hard because semantically similar utterances can carry very different emotions depending on context and speaker. The paper "Supervised Prototypical Contrastive Learning for Emotion Recognition in Conversation" introduces Supervised Prototypical Contrastive Learning (SPCL), a loss designed to tackle class imbalance in ERC without requiring large batch sizes.

Methodological Overview

The paper addresses two primary challenges in ERC: class imbalance, and the fact that text alone is often insufficient to distinguish emotions in multi-modal datasets. SPCL combines Prototypical Networks with Supervised Contrastive Learning (SCL). It maintains a representation queue for each emotion class and computes a prototype vector per class that enters the loss calculation, which stabilizes ERC models against imbalance. Extending SCL, SPCL uses these prototypes to guarantee that every instance in a mini-batch has at least one representative positive sample and representative negative samples, even when no other instance of its class appears in the batch. This removes the dependency on mini-batch size, so the model stays robust with small batches.
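The mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ClassQueues` and `proto_contrastive_loss`, the use of cosine similarity, the mean-of-queue prototype, and the `temperature` value are all assumptions made for illustration, and the loss is a plain SupCon-style formulation with prototypes appended to the candidate set. The released code should be consulted for the actual SPCL loss.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


class ClassQueues:
    """One fixed-size FIFO queue of representations per class (hypothetical helper)."""

    def __init__(self, num_classes, maxlen=64):
        self.queues = [deque(maxlen=maxlen) for _ in range(num_classes)]

    def push(self, label, vec):
        # The oldest entry is dropped automatically once the queue is full.
        self.queues[label].append(list(vec))

    def prototypes(self):
        """Mean vector of each non-empty queue, keyed by class label."""
        protos = {}
        for label, queue in enumerate(self.queues):
            if queue:
                dim = len(queue[0])
                protos[label] = [sum(v[d] for v in queue) / len(queue) for d in range(dim)]
        return protos


def proto_contrastive_loss(batch, labels, protos, temperature=0.5):
    """SupCon-style loss in which class prototypes join the candidate set.

    Assumption: positives are same-label batch items plus the anchor's own
    class prototype; all other candidates act as negatives.
    """
    total, counted = 0.0, 0
    for i, (anchor, y) in enumerate(zip(batch, labels)):
        # Candidates: every other batch item plus every available prototype.
        cands = [(batch[j], labels[j]) for j in range(len(batch)) if j != i]
        cands += [(proto, c) for c, proto in protos.items()]
        scores = [cosine(anchor, vec) / temperature for vec, _ in cands]
        pos = [s for s, (_, c) in zip(scores, cands) if c == y]
        if not pos:
            continue  # no positive available for this anchor
        top = max(scores)  # log-sum-exp with max subtraction for stability
        log_denom = top + math.log(sum(math.exp(s - top) for s in scores))
        total += -sum(s - log_denom for s in pos) / len(pos)
        counted += 1
    return total / counted if counted else 0.0


# Usage: a batch of one still has a positive (its class prototype) and a negative.
queues = ClassQueues(num_classes=2)
queues.push(0, [1.0, 0.1])
queues.push(1, [0.1, 1.0])
loss = proto_contrastive_loss([[0.9, 0.2]], [0], queues.prototypes())
```

Because the prototypes are always in the candidate set, the loss is defined even for a batch containing a single utterance, which is the property the summary attributes to SPCL.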

To further mitigate the impact of 'extreme' samples, which are hard to distinguish from text alone, the paper introduces a curriculum learning strategy. A distance-based difficulty measure function scores each training sample, and the curriculum orders the training data so that the model sees simpler instances first and harder ones later.
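A distance-based curriculum of this kind might look like the following sketch. The specific difficulty score (distance to the sample's own class center divided by the summed distance to all class centers), the use of cosine distance, and the linear pacing in `subset_for_epoch` are assumptions chosen for illustration; the summary does not specify the paper's exact measure or sampling schedule.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def class_centers(vectors, labels):
    """Mean representation of each class."""
    sums, counts = {}, {}
    for vec, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for d, x in enumerate(vec):
            acc[d] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def difficulty(vec, label, centers):
    """Assumed score: near 0 when close to the own center, near 1 when far."""
    d_own = cosine_distance(vec, centers[label])
    d_all = sum(cosine_distance(vec, c) for c in centers.values())
    return d_own / d_all if d_all > 0 else 0.0


def curriculum_order(vectors, labels):
    """Sample indices sorted from easiest to hardest."""
    centers = class_centers(vectors, labels)
    scores = [difficulty(v, y, centers) for v, y in zip(vectors, labels)]
    return sorted(range(len(vectors)), key=lambda i: scores[i])


def subset_for_epoch(order, epoch, total_epochs):
    """Linear pacing (an assumption): expose a growing easy-first prefix."""
    k = math.ceil((epoch + 1) * len(order) / total_epochs)
    return order[: max(1, min(len(order), k))]


# Usage: the last sample is labelled 0 but resembles class 1, so it ranks hardest.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
labels = [0, 0, 1, 1, 0]
order = curriculum_order(vectors, labels)
first_epoch = subset_for_epoch(order, epoch=0, total_epochs=5)
```

In this sketch a sample that sits closer to another class's center than to its own receives a score near 1 and is deferred to later epochs, which is one way to limit the influence of extreme samples early in training.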

Experimental Results

SPCL combined with the curriculum strategy is evaluated on three widely used ERC benchmarks: IEMOCAP, MELD, and EmoryNLP. The approach delivers state-of-the-art results, outperforming earlier methods both in overall performance and in handling class imbalance, with the advantage most visible when batch sizes are constrained. Its robustness to small batch sizes and its reduced sensitivity to class imbalance are the main improvements over conventional SCL.

Implications and Future Directions

The implications of this research are both practical and theoretical. Practically, recognizing emotions accurately in a class-imbalanced setting without the large batches that conventional SCL relies on makes SPCL a candidate for resource-constrained applications. Theoretically, it invites further work on combining prototypical methods with contrastive learning, potentially extending beyond ERC to other areas of natural language processing or to computer vision, wherever class imbalance is prevalent.

Future work could extend this methodology to multi-modal fusion in ERC, integrating audio and visual cues alongside text. Refining the curriculum learning approach with alternative distance measures could also further stabilize training against extreme samples.

Overall, the paper advances ERC by combining contrastive and prototypical learning, and offers a practical way to handle class imbalance in conversational settings.