
EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa (2108.12009v1)

Published 26 Aug 2021 in cs.CL

Abstract: We present EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa, a simple yet expressive scheme of solving the ERC (emotion recognition in conversation) task. By simply prepending speaker names to utterances and inserting separation tokens between the utterances in a dialogue, EmoBERTa can learn intra- and inter- speaker states and context to predict the emotion of a current speaker, in an end-to-end manner. Our experiments show that we reach a new state of the art on the two popular ERC datasets using a basic and straight-forward approach. We've open sourced our code and models at https://github.com/tae898/erc.

Citations (90)

Summary

  • The paper introduces EmoBERTa, a novel approach that prepends speaker identifiers to dialogue inputs to enhance contextual emotion recognition.
  • It leverages a modified roberta-large model with an input format that combines past, current, and future utterances to capture dynamic speaker interactions.
  • EmoBERTa achieves state-of-the-art weighted F1 scores of 65.61% on MELD and 68.57% on IEMOCAP, demonstrating significant improvements over baseline models.

EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa

The paper presents EmoBERTa, a novel approach to the task of Emotion Recognition in Conversation (ERC) that leverages the speaker's identity using RoBERTa. The strategy involves prepending speaker names to utterances and inserting separation tokens, allowing the system to capture intra- and inter-speaker states and contexts effectively. This method achieves state-of-the-art results on two popular ERC datasets, demonstrating the effectiveness of simple modifications to existing transformer models like RoBERTa.

Methodology

EmoBERTa builds on the pretrained roberta-large model, adding a linear classification layer over the final hidden state of the [CLS] token. The input sequence has a distinctive format of three segments representing the past, current, and future utterances of a dialogue. A key modification is prepending speaker identifiers to utterances, which lets the model distinguish the contributions of different interlocutors. This adaptation reportedly improves context modeling and emotion prediction over baselines that omit speaker information.
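The input construction described above can be sketched as follows. This is an illustrative reconstruction, not the authors' exact code (which is available at the linked repository); the function name, the speaker-prefix format, and the use of `</s></s>` as the segment separator are assumptions based on the paper's description and RoBERTa's sentence-pair convention.

```python
def build_input(past, current, future, sep="</s></s>"):
    """Build one EmoBERTa-style input string (illustrative sketch).

    `past` and `future` are lists of (speaker, utterance) tuples;
    `current` is the single (speaker, utterance) pair whose emotion
    is to be predicted. Speaker names are prepended to utterances,
    and the three segments are joined with a separator token,
    mirroring RoBERTa's two-sentence separator convention.
    """
    def segment(pairs):
        # Prepend each speaker's name to their utterance.
        return " ".join(f"{speaker}: {utt}" for speaker, utt in pairs)

    parts = [segment(past), segment([current]), segment(future)]
    # Drop empty segments (e.g. the first utterance has no past context).
    return sep.join(p for p in parts if p)

example = build_input(
    past=[("Monica", "There's nothing to tell!")],
    current=("Joey", "Come on, you're going out with the guy!"),
    future=[("Chandler", "All right Joey, be nice.")],
)
```

The resulting string would then be tokenized and fed to roberta-large, with the classifier head reading the [CLS] position.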

Experimental Results

In experiments, EmoBERTa significantly outperformed contemporary models on the MELD and IEMOCAP datasets. Including both past and future utterances along with speaker identifiers proved key to these gains, yielding weighted F1 scores of 65.61% on MELD and 68.57% on IEMOCAP. The results underscore the importance of speaker context for emotion prediction in conversational settings.
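For reference, the weighted F1 metric reported above averages per-class F1 scores weighted by each class's support, which matters because ERC datasets like MELD are heavily imbalanced across emotion labels. A minimal sketch of the computation (the standard definition, not code from the paper):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to true-class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1  # weight by class support
    return score
```

In practice this matches `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")`, which is the standard tool for reporting this metric.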

Implications and Future Work

The findings illustrate the potential for enhanced conversational emotion modeling by feeding simple yet contextually rich inputs to existing pretrained language model architectures. The implications extend to applications such as sentiment analysis in customer-service interactions and improved human-computer interaction systems.

Future research could extend EmoBERTa's framework to multimodal ERC tasks and experiment with alternative speaker-representation techniques. Further studies might also investigate dialogue representations that emphasize dynamic speaker shifts or the implicit tone of a discussion.

Overall, EmoBERTa advances the state of the art in emotion recognition tasks, underscoring the utility of leveraging speaker identities within dialogue systems. It showcases a straightforward yet highly effective means of enhancing emotional understanding in dialogue, promising further advancements in conversational AI.
