- The paper introduces EmoBERTa, a novel approach that prepends speaker identifiers to dialogue inputs to enhance contextual emotion recognition.
- It fine-tunes roberta-large on an input format that combines past, current, and future utterances to capture dynamic speaker interactions.
- EmoBERTa achieves state-of-the-art weighted F1 scores of 65.61% on MELD and 68.57% on IEMOCAP, demonstrating significant improvements over baseline models.
EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa
The paper presents EmoBERTa, a novel approach to Emotion Recognition in Conversation (ERC) that makes RoBERTa speaker-aware. The strategy prepends speaker names to utterances and inserts separation tokens between them, allowing the model to capture intra- and inter-speaker states and context effectively. This method achieves state-of-the-art results on two popular ERC datasets, demonstrating how far simple modifications to existing transformer models like RoBERTa can go.
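To make the input scheme concrete, here is a minimal sketch of how such a speaker-aware input string could be assembled. This is an illustration based on the description above, not the authors' code: the `build_input` helper, the exact separator string, and the flat concatenation of past and future utterances are all assumptions.

```python
def build_input(dialogue, current_idx, sep="</s></s>"):
    """Assemble a speaker-aware input for the utterance at current_idx.

    `dialogue` is a list of (speaker, utterance) pairs. Every utterance is
    prefixed with its speaker's name, and separator tokens mark the
    boundaries between the past, current, and future segments.
    """
    parts = [f"{spk}: {utt}" for spk, utt in dialogue]
    past = " ".join(parts[:current_idx])
    future = " ".join(parts[current_idx + 1:])
    return f"{past} {sep} {parts[current_idx]} {sep} {future}".strip()

dialogue = [
    ("Monica", "There's nothing to tell!"),
    ("Joey", "C'mon, you're going out with the guy!"),
    ("Monica", "It's just two people going to dinner."),
]
print(build_input(dialogue, current_idx=1))
```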
Methodology
EmoBERTa builds on the pretrained roberta-large model, tackling ERC by adding a linear layer on top of the final hidden state associated with the [CLS] token. The input sequence format is distinctive: it comprises three segments representing the past, current, and future utterances within a dialogue. A key modification is prepending speaker identifiers to utterances, allowing the model to distinguish the contributions of different interlocutors. This adaptation reportedly improves context modeling and emotion prediction over baseline models that lack speaker information.
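The following is a minimal sketch of this classification setup using the Hugging Face transformers library. It is not the authors' training code: the seven-label head (matching MELD's emotion classes) and the single speaker-prefixed utterance are illustrative assumptions.

```python
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# RobertaForSequenceClassification puts a classification head on the final
# hidden state of the first token (<s>, RoBERTa's [CLS] equivalent).
tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=7  # e.g. MELD's seven emotion classes
)

# A speaker-prefixed utterance, as described above.
text = "Monica: There's nothing to tell!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted emotion class index
```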
Experimental Results
In experiments, EmoBERTa significantly outperformed contemporary models on the MELD and IEMOCAP datasets, achieving weighted F1 scores of 65.61% on MELD and 68.57% on IEMOCAP. Notably, including both past and future utterances together with speaker identifiers was key to these gains. The results underscore how much speaker context contributes to emotion prediction in a conversational setting, sharpening the contextual understanding that ERC tasks require.
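For reference, weighted F1 averages the per-class F1 scores weighted by each class's support, so frequent emotions count proportionally more while rare ones still contribute. A quick scikit-learn sketch with toy labels (purely illustrative, not the papers' data):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 2, 2, 2, 2]  # toy gold emotion labels
y_pred = [0, 1, 1, 2, 2, 2, 0]  # toy model predictions

# Weighted F1: per-class F1 averaged with weights proportional to support.
print(f1_score(y_true, y_pred, average="weighted"))
```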
Implications and Future Work
The findings illustrate the potential for enhanced conversational emotion modeling by feeding simple yet contextually rich inputs to existing pretrained transformer architectures. The implications extend to applications such as sentiment analysis in customer-service interactions and improved human-computer interaction systems.
Future research could extend EmoBERTa's framework to multimodal ERC tasks and experiment with alternative speaker representation techniques. Further studies might also investigate modeling approaches that yield dialogue representations emphasizing dynamic speaker shifts or the implicit tone of a discussion.
Overall, EmoBERTa advances the state of the art in emotion recognition, underscoring the value of leveraging speaker identities within dialogue systems. The approach is a straightforward yet highly effective way to enhance emotional understanding in dialogue, and it promises further advances in conversational AI.