CoMPM: Context Modeling with Speaker's Pre-trained Memory Tracking for Emotion Recognition in Conversation (2108.11626v3)

Published 26 Aug 2021 in cs.CL and cs.AI

Abstract: As the use of interactive machines grows, the task of Emotion Recognition in Conversation (ERC) has become more important. If machine-generated sentences reflect emotion, more human-like, sympathetic conversations are possible. Since emotion recognition in conversation is inaccurate if the previous utterances are not taken into account, many studies incorporate the dialogue context to improve performance. Many recent approaches show performance improvements by combining modules that incorporate knowledge learned from external structured data. However, such structured data is difficult to access in non-English languages, making these approaches hard to extend to other languages. Therefore, we extract pre-trained memory using a pre-trained language model as an extractor of external knowledge. We introduce CoMPM, which combines the speaker's pre-trained memory with the context model, and find that the pre-trained memory significantly improves the performance of the context model. CoMPM achieves the first- or second-best performance on all datasets and is state-of-the-art among systems that do not leverage structured data. In addition, our method can be extended to other languages because structured knowledge is not required, unlike previous methods. Our code is available on GitHub (https://github.com/rungjoo/CoMPM).

Citations (72)

Summary

  • The paper introduces CoMPM, which combines context modeling and a pre-trained memory module to significantly enhance emotion recognition accuracy.
  • It utilizes a RoBERTa-initialized Transformer and pre-trained language models to effectively capture dialogue dynamics and speaker-specific cues.
  • Extensive experiments on four benchmarks demonstrate CoMPM's robustness and potential for multilingual applications without structured external knowledge.

Context Modeling with Speaker's Pre-trained Memory Tracking for Emotion Recognition in Conversation

The paper "CoMPM: Context Modeling with Speaker's Pre-trained Memory Tracking for Emotion Recognition in Conversation" presents a novel approach to enhancing Emotion Recognition in Conversation (ERC) systems, which are increasingly critical in applications such as interactive chatbots and personalized dialogue systems. The primary objective is to improve the accuracy of ERC by leveraging the speaker's pre-trained memory, eliminating the need for structured external knowledge, which is largely available only for English.

Methodology

The proposed model, CoMPM, integrates two core modules: the context embedding module (CoM) and the pre-trained memory module (PM). The CoM tracks the flow of the dialogue and predicts the current emotional state by processing all preceding utterances in the conversation. It employs a Transformer encoder initialized with RoBERTa, which has demonstrated effectiveness across a range of NLP tasks.
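To make this concrete, below is a minimal sketch of the context module in PyTorch using the Hugging Face transformers library. The speaker-prefix formatting, the checkpoint name, and the use of the first (<s>) token as a pooled context vector are illustrative assumptions rather than the paper's exact preprocessing.

```python
# Minimal sketch of the context module (CoM): encode all previous
# utterances, each prefixed with its speaker, with a RoBERTa encoder.
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
context_encoder = RobertaModel.from_pretrained("roberta-large")

def encode_context(utterances, speakers):
    """Return a single vector summarizing the dialogue so far."""
    text = " ".join(f"{s}: {u}" for s, u in zip(speakers, utterances))
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=512)
    outputs = context_encoder(**inputs)
    # Hidden state of the first (<s>) token serves as the context vector.
    return outputs.last_hidden_state[:, 0]

context_vec = encode_context(
    ["I lost my keys again.", "Oh no, where did you last see them?"],
    ["A", "B"],
)
```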

In parallel, the PM extracts knowledge from the speaker's own previous utterances using a pre-trained language model, representing this knowledge as pre-trained memory. This provides a mechanism to predict emotions while accounting for the speaker's linguistic style and preferences.
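Continuing the sketch above, the memory side might look as follows. Mean-pooling the speaker's prior utterance embeddings and fusing them with the context vector through a learned linear projection are simplifying assumptions; the paper tracks and combines these memory vectors with its own weighting scheme.

```python
# Sketch of the pre-trained memory module (PM): a second pre-trained
# encoder embeds each prior utterance of the current speaker, and the
# mean of those embeddings serves as the speaker's memory vector.
import torch
import torch.nn as nn
from transformers import RobertaModel

memory_encoder = RobertaModel.from_pretrained("roberta-large")

def extract_memory(speaker_utterances):
    vecs = [memory_encoder(**tokenizer(u, return_tensors="pt",
                                       truncation=True)).last_hidden_state[:, 0]
            for u in speaker_utterances]
    return torch.stack(vecs).mean(dim=0)

hidden = memory_encoder.config.hidden_size
memory_proj = nn.Linear(hidden, hidden)  # learned fusion projection
classifier = nn.Linear(hidden, 7)        # e.g. the 7 emotion classes of MELD

memory_vec = extract_memory(
    ["I lost my keys again.", "This always happens to me."]
)
logits = classifier(context_vec + memory_proj(memory_vec))
predicted_emotion = logits.argmax(dim=-1)
```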

Experimental Validation

The efficacy of CoMPM is verified through extensive experiments on four benchmark datasets: MELD, EmoryNLP, IEMOCAP, and DailyDialog, covering both multi-party and dyadic conversations. CoMPM consistently achieved top-tier performance, often outstripping existing systems that rely solely on pre-trained models without external knowledge. Notably, it also competes robustly with systems that incorporate structured data, underscoring the advantage of leveraging pre-trained memory.

Analysis of Results

The comparative evaluation demonstrates CoMPM's strength in maintaining high recognition performance across diverse conversation datasets. The PM component proved especially effective in scenarios with limited structured data, showcasing its potential utility in multilingual applications beyond English. The variant with frozen PM parameters, CoMPM(f), shows that performance degradation can be mitigated even when fine-tuning the memory module is not feasible due to limited data (see the sketch below). This flexibility offers a pragmatic path for deploying ERC systems in linguistic environments where resources are sparse.
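Continuing the same sketch, freezing the memory extractor reduces to the standard PyTorch idiom of disabling its gradients, so only the context encoder and the small fusion and classification layers are updated; the optimizer choice and learning rate shown are placeholders.

```python
# CoMPM(f): freeze the pre-trained memory extractor; train only the
# context encoder, the fusion projection, and the classifier.
for param in memory_encoder.parameters():
    param.requires_grad = False  # PM keeps its pre-trained weights

trainable = (list(context_encoder.parameters())
             + list(memory_proj.parameters())
             + list(classifier.parameters()))
optimizer = torch.optim.AdamW(trainable, lr=1e-5)  # placeholder hyperparameters
```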

Implications and Future Work

The CoMPM model represents a substantial advance in ERC by decoupling performance from dependence on structured knowledge databases. Its adaptability to various languages attests to its practical value and potential for widespread deployment. While the current focus is on context modeling, the authors suggest exploring deeper integration of speaker dynamics and further optimizing the memory extraction process for richer emotional understanding. Moreover, leveraging memory from larger and more varied pre-trained language models could yield even greater accuracy and robustness in recognizing emotions within conversations.

Overall, CoMPM offers a promising direction for developing more sophisticated, empathetic, and linguistically inclusive ERC systems. By mitigating reliance on structured data, it potentially accelerates the deployment of conversational agents across diverse cultural and linguistic contexts.