Directed Acyclic Graph Network for Conversational Emotion Recognition (2105.12907v2)

Published 27 May 2021 in cs.CL

Abstract: The modeling of conversational context plays a vital role in emotion recognition from conversation (ERC). In this paper, we put forward a novel idea of encoding the utterances with a directed acyclic graph (DAG) to better model the intrinsic structure within a conversation, and design a directed acyclic neural network, namely DAG-ERC, to implement this idea. In an attempt to combine the strengths of conventional graph-based neural models and recurrence-based neural models, DAG-ERC provides a more intuitive way to model the information flow between long-distance conversation background and nearby context. Extensive experiments are conducted on four ERC benchmarks with state-of-the-art models employed as baselines for comparison. The empirical results demonstrate the superiority of this new model and confirm the motivation of the directed acyclic graph architecture for ERC.

Authors (4)

Weizhou Shen
Siyue Wu
Yunyi Yang
Xiaojun Quan

Summary

Directed Acyclic Graph Network for Conversational Emotion Recognition

The paper "Directed Acyclic Graph Network for Conversational Emotion Recognition" presents an approach to emotion recognition in conversation (ERC) that encodes each conversation as a directed acyclic graph (DAG). The authors, Weizhou Shen, Siyue Wu, Yunyi Yang, and Xiaojun Quan, propose a DAG-based neural network, DAG-ERC, to address the limitations of earlier methods for modeling conversational context.

Context and Motivation

Emotion recognition in conversation (ERC) is garnering increasing attention because of its applicability in areas like social media monitoring and emotive dialogue systems. Previous works have predominantly used either graph-based or recurrence-based methods to capture conversational context. Graph-based methods aggregate information from surrounding utterances without considering distant context or sequential information, while recurrence-based models capture sequential dynamics but struggle with long-range dependencies.

Methodology

The proposed DAG-ERC model treats each conversation as a directed acyclic graph. This structure combines the strengths of graph-based and recurrence-based models while addressing the limitations of each. The paper introduces a procedure for converting a conversation into a DAG, in which each node represents an utterance and each edge marks a permitted path of information flow, determined by speaker identity and by the positions of the two utterances.
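The conversion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `build_dag`, the `(source, target, relation)` edge tuple, and the parameter name `omega` are assumptions made here. The sketch assumes that each utterance receives edges from every earlier utterance back to, and including, the `omega`-th most recent utterance by the same speaker, and that each edge is labeled 1 when both utterances share a speaker and 0 otherwise.

```python
from typing import List, Tuple

# (source index, target index, relation); relation is 1 for same speaker, 0 otherwise.
# This tuple layout is an assumption of the sketch, not a format defined by the paper.
Edge = Tuple[int, int, int]


def build_dag(speakers: List[str], omega: int = 1) -> List[Edge]:
    """Build DAG edges for a conversation given the speaker of each utterance.

    Edges always point from an earlier utterance to a later one, so the graph
    is acyclic by construction and utterance order is a valid topological order.
    """
    edges: List[Edge] = []
    for i in range(1, len(speakers)):
        same_speaker_seen = 0
        # Walk backwards from the utterance immediately before utterance i.
        for j in range(i - 1, -1, -1):
            same = speakers[j] == speakers[i]
            edges.append((j, i, 1 if same else 0))
            if same:
                same_speaker_seen += 1
                if same_speaker_seen == omega:
                    # Cutoff: anything earlier counts as remote context.
                    break
    return edges


if __name__ == "__main__":
    # Two speakers, five utterances.
    for edge in build_dag(["A", "B", "A", "B", "B"], omega=1):
        print(edge)
```

With `omega=1`, the fourth utterance (index 3, speaker B) receives an edge from index 2 (different speaker) and from index 1 (its own speaker's previous turn), and nothing earlier. If a speaker has no earlier turn, the loop simply runs back to the start of the conversation.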

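DAG Construction: The graph is built under three constraints. Direction: information flows only from earlier utterances to later ones. Remote context cutoff: an utterance does not connect directly to context older than a recent utterance by the same speaker. Local context preservation: the utterances after that cutoff all connect to the target. Together, these let the DAG selectively transmit relevant past information to the target utterance.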
DAG-ERC Architecture: Inspired by DAGNN, DAG-ERC updates the nodes of each layer recurrently, in temporal order, so that an utterance aggregates the states its predecessors have already computed in the same layer, together with its own state from the preceding layer. The additions over DAGNN are a relation-aware transformation that incorporates speaker identity, and a contextual information unit that works alongside the nodal information unit to manage information flow.
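One way to write the per-layer update described above is given below. The symbols are notation chosen for this sketch and should be checked against the paper: $H_i^{l}$ is the state of utterance $i$ at layer $l$, $\mathcal{N}_i$ is its set of predecessors in the DAG, $r_{ij}$ is the relation label on the edge from $j$ to $i$ (same or different speaker), and the two GRU cells stand for the nodal and contextual information units.

```latex
\begin{align}
\alpha_{ij}^{l} &= \operatorname{softmax}_{j \in \mathcal{N}_i}
    \left( W_{\alpha}^{l} \left[ H_j^{l} \,\Vert\, H_i^{l-1} \right] \right) \\
M_i^{l} &= \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{l} \, W_{r_{ij}}^{l} H_j^{l} \\
\tilde{H}_i^{l} &= \operatorname{GRU}_{H}^{l}\left( H_i^{l-1},\, M_i^{l} \right)
    && \text{(nodal information unit)} \\
C_i^{l} &= \operatorname{GRU}_{M}^{l}\left( M_i^{l},\, H_i^{l-1} \right)
    && \text{(contextual information unit)} \\
H_i^{l} &= \tilde{H}_i^{l} + C_i^{l}
\end{align}
```

The attention weights use the predecessors' states $H_j^{l}$ from the same layer, which is what makes the update recurrent: utterance $i$ can be processed only after all of its predecessors. The relation-specific matrix $W_{r_{ij}}^{l}$ is where speaker identity enters the aggregation.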

Experimental Results

The experiments cover four widely used ERC datasets: IEMOCAP, MELD, DailyDialog, and EmoryNLP. Across these datasets, DAG-ERC performs on par with or better than existing state-of-the-art models. Noteworthy observations include:

Enhanced Contextual Encoding: DAG-ERC integrates both local and distant context, which offers an intuitive explanation for its strong results, most visibly on datasets with longer dialogues such as IEMOCAP.
Robust Performance Over Layers: Unlike traditional GNN models, which are prone to over-smoothing when many layers are stacked, DAG-ERC keeps its performance stable as layers are added, so deeper networks can be used without a loss in accuracy.

Implications and Future Work

Using directed acyclic graphs for ERC has theoretical implications and suggests several extensions. The relation-aware transformation based on speaker identity makes the model better suited to multi-speaker dialogues. However, challenges such as distinguishing similar emotions and handling the dominance of the neutral class remain open. Addressing these issues could further refine ERC systems and support more nuanced applications in emotionally intelligent systems.

One area that merits exploration is the use of richer relation types than the current binary distinction between edges joining utterances by the same speaker and edges joining utterances by different speakers. Richer types may capture the subtleties of speaker interaction more fully. Exploring alternative neural architectures for different edge types could also inform future model adaptations for broader context-inference applications in ERC and beyond.

In conclusion, DAG-ERC represents a significant methodological advancement in capturing conversational context for emotion recognition. Its approach to integrating directed acyclic graph constructs into ERC illustrates the potential for more nuanced and context-aware systems, which may extend into various human-computer interaction domains.
