Modeling Multi-turn Conversation with Deep Utterance Aggregation (1806.09102v2)

Published 24 Jun 2018 in cs.CL

Abstract: Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation whose related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances for context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance and the final matching score is obtained after attentive turns aggregation. Experimental results show our model outperforms the state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.

Deep Utterance Aggregation for Multi-turn Conversation Modeling

The paper "Modeling Multi-turn Conversation with Deep Utterance Aggregation" presents a framework for retrieval-based response matching in multi-turn dialogue systems. It targets a central challenge for such systems: modeling the context of a multi-turn conversation. Prior approaches largely concatenate the conversation utterances and ignore how earlier utterances interact. The authors instead propose a deep utterance aggregation model that captures these interactions to build a fine-grained context representation, which in turn improves response selection.

The architecture addresses both intra-utterance and inter-utterance semantics. Its main components are a self-matching attention mechanism and a turns-aware aggregation step. Together, they help the model pick out critical information across conversation turns and filter out redundant content.

Model Components and Innovations:

Turns-aware Aggregation: Each utterance is aggregated with the latest utterance, which often carries the strongest signal of user intention. Weighting earlier turns by their relation to the latest one gives a more coherent representation of the conversation context.
Self-matching Attention: Words within an utterance differ in importance to its overall representation. This component matches each utterance against itself to highlight its vital information. The attention weights reflect how each word interacts with its surrounding context, so the refined representation keeps the essential features.
Response Matching Layer: The model constructs matching matrices at both the word and utterance levels to compare each context utterance with a response candidate. Convolutional neural networks (CNNs) then extract matching features from these matrices, which are used to select relevant responses.
Attentive Turns Aggregation: This layer aggregates the matching information across turns with a gated recurrent unit (GRU). It summarizes how the matches develop over the conversation and yields the final matching score.
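
The four components above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, word vectors are plain Python lists, the last utterance is summarized by a mean vector, self-matching attention is replaced by plain dot-product self-attention, and the CNN and GRU stages are replaced by max-pooling and a softmax-weighted average.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(words):
    n = len(words)
    return [sum(w[i] for w in words) / n for i in range(len(words[0]))]

def fuse_with_last(words, last_summary):
    # Turns-aware aggregation (assumed form): append a summary of the
    # last utterance to every word vector. `w + last_summary` is list
    # concatenation, so the vector length doubles.
    return [w + last_summary for w in words]

def self_attention(words):
    # Stand-in for self-matching attention: each word is replaced by a
    # softmax-weighted mix of all words in the same utterance.
    refined = []
    for q in words:
        weights = softmax([dot(q, k) for k in words])
        refined.append([sum(wt * k[i] for wt, k in zip(weights, words))
                        for i in range(len(q))])
    return refined

def matching_matrix(utterance, response):
    # Word-level matching: one similarity per (utterance word, response word).
    return [[dot(u, r) for r in response] for u in utterance]

def match_score(utterances, response):
    last = mean_vector(utterances[-1])
    refined = [self_attention(fuse_with_last(u, last)) for u in utterances]
    resp = fuse_with_last(response, last)
    # Max-pooling over each matrix stands in for the CNN feature extractor.
    feats = [max(max(row) for row in matching_matrix(u, resp)) for u in refined]
    # A softmax-weighted average stands in for the GRU-based aggregation.
    weights = softmax(feats)
    return sum(w * f for w, f in zip(weights, feats))

# Toy data: two context utterances and two candidate responses.
context = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]]
good = [[1.0, 0.0]]
bad = [[-1.0, 0.0]]
print(match_score(context, good) > match_score(context, bad))
```

In this toy setup, the candidate whose word vector aligns with the context words receives the higher score. A real system would learn the embeddings, attention parameters, convolution filters, and recurrent weights from data.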

The proposed model was evaluated on three benchmark datasets: the Ubuntu Dialogue Corpus, the Douban Conversation Corpus, and a newly introduced E-commerce Dialogue Corpus (ECD). It outperformed existing approaches, such as the Sequential Matching Network (SMN), by a significant margin, particularly on the more diverse ECD dataset. This suggests that the model adapts well to different conversation types, extending beyond typical chit-chat scenarios to domain-specific inquiries such as e-commerce consultations.

Implications and Future Directions:

The findings hold practical significance in enhancing dialogue systems, which play integral roles in customer service applications and digital personal assistants. By effectively parsing multi-turn dialogues, systems can deliver more contextually appropriate responses, thereby improving user interactions.

The introduction of a public e-commerce dataset is a notable contribution and a potentially valuable resource for subsequent research on domain-specific dialogue systems. Future work could explore explicit topic tracking or the handling of multiple simultaneous intentions within a conversation, both of which remain open challenges for the model. In addition, real conversations often admit several correct responses; this could be addressed with more nuanced evaluation metrics or human-in-the-loop annotation that recognizes semantically varied but correct responses.

In conclusion, the deep utterance aggregation model advances multi-turn dialogue modeling. By capturing contextual information across turns more effectively, it contributes to the development of more robust retrieval-based dialogue systems.

Authors

Zhuosheng Zhang
Jiangtong Li
Pengfei Zhu
Hai Zhao
Gongshen Liu

Citations (244)
