A Network-based End-to-End Trainable Task-oriented Dialogue System (1604.04562v3)

Published 15 Apr 2016 in cs.CL, cs.AI, cs.NE, and stat.ML

Abstract: Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components and typically this involves either a large amount of handcrafting, or acquiring costly labelled datasets to solve a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.

A Network-based End-to-End Trainable Task-oriented Dialogue System

The paper "A Network-based End-to-End Trainable Task-oriented Dialogue System" by Tsung-Hsien Wen et al. presents an innovative neural network-based model designed for developing task-oriented dialogue systems capable of natural interaction with human users. This model is noteworthy for its end-to-end trainability, modular design, and its ability to achieve competitive task performance with limited training data.

Introduction

Traditional task-oriented dialogue systems for applications such as hotel bookings or technical support necessitate substantial handcrafting or extensive labelled datasets for each system component. The proposed model mitigates these issues by leveraging end-to-end neural network training. This system operates without explicit user goal modelling and instead uses distributed representations for user intents and database (DB) attributes, facilitating a balance between structured information and flexible user interaction.

Model Architecture

The proposed framework is built upon several interconnected neural network modules:

  1. Intent Network: This component encodes user inputs into a distributed vector representation using either LSTM or CNN architectures. The model was designed to replace hand-coded dialogue act representations with these learned embeddings.
  2. Belief Trackers: These RNNs track the dialogue state by maintaining probabilities over slot-value pairs, representing user constraints and requests. A CNN feature extractor enriches these trackers by capturing long-distance dependencies in user queries. They also rely on delexicalisation, replacing surface slot values with generic placeholders, to generalise across similar dialogue scenarios (a minimal delexicalisation sketch follows this list).
  3. Policy Network and Database Operator: The DB operator forms a query from the belief state and searches the database, summarising the matching entities as a vector. The policy network then combines the intent representation, the belief state, and this DB vector into a single action vector that steers the response (a sketch of this combination step also appears after the list).
  4. Generation Network: This LSTM-based component generates system responses based on the policy network’s outputs. An attention mechanism is incorporated to dynamically aggregate relevant information from the belief state during the response generation process.
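
As a concrete illustration of the delexicalisation step used by the belief trackers, the sketch below replaces surface slot values with generic placeholder tokens before tracking. The slot lexicon, placeholder format, and function names are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal delexicalisation sketch (assumed slot lexicon; not the paper's exact vocabulary).
# Surface values are replaced by generic <v.slot> placeholders so the belief tracker and
# generator can share parameters across different values of the same slot.

SLOT_LEXICON = {
    "food":       ["indian", "chinese", "italian", "thai"],
    "area":       ["north", "south", "centre", "east", "west"],
    "pricerange": ["cheap", "moderate", "expensive"],
}

def delexicalise(utterance: str) -> tuple[str, dict]:
    """Replace known slot values with placeholder tokens and remember the mapping."""
    tokens = utterance.lower().split()
    mapping = {}
    for i, tok in enumerate(tokens):
        for slot, values in SLOT_LEXICON.items():
            if tok in values:
                placeholder = f"<v.{slot}>"
                mapping[placeholder] = tok
                tokens[i] = placeholder
    return " ".join(tokens), mapping

if __name__ == "__main__":
    delex, mapping = delexicalise("I want a cheap Indian restaurant in the north")
    print(delex)    # i want a <v.pricerange> <v.food> restaurant in the <v.area>
    print(mapping)  # {'<v.pricerange>': 'cheap', '<v.food>': 'indian', '<v.area>': 'north'}
```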
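
The next sketch shows, under assumed dimensions and module names, how the four components might be wired together: an LSTM intent encoder, a summarised belief vector, a DB match vector, and a policy layer whose tanh output conditions an LSTM generator. It is a minimal PyTorch approximation of the described architecture, not the authors' code; in particular, the attention mechanism is replaced by simple conditioning on the action vector.

```python
# Sketch of how the internal states might be combined. All dimensions, module names,
# and the size of the DB match vector are illustrative assumptions.
import torch
import torch.nn as nn

class DialogueSketch(nn.Module):
    def __init__(self, vocab_size=500, emb_dim=64, hid_dim=128,
                 belief_dim=32, db_dim=6, action_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # 1. Intent network: encode the (delexicalised) user turn into a single vector.
        self.intent_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # 3. Policy network: merge intent, belief summary, and DB result into one action vector.
        self.w_intent = nn.Linear(hid_dim, action_dim, bias=False)
        self.w_belief = nn.Linear(belief_dim, action_dim, bias=False)
        self.w_db     = nn.Linear(db_dim, action_dim, bias=False)
        # 4. Generation network: an LSTM decoder conditioned on the action vector.
        self.dec_lstm = nn.LSTM(emb_dim + action_dim, hid_dim, batch_first=True)
        self.out      = nn.Linear(hid_dim, vocab_size)

    def forward(self, user_tokens, belief_summary, db_vector, response_tokens):
        # Intent representation: last hidden state of the encoder LSTM.
        _, (z, _) = self.intent_lstm(self.embed(user_tokens))
        z = z.squeeze(0)
        # Action vector: tanh of the merged intent, belief, and DB projections.
        action = torch.tanh(self.w_intent(z) + self.w_belief(belief_summary) + self.w_db(db_vector))
        # Condition every decoding step on the action vector (a simple stand-in for attention).
        emb = self.embed(response_tokens)
        cond = action.unsqueeze(1).expand(-1, emb.size(1), -1)
        dec_out, _ = self.dec_lstm(torch.cat([emb, cond], dim=-1))
        return self.out(dec_out)  # logits over the (delexicalised) output vocabulary

if __name__ == "__main__":
    model = DialogueSketch()
    logits = model(
        user_tokens=torch.randint(0, 500, (2, 10)),    # batch of 2 delexicalised user turns
        belief_summary=torch.randn(2, 32),             # summarised belief state per dialogue
        db_vector=torch.randn(2, 6),                   # e.g. encoding of how many entities match
        response_tokens=torch.randint(0, 500, (2, 12)),
    )
    print(logits.shape)  # torch.Size([2, 12, 500])
```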

Data Collection using Wizard-of-Oz

The researchers employed a novel crowdsourced, pipelined Wizard-of-Oz (WOZ) framework to collect training data. The setup uses two separate interfaces: users contribute the next user turn given a task goal and the dialogue so far, while wizards read the same history, record the slot-value constraints, and type a reply grounded in a database search. Because each worker contributes only a single turn, many partial dialogues can be advanced in parallel, which keeps collection fast and inexpensive while the shared dialogue history keeps each conversation coherent.

Empirical Evaluation

The authors conducted both corpus-based and human evaluations to gauge system performance:

  1. Corpus-based Evaluation: This evaluation measured BLEU scores, entity matching rates, and objective task success rates (a toy illustration of these metrics follows this list). Results showed that incorporating belief trackers (RNN-CNN) markedly improved task success rates and yielded better encodings of dialogue state than simpler sequence-to-sequence variants.
  2. Human Evaluation: The system was assessed by users on Amazon Mechanical Turk, with metrics including task success rate, comprehension, and naturalness. The system achieved a high subjective success rate of 98%, with comprehension and naturalness ratings above 4 out of 5.
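
To make the corpus-based metrics concrete, the toy functions below compute an entity match, a task-success flag, and an averaged sentence-level BLEU score. The matching rules and the NLTK-based BLEU are simplifying assumptions; the paper's actual scoring scripts may differ.

```python
# Toy computation of the corpus-based metrics mentioned above. Exact string comparison of
# KB attributes and requested slots is a simplifying assumption.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def entity_match(offered_name, goal_constraints, kb):
    """Matched if the offered entity's KB record satisfies every constraint in the user goal."""
    entry = kb.get(offered_name, {})
    return all(entry.get(slot) == value for slot, value in goal_constraints.items())

def task_success(matched, requested_slots, provided_slots):
    """Success = correct entity offered AND all requested slots (phone, address, ...) given."""
    return matched and requested_slots.issubset(provided_slots)

def corpus_bleu_1ref(hypotheses, references):
    """Average sentence-level BLEU against a single reference per generated response."""
    smooth = SmoothingFunction().method1
    scores = [sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
              for hyp, ref in zip(hypotheses, references)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    kb = {"curry garden": {"food": "indian", "area": "north", "phone": "01223 123456"}}
    matched = entity_match("curry garden", {"food": "indian", "area": "north"}, kb)
    success = task_success(matched, {"phone"}, {"phone", "address"})
    bleu = corpus_bleu_1ref(["the phone number is 01223 123456"],
                            ["its phone number is 01223 123456"])
    print(matched, success, round(bleu, 3))
```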

Implications and Future Work

This research holds substantial implications for the development of more efficient, scalable, and natural task-oriented dialogue systems. By reducing the reliance on extensive labelled datasets and handcrafted rules, the proposed end-to-end model offers a versatile framework adaptable to various application domains. Future research directions may include extending this model to handle spoken dialogue inputs directly and exploring its scalability to broader and more complex task domains.

In summary, the work by Wen et al. contributes a robust and innovative system that bridges gaps between traditional modular dialogue systems and emerging neural network-based methods, providing a promising pathway for future advancements in the field of task-oriented dialogue systems.

Authors (8)
  1. Tsung-Hsien Wen
  2. David Vandyke
  3. Lina M. Rojas-Barahona
  4. Pei-Hao Su
  5. Stefan Ultes
  6. Steve Young
  7. Nikola Mrksic
  8. Milica Gasic
Citations (1,078)