
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling (1609.01454v1)

Published 6 Sep 2016 in cs.CL

Abstract: Attention-based encoder-decoder neural network models have recently shown promising results in machine translation and speech recognition. In this work, we propose an attention-based neural network model for joint intent detection and slot filling, both of which are critical steps for many speech understanding and dialog systems. Unlike in machine translation and speech recognition, alignment is explicit in slot filling. We explore different strategies in incorporating this alignment information to the encoder-decoder framework. Learning from the attention mechanism in encoder-decoder model, we further propose introducing attention to the alignment-based RNN models. Such attentions provide additional information to the intent classification and slot label prediction. Our independent task models achieve state-of-the-art intent detection error rate and slot filling F1 score on the benchmark ATIS task. Our joint training model further obtains 0.56% absolute (23.8% relative) error reduction on intent detection and 0.23% absolute gain on slot filling over the independent task models.

Authors (2)
  1. Bing Liu (212 papers)
  2. Ian Lane (29 papers)
Citations (660)

Summary

Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling

This paper examines the application of attention-based recurrent neural network (RNN) models to the joint tasks of intent detection and slot filling within spoken language understanding (SLU) systems. The authors propose approaches that integrate attention mechanisms into alignment-based RNN models to improve performance on these critical components of dialogue systems.

Background and Methodology

Intent detection and slot filling are pivotal components of SLU systems and have traditionally been treated as separate tasks. Intent detection classifies an utterance into one of a set of predefined categories, while slot filling is a sequence labeling task that extracts semantic constituents from natural language queries. Recent advances in encoder-decoder models with attention mechanisms have demonstrated impressive results in related fields such as machine translation and speech recognition. However, unlike those applications, where the alignment between input and output sequences is not explicit, slot filling involves a direct mapping between input words and output labels.
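
To make the two tasks concrete, the snippet below shows an illustrative ATIS-style annotation (the utterance and slot names are representative examples, not quoted from the paper): slot filling assigns one IOB label per input word, while intent detection assigns a single label to the whole utterance.

```python
# Illustrative ATIS-style annotation (labels are representative, not taken from the paper).
# Slot filling: one IOB label per word; intent detection: one label per utterance.
utterance = ["show", "flights", "from", "boston", "to", "new", "york", "today"]
slot_labels = ["O", "O", "O", "B-fromloc.city_name", "O",
               "B-toloc.city_name", "I-toloc.city_name", "B-depart_date.today_relative"]
intent_label = "atis_flight"

# The word-to-label alignment is explicit: the two sequences have equal length.
assert len(utterance) == len(slot_labels)
```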

The authors investigate combining attention-based and alignment-based methods to improve accuracy on both tasks. They describe a bidirectional RNN encoder with attention within an encoder-decoder framework that models sequences with explicit alignment, allowing the word-to-label correspondence to be exploited directly. In addition, the work explores adding attention to an alignment-based bidirectional RNN, introducing context vectors that inform both slot label prediction and intent classification.
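
As a rough illustration of the attention-augmented alignment RNN, here is a minimal sketch in PyTorch (the framework is an assumption; the paper does not specify one). The layer sizes and the single shared context vector are simplifications of the architecture in the paper, where context vectors are computed per time step for slot labeling and a separate weighted sum feeds intent classification.

```python
import torch
import torch.nn as nn


class JointAttentionBiRNN(nn.Module):
    """Simplified sketch of an attention-augmented bidirectional RNN
    for joint slot filling and intent detection (not the paper's exact model)."""

    def __init__(self, vocab_size, num_slots, num_intents, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.birnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)          # one attention score per encoder state
        self.slot_out = nn.Linear(4 * hidden_dim, num_slots)    # aligned hidden state + context vector
        self.intent_out = nn.Linear(4 * hidden_dim, num_intents)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)
        hidden, _ = self.birnn(embedded)                          # (batch, seq_len, 2*hidden_dim)

        # Attention over all encoder states; a single shared context vector for simplicity.
        scores = self.attn_score(hidden)                          # (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)
        context = torch.sum(weights * hidden, dim=1, keepdim=True)  # (batch, 1, 2*hidden_dim)

        # Slot logits: each aligned hidden state concatenated with the context vector.
        context_per_step = context.expand(-1, hidden.size(1), -1)
        slot_logits = self.slot_out(torch.cat([hidden, context_per_step], dim=-1))

        # Intent logits: last encoder state concatenated with the context vector.
        utterance_repr = torch.cat([hidden[:, -1, :], context.squeeze(1)], dim=-1)
        intent_logits = self.intent_out(utterance_repr)
        return slot_logits, intent_logits
```

In joint training, the cross-entropy losses for the slot label sequence and the intent label are summed and backpropagated through the shared bidirectional encoder, so both tasks shape the same representation.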

Experimental Evaluation

The experiments are conducted on the benchmark ATIS dataset, a standard in SLU research, which contains utterances annotated with intents and slots. The models are evaluated using F1 score for slot filling and error rate for intent detection.
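
For reference, the sketch below (illustrative code, not from the paper) shows how these two metrics are commonly computed: intent error rate as the fraction of misclassified utterances, and slot F1 at the chunk level over IOB-tagged label sequences.

```python
def intent_error_rate(gold, pred):
    """Fraction of utterances whose predicted intent differs from the gold intent."""
    return sum(g != p for g, p in zip(gold, pred)) / len(gold)


def iob_chunks(labels):
    """Extract (slot_type, start, end) spans from an IOB label sequence (simplified chunker)."""
    chunks, start, slot = [], None, None
    for i, lab in enumerate(labels + ["O"]):                     # sentinel to flush the last chunk
        if lab.startswith("B-") or lab == "O" or (lab.startswith("I-") and lab[2:] != slot):
            if slot is not None:
                chunks.append((slot, start, i))
            slot, start = (lab[2:], i) if lab.startswith("B-") else (None, None)
    return chunks


def slot_f1(gold_seqs, pred_seqs):
    """Chunk-level F1: a predicted slot counts only if its type and span both match."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(iob_chunks(gold)), set(iob_chunks(pred))
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```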

The independent models for each task achieved state-of-the-art results, with the attention-based approaches providing slight improvements over non-attention models. Specifically, the encoder-decoder model with aligned inputs and attention achieved an F1 score of 95.78% for slot filling and an intent detection error rate of 2.02%. Comparatively, attention-based RNN models showed competitive performance with an F1 score of 95.75% and an error rate of 2.35%.

In joint training, where both tasks are modeled simultaneously, further improvements were observed. The joint attention-based RNN model reduced intent detection error by 0.56% absolute (23.8% relative) and improved slot filling F1 by 0.23% absolute over the independently trained models.

Implications and Future Directions

The paper's findings underscore the efficacy of combining attention mechanisms with alignment-based RNNs for SLU tasks. Jointly modeling intent detection and slot filling also improves computational efficiency, since a single model serves both tasks.

From a theoretical perspective, this integration highlights the potential for encoder-decoder architectures to accommodate tasks necessitating explicit alignment, offering avenues for refining sequence learning across varied applications. Practically, the advancements can lead to more efficient and accurate dialogue systems, streamlining voice-activated technologies and human-machine interactions.

Future research could extend these models to multilingual data or explore deeper architectures that better exploit attention over long sequences. Evaluation on a wider range of SLU datasets would further validate the robustness and adaptability of these methods in real-world applications.