Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems
The paper "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems" by Tsung-Hsien Wen et al. addresses the longstanding challenge of natural language generation (NLG) within spoken dialogue systems (SDS). Traditional rule-based or template-based NLG systems, despite their robustness, often produce responses that lack the natural variability of human language and do not scale well across domains and languages. The paper proposes a statistical language generator built on a semantically conditioned long short-term memory network, abbreviated SC-LSTM.
Main Contributions
The primary contributions of the paper can be summarized as follows:
- Joint Optimization Framework: The SC-LSTM model integrates sentence planning and surface realization in a single joint optimization framework using a cross-entropy training criterion. This enables the model to learn from unaligned data, which significantly simplifies the data preparation process.
- Semantic Control Mechanism: The SC-LSTM introduces a semantic control cell that modulates the dialogue act (DA) information during sentence generation. This mechanism ensures that the generated utterance accurately reflects the intended semantics.
- Deep Network Architecture: Extending the SC-LSTM to a deep neural network structure improves the generator's performance by leveraging the advantages of deep learning, such as enhanced feature representation and improved generalization.
- Backward LSTM Reranking: To better handle sentence forms that depend on backward context, a backward LSTM reranker is trained to select the best candidates from the forward generator outputs, further enhancing the fluency and adequacy of generated sentences.
- Empirical Validation: The paper presents empirical evaluations demonstrating the SC-LSTM's superior performance across two domains: San Francisco restaurants and hotels. The results show significant improvements in both BLEU score and slot error rate over several baselines, including a handcrafted generator, a k-nearest neighbour (kNN) generator, and class-based language models (class LM).
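The backward-reranking idea mentioned above can be sketched as scoring each candidate by the sum of forward and backward log-likelihoods, heavily penalizing slot errors so semantically unfaithful candidates sink. This is a minimal illustration in Python; the scoring callables and the penalty weight are illustrative assumptions, not the paper's exact configuration.

```python
def rerank(candidates, fwd_logprob, bwd_logprob, slot_err, penalty=100.0):
    """Pick the best candidate from an n-best list.

    fwd_logprob / bwd_logprob: log-likelihood under the forward and
    backward LSTMs (hypothetical callables for this sketch).
    slot_err: slot error rate of the candidate against the input DA.
    """
    def score(utt):
        # fluent under both directions, strongly penalized for slot errors
        return fwd_logprob(utt) + bwd_logprob(utt) - penalty * slot_err(utt)
    return max(candidates, key=score)
```

With a large penalty, a candidate that drops or hallucinates a slot is effectively removed from contention even if it is the most fluent string.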
Methodology
Semantically Conditioned LSTM (SC-LSTM)
The SC-LSTM architecture extends the standard LSTM by adding a semantic control cell that manages the dialogue act (DA) features dynamically during generation. The DA features pass through a reading gate, which selectively retains or discards specific semantic attributes at each time step, keeping the generated text coherent and faithful to the input semantics.
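The mechanism above can be sketched as a single recurrence step: an ordinary LSTM cell plus a reading gate that multiplicatively decays the DA vector, whose remainder is injected into the cell state. The sketch below is a NumPy illustration under assumed weight shapes and initialization, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SCLSTMCell:
    """One step of an SC-LSTM: a standard LSTM cell plus a reading gate r_t
    that gradually consumes the dialogue-act vector d_t as slots are
    realized. Shapes and init are illustrative assumptions."""

    def __init__(self, n_in, n_hidden, n_da, seed=0):
        rng = np.random.default_rng(seed)
        # stacked weights for input, forget, output gates and candidate cell
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        # reading gate: one value per dialogue-act feature
        self.W_r = rng.normal(0.0, 0.1, (n_da, n_in + n_hidden))
        # projects the remaining DA vector into the cell state
        self.W_dc = rng.normal(0.0, 0.1, (n_hidden, n_da))

    def step(self, w_t, h_prev, c_prev, d_prev):
        x = np.concatenate([w_t, h_prev])
        z = self.W @ x
        n = h_prev.size
        i, f, o = (sigmoid(z[k * n:(k + 1) * n]) for k in range(3))
        c_hat = np.tanh(z[3 * n:])
        r = sigmoid(self.W_r @ x)   # reading gate
        d_t = r * d_prev            # retain or discard DA features
        c_t = f * c_prev + i * c_hat + np.tanh(self.W_dc @ d_t)
        h_t = o * np.tanh(c_t)
        return h_t, c_t, d_t
```

Because each reading-gate value lies strictly in (0, 1), the DA vector can only shrink over time, which is what pushes the generator to mention each slot once and then stop.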
Deep SC-LSTM
The authors extend the SC-LSTM to a deep structure by stacking multiple LSTM layers. Skip connections and dropout techniques are employed to mitigate vanishing gradient problems and prevent overfitting, respectively. This deep architecture allows for more sophisticated feature extraction, leading to higher accuracy in text generation.
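The stacking-with-skip-connections idea can be sketched as follows: each layer's hidden sequence is summed into the output (the skip connections), and dropout is applied only to the non-recurrent inputs of each layer. The function and the layer interface here are illustrative assumptions in Python, not the paper's implementation.

```python
import numpy as np

def deep_forward(layers, x_seq, dropout=0.25, seed=0, train=True):
    """Run a stack of recurrent layers with skip connections to the output.

    layers: list of callables mapping an input sequence to a hidden
    sequence of the same width (e.g. SC-LSTM layers in the paper).
    """
    rng = np.random.default_rng(seed)
    skip_sum = None
    h = x_seq
    for layer in layers:
        if train:
            # inverted dropout on the layer's non-recurrent input only
            mask = (rng.random(h.shape) > dropout) / (1.0 - dropout)
            h = h * mask
        h = layer(h)
        # skip connection: every layer contributes directly to the output
        skip_sum = h if skip_sum is None else skip_sum + h
    return skip_sum
```

Summing every layer into the output gives lower layers a short gradient path, which is the stated purpose of the skip connections; restricting dropout to non-recurrent connections avoids disrupting the recurrence itself.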
Evaluation and Results
The experimental evaluation includes objective metrics (BLEU and slot error rates) and subjective evaluations via human judges. The results show that the SC-LSTM, particularly its deep variant, achieves the highest BLEU scores and the lowest slot error rates among all compared methods. Human evaluations also indicate a preference for SC-LSTM generated utterances in terms of informativeness and naturalness.
The research highlights notable strengths:
- Higher BLEU Scores: The SC-LSTM consistently outperforms other methods, achieving BLEU scores of 0.731 in the restaurant domain and 0.832 in the hotel domain.
- Lower Slot Error Rates: The deep SC-LSTM model achieves the lowest slot error rates, indicating superior adequacy and accuracy in information rendering.
- Human Preference: Subjective evaluations reflect a strong preference for SC-LSTM generated responses over other methods.
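To make the slot error rate metric concrete: it counts the slots missing from, plus those spuriously added to, the generated utterance relative to the input dialogue act, normalized by the number of required slots. A minimal Python sketch, with hypothetical slot names:

```python
def slot_error_rate(required, realized):
    """Slot error rate: (missing + redundant) / total required slots.

    required: slots present in the input dialogue act
    realized: slots actually rendered in the generated utterance
    """
    required, realized = set(required), set(realized)
    missing = len(required - realized)    # slots the generator dropped
    redundant = len(realized - required)  # slots it added spuriously
    return (missing + redundant) / len(required)

# Hypothetical example: the DA requires name, food, pricerange; the
# utterance renders name and food but adds a spurious area slot.
err = slot_error_rate({"name", "food", "pricerange"},
                      {"name", "food", "area"})  # 2 errors / 3 slots
```

A perfect rendering scores 0; both omissions and hallucinated slots raise the rate, which is why it complements BLEU as an adequacy measure.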
Implications and Future Developments
This paper marks significant progress in the field of NLG for SDS. The integration of semantic control within an LSTM framework addresses both accuracy and naturalness, two critical factors in user perception and satisfaction. The demonstrated ease of scaling to different domains and the potential for multilingual applications open avenues for developing more adaptable and natural SDS.
Future developments could explore further conditioning the generator on additional dialogue features such as discourse information or social cues. Additionally, the end-to-end trainability of the neural network-based approach holds promise for further enhancements in dialogue variability and response richness.
The paper concludes by acknowledging the support from Toshiba Research Europe Ltd, reinforcing the importance of industrial collaboration in advancing academic research.
In summary, the introduction of semantically controlled LSTM-based generation represents a substantial advance towards more natural, informative, and scalable language generation systems in spoken dialogue applications.