Few-shot Natural Language Generation for Task-Oriented Dialog
The paper "Few-shot Natural Language Generation for Task-Oriented Dialog" presents an innovative approach to addressing the challenges in Natural Language Generation (NLG) for task-oriented dialog systems, particularly focusing on scenarios where labeled data is scarce. The primary contribution of the paper is the introduction of FewShotWOZ, a benchmark specifically designed to simulate a few-shot learning setting in task-oriented dialog systems. This addresses the common issue faced in real-world applications where extensive labeled datasets are often unavailable for new domains.
Proposed Model: SC-GPT
The authors present Semantically-Conditioned Generative Pre-Training (SC-GPT), a multi-layer Transformer that is pre-trained on a large corpus of annotated (dialog act, response) pairs to acquire controllable generation ability and is then fine-tuned on limited domain-specific labels to adapt to new domains. Its training pipeline has three stages (a minimal fine-tuning sketch follows the list):
- Massive Plain-Language Pre-training: Building on the GPT-2 architecture, SC-GPT is first pre-trained on vast amounts of plain text to learn general patterns of language.
- Dialog-Act Controlled Pre-training: The model is then further pre-trained on a large annotated corpus of (dialog act, response) pairs, acquiring the ability to generate responses conditioned on specified semantic forms.
- Fine-tuning: Finally, the pre-trained model is fine-tuned on limited domain-specific labeled data, enabling it to adapt to new domains with minimal supervision.
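To make the pipeline concrete, below is a minimal sketch of the fine-tuning stage, assuming the Hugging Face `transformers` library. The linearized dialog act, the `&` separator, and the tiny `examples` list are illustrative assumptions, not the authors' released code or data format.

```python
# Minimal SC-GPT-style fine-tuning sketch (assumes torch and transformers are installed).
# The separator and example data are illustrative; the key idea is to concatenate the
# dialog act with the response and train with the standard language-modeling objective.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # stage 1: plain-language pre-training, inherited from GPT-2
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Each example pairs a linearized dialog act with a reference response.
examples = [
    ("inform ( name = Blue Spice ; food = Chinese )",
     "Blue Spice serves Chinese food."),
]

model.train()
for dialog_act, response in examples:
    # Stages 2-3: condition generation on the dialog act by prepending it to
    # the response and minimizing the causal language-modeling loss.
    text = dialog_act + " & " + response + tokenizer.eos_token
    inputs = tokenizer(text, return_tensors="pt")
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time, the same model would be prompted with only the linearized dialog act (plus the separator) and asked to produce the response, for example via `model.generate`.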
Benchmark: FewShotWOZ
FewShotWOZ is designed to better reflect the practical conditions of task-oriented dialog systems, where annotated data is limited. It covers multiple domains, such as restaurant recommendation and hotel booking, with fewer than 50 labeled training examples per domain. This is a marked departure from existing datasets, which often provide thousands of labeled examples per domain, and it encourages research on models that generalize efficiently from few examples.
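For illustration, a FewShotWOZ-style training split can be thought of as a handful of (dialog act, response) pairs per domain, as in the hypothetical sketch below; the domain names follow the paper, but the specific pairs and layout are invented.

```python
# Hypothetical few-shot data layout: a few (dialog act, response) pairs per domain.
# The pairs shown here are invented for illustration, not drawn from FewShotWOZ.
few_shot_train = {
    "restaurant": [
        ("inform ( name = Sotto ; area = riverside )",
         "Sotto is a nice restaurant in the riverside area."),
        # ... fewer than 50 pairs in total for this domain
    ],
    "hotel": [
        ("request ( pricerange )",
         "What price range are you looking for?"),
        # ... fewer than 50 pairs in total for this domain
    ],
}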
Experimental Results
The authors evaluate SC-GPT on both FewShotWOZ and the Multi-Domain WOZ (MultiWOZ) dataset, showing that it outperforms baseline models, including the semantically conditioned LSTM (SC-LSTM) and hierarchical disentangled self-attention (HDSA) models, with higher BLEU scores and lower slot error rates (ERR). On the more challenging FewShotWOZ benchmark, the gains are especially pronounced, indicating that SC-GPT can generate fluent and semantically accurate responses from minimal labeled data.
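As a rough sketch of how the two reported metrics can be computed, the snippet below uses NLTK's corpus-level BLEU and a common formulation of the slot error rate, ERR = (missing + redundant slot values) / (total slots in the dialog act); the exact evaluation scripts used in the paper may differ.

```python
# Sketch of BLEU and slot error rate (ERR) computation; assumes nltk is installed.
# ERR here counts slot values missing from, or duplicated in, the generated text.
from nltk.translate.bleu_score import corpus_bleu

def slot_error_rate(slot_values, generated_text):
    """ERR = (missing + redundant slot values) / total slots for one example."""
    missing = sum(1 for v in slot_values if v not in generated_text)
    redundant = sum(max(generated_text.count(v) - 1, 0) for v in slot_values)
    return (missing + redundant) / max(len(slot_values), 1)

# Toy example: one reference, one hypothesis (both tokenized).
references = [[["blue", "spice", "serves", "chinese", "food"]]]
hypotheses = [["blue", "spice", "serves", "chinese", "food"]]
print("BLEU:", corpus_bleu(references, hypotheses))
print("ERR:", slot_error_rate(["Blue Spice", "Chinese"],
                              "Blue Spice serves Chinese food."))
```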
Implications and Future Directions
The introduction of FewShotWOZ and SC-GPT has several implications for both practical applications and theoretical research in AI. Practically, the SC-GPT model holds promise for deployment in domains where labeled data is scarce, facilitating broader use of task-oriented systems without extensive manual annotation. Theoretically, it opens pathways for exploring large-scale pre-training strategies and adaptive learning mechanisms that are efficient in few-shot settings.
The paper suggests several future directions, including enhancing models for more interpersonal interactions to improve user experiences and extending the generative pre-training paradigm to entire dialog systems for end-to-end learning. This would require bridging different modules within dialog systems through coherent generative frameworks, potentially leveraging segment-level auto-regressive models for comprehensive training across the system pipeline.
In conclusion, the paper marks a significant advance in NLG for dialog systems, demonstrating that large-scale pre-training combined with few-shot fine-tuning can effectively address real-world data constraints.