
Quick Starting Dialog Systems with Paraphrase Generation (2204.02546v2)

Published 6 Apr 2022 in cs.CL

Abstract: Acquiring training data to improve the robustness of dialog systems can be a painstakingly long process. In this work, we propose a method to reduce the cost and effort of creating new conversational agents by artificially generating more data from existing examples, using paraphrase generation. Our proposed approach can kick-start a dialog system with little human effort, and brings its performance to a level satisfactory enough for allowing actual interactions with real end-users. We experimented with two neural paraphrasing approaches, namely Neural Machine Translation and a Transformer-based seq2seq model. We present the results obtained with two datasets in English and in French: a crowd-sourced public intent classification dataset and our own corporate dialog system dataset. We show that our proposed approach increased the generalization capabilities of the intent classification model on both datasets, reducing the effort required to initialize a new dialog system and helping to deploy this technology at scale within an organization.
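The general idea of paraphrase-based augmentation for intent classification can be illustrated with a minimal sketch, assuming each training example is an (utterance, intent) pair and that every paraphrase inherits the label of its source utterance. This is not the authors' implementation: the abstract does not describe their pipeline at this level of detail. The names `augment_with_paraphrases`, `toy_paraphrase`, `max_per_example`, and the intent labels are all hypothetical, and `toy_paraphrase` is a trivial synonym-swap stand-in for a real paraphraser such as an NMT round trip or a Transformer-based seq2seq model.

```python
from typing import Callable, Iterable

# A labelled training example: (utterance, intent label).
Example = tuple[str, str]


def augment_with_paraphrases(
    examples: Iterable[Example],
    paraphrase: Callable[[str], list[str]],
    max_per_example: int = 3,
) -> list[Example]:
    """Return the seed examples plus paraphrases that inherit each seed's label.

    Hypothetical sketch: duplicates (case- and whitespace-insensitive, per
    intent) are skipped, and at most `max_per_example` new paraphrases are
    kept for each seed utterance.
    """
    augmented: list[Example] = []
    seen: set[tuple[str, str]] = set()
    for utterance, intent in examples:
        key = (utterance.strip().lower(), intent)
        if key not in seen:
            seen.add(key)
            augmented.append((utterance, intent))
        added = 0
        for candidate in paraphrase(utterance):
            if added >= max_per_example:
                break
            key = (candidate.strip().lower(), intent)
            if key in seen:
                continue  # drop paraphrases identical to something already kept
            seen.add(key)
            augmented.append((candidate, intent))
            added += 1
    return augmented


# Toy stand-in for a neural paraphraser (hypothetical): swap one known word
# at a time for a synonym, yielding one candidate per swappable word.
SYNONYMS = {"book": "reserve", "flight": "plane ticket", "cancel": "call off"}


def toy_paraphrase(text: str) -> list[str]:
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        if word in SYNONYMS:
            candidates.append(" ".join(words[:i] + [SYNONYMS[word]] + words[i + 1:]))
    return candidates


seed = [("book a flight", "book_flight"), ("cancel my order", "cancel_order")]
augmented = augment_with_paraphrases(seed, toy_paraphrase)
for utterance, intent in augmented:
    print(f"{intent}: {utterance}")
```

Passing the paraphraser in as a function keeps the augmentation loop independent of the generation model, so either of the two approaches the abstract mentions could be plugged in without changing how labels are propagated or duplicates are filtered.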

Authors (6)
  1. Louis Marceau (3 papers)
  2. Raouf Belbahar (1 paper)
  3. Marc Queudot (3 papers)
  4. Nada Naji (2 papers)
  5. Eric Charton (4 papers)
  6. Marie-Jean Meurs (10 papers)
Citations (2)