Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model (2310.08072v2)

Published 12 Oct 2023 in cs.CL

Abstract: This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. Fine-tuning GPT models is common practice for training such systems in resource-rich languages like English; however, it becomes challenging for non-English languages because sufficient question-answer (QA) pairs are scarce. Existing approaches use question and answer generators trained on human-authored QA pairs, which incurs substantial human cost. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data performs comparably to a model trained on manually curated datasets, without incurring human costs.
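The abstract describes prompting an instruct-tuned model to produce QA pairs either zero-shot (instruction only) or few-shot (instruction plus worked demonstrations). The sketch below is a minimal illustration of that distinction, not the authors' implementation: the instruction wording, the `Passage:`/`Q:`/`A:` layout, and the helper names `build_prompt` and `parse_qa` are all hypothetical, and the actual call to an instruct-tuned model is left out and assumed to return plain text continuing after `Q:`.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompts are not given in the abstract.
INSTRUCTION = (
    "Read the passage and write one question that can be answered from it, "
    "followed by its answer."
)


def build_prompt(
    passage: str,
    examples: Sequence[Tuple[str, str, str]] = (),
) -> str:
    """Build a QA-generation prompt.

    With no examples this is a zero-shot prompt; each (passage, question, answer)
    triple in `examples` adds one few-shot demonstration before the target passage.
    """
    parts = [INSTRUCTION, ""]
    for ex_passage, ex_question, ex_answer in examples:
        parts += [f"Passage: {ex_passage}", f"Q: {ex_question}", f"A: {ex_answer}", ""]
    # The prompt ends with "Q:" so the model continues with a question, then "A: ...".
    parts += [f"Passage: {passage}", "Q:"]
    return "\n".join(parts)


def parse_qa(completion: str) -> Optional[Tuple[str, str]]:
    """Parse a completion of the form '<question>\\nA: <answer>'.

    Returns (question, answer), or None if the output does not follow the format,
    so malformed generations can simply be discarded.
    """
    match = re.match(r"\s*([^\n]+?)\s*\n\s*A:\s*([^\n]+)", completion)
    if match is None:
        return None
    question, answer = match.group(1).strip(), match.group(2).strip()
    if not question or not answer:
        return None
    return question, answer


if __name__ == "__main__":
    demo = [("The sky is blue.", "What color is the sky?", "blue")]
    print(build_prompt("Water boils at 100 degrees Celsius at sea level.", demo))
    # A stubbed model output standing in for a real instruct-tuned model's reply.
    print(parse_qa(" At what temperature does water boil at sea level?\nA: 100 degrees Celsius"))
```

Returning `None` for malformed outputs keeps the generation loop simple: each passage is prompted, and only completions that parse cleanly are kept as synthetic training pairs.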

Authors (4)

Kosuke Takahashi
Takahiro Omi
Kosuke Arima
Tatsuya Ishigaki
