Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model (2310.08072v2)

Published 12 Oct 2023 in cs.CL

Abstract: This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. Fine-tuning GPT models for question answering is common practice in resource-rich languages such as English; however, it becomes challenging for non-English languages due to the scarcity of sufficient question-answer (QA) pairs. Existing approaches use question and answer generators trained on human-authored QA pairs, which incurs substantial human cost. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data achieves performance comparable to a model trained on manually curated datasets, without incurring human costs.
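
The core loop the abstract describes is prompting an instruct-tuned model to emit QA pairs from source text. Below is a minimal sketch of that zero-shot variant; `call_instruct_model` is a hypothetical stand-in for whatever instruct-tuned model the paper uses, and the prompt wording is illustrative, not the paper's actual prompt.

```python
import json

def call_instruct_model(prompt: str) -> str:
    """Hypothetical stand-in for an instruct-tuned model call
    (e.g., a chat-completion API or a local instruct model)."""
    raise NotImplementedError

def synthesize_qa_pairs(passage: str, n_pairs: int = 3) -> list[dict]:
    """Ask the instruct-tuned model to write QA pairs about a passage.

    The returned pairs serve as synthetic training data for a
    downstream question-answering model, replacing human-authored QA pairs.
    """
    prompt = (
        f"Read the passage below and write {n_pairs} question-answer pairs "
        "about it. Return them as a JSON list of objects with 'question' "
        f"and 'answer' keys.\n\nPassage:\n{passage}"
    )
    raw = call_instruct_model(prompt)
    return json.loads(raw)  # parse the model's JSON into QA records
```

The abstract notes that several strategies for obtaining QA pairs (zero-shot and few-shot) were compared; the sketch above shows only the simplest zero-shot case.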

Authors (4)
  1. Kosuke Takahashi (6 papers)
  2. Takahiro Omi (7 papers)
  3. Kosuke Arima (2 papers)
  4. Tatsuya Ishigaki (4 papers)
