
CONVERSER: Few-Shot Conversational Dense Retrieval with Synthetic Data Generation (2309.06748v1)

Published 13 Sep 2023 in cs.CL and cs.IR

Abstract: Conversational search provides a natural interface for information retrieval (IR). Recent approaches have demonstrated promising results in applying dense retrieval to conversational IR. However, training dense retrievers requires large amounts of in-domain paired data. This hinders the development of conversational dense retrievers, as abundant in-domain conversations are expensive to collect. In this paper, we propose CONVERSER, a framework for training conversational dense retrievers with at most 6 examples of in-domain dialogues. Specifically, we utilize the in-context learning capability of LLMs to generate conversational queries given a passage in the retrieval corpus. Experimental results on conversational retrieval benchmarks OR-QuAC and TREC CAsT 19 show that the proposed CONVERSER achieves comparable performance to fully-supervised models, demonstrating the effectiveness of our proposed framework in few-shot conversational dense retrieval. All source code and generated datasets are available at https://github.com/MiuLab/CONVERSER
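The abstract describes generating conversational queries from corpus passages via few-shot in-context learning. Below is a minimal sketch of how such a few-shot prompt might be assembled; the template, example format, and function names are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
# Illustrative sketch (assumed format, not CONVERSER's actual prompt):
# build a few-shot prompt asking an LLM to generate a conversational
# query answerable by a given passage, using up to 6 in-domain
# dialogue examples, matching the paper's few-shot setting.

FEW_SHOT_EXAMPLES = [
    {
        "passage": "The Apollo 11 mission landed the first humans on the Moon in 1969.",
        "history": ["Who were the astronauts on Apollo 11?"],
        "query": "When did it land on the Moon?",
    },
    # ... up to 6 in-domain dialogue examples in total
]

def build_prompt(examples, target_passage):
    """Assemble a few-shot prompt for conversational query generation."""
    parts = ["Generate a conversational query answerable by the passage.\n"]
    for ex in examples:
        parts.append(f"Passage: {ex['passage']}")
        for turn in ex["history"]:
            # Prior dialogue turns give the generated query conversational context.
            parts.append(f"Previous question: {turn}")
        parts.append(f"Query: {ex['query']}\n")
    # The target passage gets the same format, with the query left blank
    # for the LLM to complete.
    parts.append(f"Passage: {target_passage}")
    parts.append("Query:")
    return "\n".join(parts)

prompt = build_prompt(
    FEW_SHOT_EXAMPLES,
    "Dense retrieval encodes queries and passages into a shared vector space.",
)
print(prompt)
```

The completed queries, paired with their source passages, would then serve as synthetic training data for the dense retriever.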

Authors (5)
  1. Chao-Wei Huang (28 papers)
  2. Chen-Yu Hsu (2 papers)
  3. Tsu-Yuan Hsu (8 papers)
  4. Chen-An Li (13 papers)
  5. Yun-Nung Chen (104 papers)
Citations (4)