
ConvSDG: Session Data Generation for Conversational Search (2403.11335v1)

Published 17 Mar 2024 in cs.IR and cs.CL

Abstract: Conversational search provides a more convenient interface for users to search by allowing multi-turn interaction with the search engine. However, the effectiveness of conversational dense retrieval methods is limited by the scarcity of training data required for their fine-tuning. Thus, generating more conversational training sessions with relevance labels could potentially improve search performance. Based on the promising capabilities of LLMs in text generation, we propose ConvSDG, a simple yet effective framework to explore the feasibility of boosting conversational search by using an LLM for session data generation. Within this framework, we design dialogue/session-level and query-level data generation with unsupervised and semi-supervised learning, according to the availability of relevance judgments. The generated data are used to fine-tune the conversational dense retriever. Extensive experiments on four widely used datasets demonstrate the effectiveness and broad applicability of our ConvSDG framework compared with several strong baselines.
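
The pipeline the abstract describes — prompting an LLM to expand a topic into a multi-turn search session, then using the generated turns (paired with relevance labels) to fine-tune a conversational dense retriever — can be illustrated with a rough sketch. The prompt wording, the `generate_session` helper, the choice of the OpenAI chat API, and the example topic below are illustrative assumptions, not the paper's actual prompts or implementation.

```python
# Illustrative sketch only: prompt text, helper names, and the API choice are
# assumptions for demonstration, not the paper's implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_session(topic: str, num_turns: int = 4) -> list[str]:
    """Ask an LLM to expand a single topic into a multi-turn search session."""
    prompt = (
        f"Write a {num_turns}-turn conversational search session about "
        f"'{topic}'. Each turn should be a follow-up query that depends on "
        f"the previous turns. Return one query per line."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]


# The generated turns would then be paired with passages (e.g., via
# pseudo-relevance labels from an existing retriever when human judgments
# are unavailable) and used to fine-tune the conversational dense retriever.
session = generate_session("renewable energy storage")
print(session)
```

In this sketch, the unsupervised/semi-supervised distinction mentioned in the abstract would show up in how the relevance labels for the generated turns are obtained, not in the generation call itself.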

Authors (6)
  1. Fengran Mo (35 papers)
  2. Bole Yi (1 paper)
  3. Kelong Mao (23 papers)
  4. Chen Qu (37 papers)
  5. Kaiyu Huang (16 papers)
  6. Jian-Yun Nie (70 papers)
Citations (6)