
WebCPM: Interactive Web Search for Chinese Long-form Question Answering (2305.06849v2)

Published 11 May 2023 in cs.CL, cs.AI, and cs.IR

Abstract: Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses. The de facto paradigm of LFQA necessitates two procedures: information retrieval, which searches for relevant supporting facts, and information synthesis, which integrates these facts into a coherent answer. In this paper, we introduce WebCPM, the first Chinese LFQA dataset. One unique feature of WebCPM is that its information retrieval is based on interactive web search, which engages with a search engine in real time. Following WebGPT, we develop a web search interface. We recruit annotators to search for relevant information using our interface and then answer questions, and we record their web search behaviors as they do so. In total, we collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions. We fine-tune pre-trained LLMs to imitate human behaviors for web search and to generate answers based on the collected facts. Our LFQA pipeline, built on these fine-tuned models, generates answers that are no worse than human-written ones in 32.5% and 47.5% of the cases on our dataset and DuReader, respectively.
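
The two-step paradigm described in the abstract (retrieve supporting facts, then synthesize them into an answer) can be illustrated with a toy sketch. This is not the WebCPM implementation: the paper uses interactive web search and fine-tuned language models, whereas the sketch below assumes a small in-memory corpus, word-overlap ranking, and plain concatenation. The names `Fact`, `retrieve`, `synthesize`, and `answer` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """A supporting fact; both fields are illustrative, not from the paper."""
    source: str
    text: str


def retrieve(question: str, corpus: list[Fact], top_k: int = 2) -> list[Fact]:
    """Stage 1 (toy stand-in for web search): rank facts by word overlap."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(f.text.lower().split())), f) for f in corpus]
    # sorted() is stable, so ties keep their original corpus order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [fact for score, fact in ranked[:top_k] if score > 0]


def synthesize(question: str, facts: list[Fact]) -> str:
    """Stage 2 (toy stand-in for a generator): join the retrieved facts."""
    if not facts:
        return "No supporting facts found."
    return " ".join(f.text for f in facts)


def answer(question: str, corpus: list[Fact], top_k: int = 2) -> str:
    """Run retrieval, then synthesis, as a single pipeline."""
    return synthesize(question, retrieve(question, corpus, top_k))


corpus = [
    Fact("a", "The sky looks blue because air scatters short wavelengths."),
    Fact("b", "Sunsets are red because light travels through more air."),
    Fact("c", "Blue light has a shorter wavelength than red light."),
]
print(answer("why is the sky blue", corpus))
```

In a real system each stage would be replaced by a learned component; the point of the sketch is only the separation between collecting facts and composing the answer from them.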

Authors (15)
  1. Yujia Qin (41 papers)
  2. Zihan Cai (1 paper)
  3. Dian Jin (13 papers)
  4. Lan Yan (5 papers)
  5. Shihao Liang (11 papers)
  6. Kunlun Zhu (12 papers)
  7. Yankai Lin (125 papers)
  8. Xu Han (270 papers)
  9. Ning Ding (122 papers)
  10. Huadong Wang (15 papers)
  11. Ruobing Xie (97 papers)
  12. Fanchao Qi (33 papers)
  13. Zhiyuan Liu (433 papers)
  14. Maosong Sun (337 papers)
  15. Jie Zhou (687 papers)
Citations (70)
