ODSum: New Benchmarks for Open Domain Multi-Document Summarization (2309.08960v1)

Published 16 Sep 2023 in cs.CL

Abstract: Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries. Because the documents in an open-domain collection are more interrelated, there is not necessarily a single correct retrieval result, which makes retrieval performance hard to measure. We propose a rule-based method for converting query-based document summarization datasets into ODMDS datasets. Based on this method, we introduce ODSum, a novel dataset whose document index is interdependent and often interrelated. We tackle ODMDS with the retrieve-then-summarize approach and investigate the performance of a range of retrievers and summarizers. Through extensive experiments, we identify variances across evaluation metrics and provide insights into their reliability. We also find that LLMs suffer a large performance loss from retrieval errors. We further experiment with methods to improve performance and investigate their robustness against imperfect retrieval. We will release our data and code at https://github.com/yale-nlp/ODSum.
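
To make the retrieve-then-summarize setup concrete, here is a minimal sketch of such a pipeline. It is not the paper's implementation: the TF-IDF retriever stands in for the retrievers the paper benchmarks, and the summarize() stub is a hypothetical placeholder for whatever LLM or summarization model is plugged in.

```python
# Minimal retrieve-then-summarize sketch (illustrative only, not the ODSum codebase).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents against the query with TF-IDF cosine similarity and keep the top-k."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(documents)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    top_idx = scores.argsort()[::-1][:k]
    return [documents[i] for i in top_idx]


def summarize(query: str, docs: list[str]) -> str:
    """Hypothetical summarizer stub; in practice this would call an LLM or a fine-tuned model."""
    joined = " ".join(docs)
    return joined[:500]  # placeholder: truncation instead of a real abstractive summary


def retrieve_then_summarize(query: str, corpus: list[str], k: int = 3) -> str:
    """First retrieve candidate documents for the query, then summarize only those."""
    return summarize(query, retrieve(query, corpus, k))
```

The key property the paper probes is visible in this structure: any error in retrieve() propagates directly into summarize(), which is why imperfect retrieval degrades summary quality.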

Authors (6)
  1. Yijie Zhou (16 papers)
  2. Kejian Shi (11 papers)
  3. Wencai Zhang (1 paper)
  4. Yixin Liu (108 papers)
  5. Yilun Zhao (59 papers)
  6. Arman Cohan (121 papers)
Citations (2)