SQuALITY: Building a Long-Document Summarization Dataset the Hard Way (2205.11465v1)

Published 23 May 2022 in cs.CL and cs.AI

Abstract: Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries -- which are nearly always in difficult-to-work-with technical domains -- or by using approximate heuristics to extract them from everyday text -- which frequently yields unfaithful summaries. In this work, we turn to a slower but more straightforward approach to developing summarization benchmark data: We hire highly-qualified contractors to read stories and write original summaries from scratch. To amortize reading time, we collect five summaries per document, with the first giving an overview and the subsequent four addressing specific questions. We use this protocol to collect SQuALITY, a dataset of question-focused summaries built on the same public-domain short stories as the multiple-choice dataset QuALITY (Pang et al., 2021). Experiments with state-of-the-art summarization systems show that our dataset is challenging and that existing automatic evaluation metrics are weak indicators of quality.

Authors (5)
  1. Alex Wang (32 papers)
  2. Richard Yuanzhe Pang (26 papers)
  3. Angelica Chen (22 papers)
  4. Jason Phang (40 papers)
  5. Samuel R. Bowman (103 papers)
Citations (38)