
QuALITY: Question Answering with Long Input Texts, Yes! (2112.08608v2)

Published 16 Dec 2021 in cs.CL

Abstract: To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than relying on summaries or excerpts. In addition, only half of the questions are answerable by annotators working under tight time constraints, indicating that skimming and simple search are not enough to consistently perform well. Our baseline models perform poorly on this task (55.4%) and significantly lag behind human performance (93.5%).

Authors (11)
  1. Richard Yuanzhe Pang (26 papers)
  2. Alicia Parrish (31 papers)
  3. Nitish Joshi (13 papers)
  4. Nikita Nangia (17 papers)
  5. Jason Phang (40 papers)
  6. Angelica Chen (22 papers)
  7. Vishakh Padmakumar (22 papers)
  8. Johnny Ma (1 paper)
  9. Jana Thompson (3 papers)
  10. He He (71 papers)
  11. Samuel R. Bowman (103 papers)
Citations (115)
