
DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels

Published 4 Sep 2024 in cs.CL (arXiv:2409.02465v2)

Abstract: Recently, significant effort has been devoted to enhancing the long-context capabilities of LLMs, particularly long-context reasoning. To facilitate this research, we propose DetectiveQA, a dataset specifically designed for narrative reasoning over long contexts. We leverage detective novels averaging over 100k tokens to create a dataset of 1,200 human-annotated questions in both Chinese and English, each paired with corresponding reference reasoning steps. Furthermore, we introduce a step-wise reasoning metric that enables finer-grained evaluation of LLMs' reasoning processes. We validate our approach and evaluate mainstream LLMs, including GPT-4, Claude, and LLaMA, revealing persistent challenges in long-context reasoning as well as difficulties in evidence retrieval. Our findings offer valuable insights into the study of long-context reasoning and lay the groundwork for more rigorous evaluations.
