Determine LLM Summarization Accuracy for Multi-Document Sensemaking

Determine how accurately large language models (LLMs) can summarize when analyzing multiple given documents in sensemaking tasks, ideally by evaluating their outputs against established ground-truth summaries to quantify performance.

Background

The paper investigates LLM-supported summarization within complex sensemaking scenarios, where analysts must synthesize connections across multiple documents. Prior work on LLM summarization has emphasized hallucination, similarity, and quality in open-ended tasks, but less emphasis has been placed on accuracy when models must integrate information across multiple given documents with a definitive ground truth.

To address this gap, the authors propose using an intermediate visual workspace to steer LLM summarization and employ a sensemaking dataset with ground-truth summaries to evaluate accuracy. The explicitly stated knowledge gap concerns the level of accuracy achievable by LLMs in multi-document sensemaking contexts, motivating systematic evaluation with appropriate benchmarks.
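The accuracy evaluation described above can be sketched as scoring each model summary against its ground-truth counterpart with an overlap metric. The paper does not prescribe a specific metric here, so the unigram-overlap F1 (ROUGE-1 style) and whitespace tokenization below are illustrative assumptions, not the authors' method:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 (ROUGE-1 style) between a model summary and a
    ground-truth summary. Whitespace tokenization is a simplifying
    assumption; real evaluations would use a proper tokenizer and
    typically report several metrics."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: score a model summary against the ground truth.
truth = "the suspects met at the warehouse on tuesday"
model = "suspects met at a warehouse on tuesday night"
score = rouge1_f1(model, truth)  # → 0.75
```

In a full evaluation this score would be averaged over the dataset's ground-truth summaries to quantify multi-document summarization accuracy.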

References

"We lack an understanding of how accurately LLMs can summarize when analyzing multiple given documents in sensemaking tasks."

Steering LLM Summarization with Visual Workspaces for Sensemaking (2409.17289 - Tang et al., 2024) in Introduction