
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors (2205.12854v2)

Published 25 May 2022 in cs.CL and cs.AI

Abstract: The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has become increasingly difficult. In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models. Critically, our analysis shows that much of the recent improvement in the factuality detection space has been on summaries from older (pre-Transformer) models instead of more relevant recent summarization models. We further perform a finer-grained analysis per error-type and find similar performance variance across error types for different factuality metrics. Our results show that no one metric is superior in all settings or for all error types, and we provide recommendations for best practices given these insights.
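The stratified comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes each annotated summary carries three things: a binary human label (1 = factually consistent, 0 = contains an error), a metric's score, and a stratum label for the summarizer family. The stratum names, the fixed decision threshold, the record type `AnnotatedSummary`, and the use of balanced accuracy as the per-stratum measure are all hypothetical choices made here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AnnotatedSummary:
    # Hypothetical record: one summary with its human label and a metric's score.
    model_group: str  # assumed stratum label, e.g. "pre-transformer" or "recent"
    label: int        # 1 = factually consistent, 0 = contains a factual error
    score: float      # factuality score assigned by the metric under test


def balanced_accuracy(labels: list[int], preds: list[int]) -> float:
    """Mean per-class recall over the classes present in `labels`."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if idx:  # skip a class with no gold examples in this stratum
            correct = sum(1 for i in idx if preds[i] == cls)
            recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls) if recalls else 0.0


def stratified_balanced_accuracy(
    examples: list[AnnotatedSummary], threshold: float
) -> dict[str, float]:
    """Binarize scores at `threshold` and report balanced accuracy per stratum."""
    groups: dict[str, list[AnnotatedSummary]] = defaultdict(list)
    for ex in examples:
        groups[ex.model_group].append(ex)
    return {
        group: balanced_accuracy(
            [ex.label for ex in exs],
            [int(ex.score >= threshold) for ex in exs],
        )
        for group, exs in groups.items()
    }


if __name__ == "__main__":
    # Toy, made-up data: not taken from any of the aggregated datasets.
    data = [
        AnnotatedSummary("pre-transformer", 1, 0.9),
        AnnotatedSummary("pre-transformer", 0, 0.2),
        AnnotatedSummary("recent", 1, 0.4),
        AnnotatedSummary("recent", 0, 0.6),
    ]
    # Prints {'pre-transformer': 1.0, 'recent': 0.0}
    print(stratified_balanced_accuracy(data, threshold=0.5))
```

On this toy data, pooling all four examples into one score would give 0.5 and hide that the same metric separates the older stratum perfectly while failing on the recent one, which is the kind of variance across summarizer types that the abstract cautions about.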

Authors (9)
  1. Liyan Tang
  2. Tanya Goyal
  3. Alexander R. Fabbri
  4. Philippe Laban
  5. Jiacheng Xu
  6. Semih Yavuz
  7. Justin F. Rousseau
  8. Greg Durrett
  9. Wojciech Kryściński
Citations (88)