GO FIGURE: A Meta Evaluation of Factuality in Summarization (2010.12834v2)

Published 24 Oct 2020 in cs.CL

Abstract: While neural language models can generate text with remarkable fluency and coherence, controlling for factual correctness in generation remains an open research question. This major discrepancy between the surface-level fluency and the content-level correctness of neural generation has motivated a new line of research that seeks automatic metrics for evaluating the factuality of machine text. In this paper, we introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics. We propose five necessary and intuitive conditions to evaluate factuality metrics on diagnostic factuality data across three different summarization tasks. Our benchmark analysis of ten factuality metrics reveals that our meta-evaluation framework provides a robust and efficient evaluation that is extensible to multiple types of factual consistency and standard generation metrics, including QA metrics. It also reveals that while QA metrics generally improve over standard metrics at measuring factuality across domains, their performance is highly dependent on the way in which questions are generated.
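
The kind of diagnostic condition the abstract describes can be illustrated with a short sketch: a trustworthy factuality metric should assign strictly lower scores as more factual errors are injected into an otherwise correct summary. The sketch below is not the authors' implementation; `metric` and `inject_errors` are hypothetical stand-ins for a candidate factuality metric and a diagnostic error-injection transform.

```python
# Minimal sketch (assumed, not the paper's code) of a sensitivity check
# in the spirit of GO FIGURE: scores should decrease monotonically as
# factual errors are injected into a reference summary.

from typing import Callable, List


def is_error_sensitive(
    metric: Callable[[str, str], float],       # (source, summary) -> factuality score
    source: str,
    reference_summary: str,
    inject_errors: Callable[[str, int], str],  # corrupt the summary with k errors
    max_errors: int = 3,
) -> bool:
    """Return True if the metric's score strictly decreases with each added error."""
    scores: List[float] = []
    for k in range(max_errors + 1):
        corrupted = inject_errors(reference_summary, k)  # k = 0 keeps it intact
        scores.append(metric(source, corrupted))
    # Require a strict decrease at every step; a flat response to added
    # errors would mean the metric is insensitive to factual corruption.
    return all(scores[i] > scores[i + 1] for i in range(len(scores) - 1))
```

Running such a check across several metrics and diagnostic transforms is, at a high level, how a meta-evaluation separates metrics that track factual consistency from those that merely track surface fluency.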

Authors (5)
  1. Saadia Gabriel (23 papers)
  2. Asli Celikyilmaz (81 papers)
  3. Rahul Jha (13 papers)
  4. Yejin Choi (287 papers)
  5. Jianfeng Gao (344 papers)
Citations (90)