Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization (2109.09784v2)

Published 30 Aug 2021 in cs.CL

Abstract: State-of-the-art abstractive summarization systems often generate hallucinations, i.e., content that is not directly inferable from the source text. Despite being assumed incorrect, we find that much hallucinated content is factual, namely consistent with world knowledge. These factual hallucinations can be beneficial in a summary by providing useful background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method utilizes an entity's prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical results suggest that our approach vastly outperforms two baselines and strongly correlates with human judgments. Furthermore, we show that our detector, when used as a reward signal in an off-line reinforcement learning (RL) algorithm, significantly improves the factuality of summaries while maintaining the level of abstractiveness.

Insights into Hallucinated but Factual Content in Abstractive Summarization

The paper titled "Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization" presents an intriguing exploration of the phenomenon of hallucinations in state-of-the-art abstractive summarization systems. Authored by Meng Cao, Yue Dong, and Jackie Chi Kit Cheung, the paper challenges the prevailing assumption that hallucinated content generated by these systems is inherently incorrect or undesirable. Instead, the authors provide evidence that certain hallucinated content, termed factual hallucinations, aligns with world knowledge and can indeed enhance summary comprehension.

The paper introduces a novel methodology for discerning factual from non-factual hallucinations, focusing on entities such as names, locations, and dates, which are often central to a summary's informativeness. The method evaluates an entity's prior and posterior probabilities under masked language models, roughly as formalized below. Empirical analyses show that this approach outperforms conventional baselines in classification accuracy and F1 score and correlates strongly with human judgments.
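
Concretely, let S be a generated summary containing entity e, let S with e masked out be written S_{\setminus e}, and let D be the source document. The two signals can then be written roughly as follows (notation introduced here for illustration, not taken verbatim from the paper):

$$p_{\mathrm{prior}}(e) = P_{\mathrm{MLM}}\big(e \mid S_{\setminus e}\big), \qquad p_{\mathrm{posterior}}(e) = P_{\mathrm{CMLM}}\big(e \mid S_{\setminus e},\, D\big)$$

Intuitively, the prior reflects how plausible the entity is under world knowledge alone, while the posterior reflects how plausible it is once the source document is taken into account.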

A significant finding from the paper is that a notable portion of the hallucinated content, specifically around 30% of entities within BART-generated summaries on the XSum test set, is factual. This is contextually significant as it pertains to the use of background knowledge that, while not directly inferable from the source text, is consistent with established world knowledge. Such factual hallucinations are posited to potentially enrich summaries by offering informative context not contained within the original document.

The methodological innovation is the differentiation of factual from non-factual hallucinations by computing prior and posterior probabilities with an unconditional masked language model (MLM) and a conditional masked language model (CMLM), respectively. The authors then apply a K-Nearest Neighbors (KNN) classifier over these probabilistic features to predict the hallucination status of entities, showing that entities with low prior and posterior probabilities tend to be non-factual hallucinations. A minimal sketch of this pipeline follows.
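
The snippet below is a minimal sketch, not the authors' released code: it assumes bert-base-cased as a stand-in for the unconditional MLM, stubs out the fine-tuned CMLM needed for the posterior (the hypothetical helper cmlm_entity_prob), and uses toy labelled points for the KNN step.

```python
# Sketch of the prior-probability computation and KNN classification.
import torch
from sklearn.neighbors import KNeighborsClassifier
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-cased").eval()

def masked_entity_prob(text: str, entity: str) -> float:
    """Probability the MLM assigns to `entity` when it is masked out of `text`
    (product of per-token marginals, an approximation for multi-token entities)."""
    entity_ids = tokenizer(entity, add_special_tokens=False)["input_ids"]
    masked = text.replace(entity, " ".join([tokenizer.mask_token] * len(entity_ids)), 1)
    enc = tokenizer(masked, return_tensors="pt", truncation=True)
    positions = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0]
    log_p = sum(torch.log_softmax(logits[pos], dim=-1)[tok].item()
                for pos, tok in zip(positions, entity_ids))
    return float(torch.exp(torch.tensor(log_p)))

summary = "The 28-year-old joined Manchester United from Juventus in 2021."
prior = masked_entity_prob(summary, "Manchester United")  # world-knowledge plausibility
# posterior = cmlm_entity_prob(source_doc, summary, "Manchester United")
# ^ hypothetical helper: same masking, but the model also conditions on the source.

# KNN over (prior, posterior) features, mirroring the paper's classifier choice;
# the four labelled points below are illustrative only.
train_features = [[0.30, 0.70], [0.25, 0.55], [0.01, 0.03], [0.02, 0.08]]
train_labels = ["factual", "factual", "non-factual", "non-factual"]
knn = KNeighborsClassifier(n_neighbors=3).fit(train_features, train_labels)
print(knn.predict([[prior, 0.5]]))  # 0.5 stands in for an actual posterior score
```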

Practical applications of these findings are illustrated through the integration of the hallucination detector as a reward signal within an off-line reinforcement learning framework. This integration yields marked improvements in the factual consistency of the generated summaries while maintaining their abstractiveness, highlighting a potential pathway for enhancing the reliability of machine-generated summaries in practice. A simplified reward-weighted training step in this spirit is sketched below.
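
The following is a hedged sketch of the general idea, a generic reward-weighted likelihood step rather than the paper's exact RL algorithm; detector_reward and the batch layout are assumed interfaces.

```python
import torch
import torch.nn.functional as F

def reward_weighted_step(model, optimizer, batch, detector_reward):
    """One offline training step: per-example NLL weighted by a factuality reward.
    `detector_reward` maps a decoded summary to a score in [0, 1] (assumed interface);
    `batch` holds tokenized inputs, labels (-100 at padding), and decoded summaries."""
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
    logits, labels = out.logits, batch["labels"]                 # [B, T, V], [B, T]
    nll = F.cross_entropy(logits.transpose(1, 2), labels,
                          ignore_index=-100, reduction="none")   # [B, T], 0 at padding
    lengths = (labels != -100).sum(dim=1).clamp(min=1)
    per_example = nll.sum(dim=1) / lengths                       # mean NLL per summary
    rewards = torch.tensor([detector_reward(s) for s in batch["summaries"]],
                           device=per_example.device, dtype=per_example.dtype)
    loss = (rewards * per_example).mean()                        # upweight factual targets
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the setting is offline, the target summaries come from a fixed dataset that the detector scores once up front, so no on-policy sampling is required during training.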

From a theoretical perspective, this paper challenges the reductionist view of hallucinations in summarization models and opens avenues for a more nuanced understanding of their role. It argues that not all hallucinations degrade factual consistency; some convey supplemental knowledge that increases the utility of machine-generated summaries. The investigation also sets a precedent for future work on balancing abstractiveness and factuality, especially given rising concerns over the quality and reliability of automated text generation models.

In conclusion, this research makes a compelling case for re-evaluating the function and utility of hallucinated content in machine summarization tasks. By illustrating the potential benefits of certain hallucinations, it advocates for advancements in both detection and generation methodologies that could significantly enhance the factual integrity and informativeness of automated summaries. Future studies could further explore the integration of such techniques across different generative models and broader datasets, potentially paving the way for more robust and contextually aware summarization systems.

Authors (3)
  1. Meng Cao (107 papers)
  2. Yue Dong (61 papers)
  3. Jackie Chi Kit Cheung (57 papers)
Citations (132)