Faithful to the Original: Fact Aware Neural Abstractive Summarization (1711.04434v1)

Published 13 Nov 2017 in cs.IR and cs.CL

Abstract: Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which inclines to create fake facts. Our preliminary study reveals nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on the improvement of informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parse technologies to extract actual fact descriptions from the source text. The dual-attention sequence-to-sequence framework is then proposed to force the generation conditioned on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement on informativeness since they often condense the meaning of the source text.

Fact Aware Neural Abstractive Summarization

This paper introduces an approach to improving faithfulness in neural abstractive summarization, motivated by the authors' preliminary finding that nearly 30% of summaries generated by a state-of-the-art system contain fake facts. Rather than optimizing for informativeness alone, the proposed method uses open information extraction and dependency parsing to extract factual descriptions from the source text and make them available during generation.
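As a rough illustration of what such fact descriptions look like, the sketch below pulls simple (subject, predicate, object) tuples out of a dependency parse using spaCy. This is only an approximation under assumed tooling: the paper's pipeline combines an open information extraction system with dependency parsing, and its exact heuristics may differ.

```python
# Illustrative sketch: extract flat "subject relation object" strings from a
# sentence's dependency parse. These strings play the role of the paper's
# fact descriptions; the authors' actual OpenIE-based extraction may differ.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_fact_descriptions(sentence: str) -> list[str]:
    """Return flat fact strings usable as a second encoder input."""
    doc = nlp(sentence)
    facts = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "attr", "dative")]
            for subj in subjects:
                for obj in objects:
                    facts.append(f"{subj.text} {token.lemma_} {obj.text}")
    return facts

print(extract_fact_descriptions("The president met the delegation in Paris on Monday."))
# e.g. ['president meet delegation']
```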

The authors introduce a dual-attention sequence-to-sequence (s2s) framework—denoted as FTSum—that concurrently leverages both the source text and extracted factual descriptions to guide summary generation. This dual conditioning is achieved by employing two parallel RNN encoders that feed into a decoder with a dual-attention mechanism. The inclusion of these factual descriptions significantly diminishes the generation of fake facts, as demonstrated by experimental results showing an 80% reduction in fake summaries when compared to a standard s2s framework.
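To make the described architecture concrete, the following is a minimal PyTorch sketch of a single decoder step with dual attention over a text encoder and a fact encoder. The GRU cell, the dot-product attention form, and the way the two context vectors are merged are assumptions chosen for brevity, not the paper's exact equations; the encoded fact descriptions would be passed in as `fact_enc`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionDecoderStep(nn.Module):
    """One decoder step attending over both the text encoder and the fact encoder.

    Hypothetical sketch: hidden sizes (embedding size assumed equal to the hidden
    size), dot-product attention, and the merge layer are illustrative only.
    """

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.cell = nn.GRUCell(hidden_size, hidden_size)
        self.merge = nn.Linear(3 * hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def attend(self, state, enc_outputs):
        # enc_outputs: (batch, src_len, hidden); state: (batch, hidden)
        scores = torch.bmm(enc_outputs, state.unsqueeze(2)).squeeze(2)
        weights = F.softmax(scores, dim=1)
        return torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)

    def forward(self, prev_embed, state, text_enc, fact_enc):
        state = self.cell(prev_embed, state)
        c_text = self.attend(state, text_enc)   # context from the source sentence
        c_fact = self.attend(state, fact_enc)   # context from the fact descriptions
        merged = torch.tanh(self.merge(torch.cat([state, c_text, c_fact], dim=-1)))
        return self.out(merged), state
```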

A crucial component of the model is a context selection gate that assesses the relative reliability of the source text and the extracted factual content, weighting the two accordingly during generation. In extensive evaluations on the Gigaword dataset, FTSum not only reduces factual errors substantially but also improves informativeness, achieving higher ROUGE scores than the baseline models.
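One plausible reading of such a gate, again as an illustrative PyTorch sketch rather than the paper's exact formulation, computes a sigmoid gate from the decoder state and the two context vectors and interpolates between them:

```python
import torch
import torch.nn as nn

class ContextSelectionGate(nn.Module):
    """Gate the text context against the fact context (illustrative sketch).

    g = sigmoid(W [state; c_text; c_fact]);  c = g * c_text + (1 - g) * c_fact
    The exact conditioning variables used in the paper may differ.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(3 * hidden_size, hidden_size)

    def forward(self, state, c_text, c_fact):
        g = torch.sigmoid(self.gate(torch.cat([state, c_text, c_fact], dim=-1)))
        return g * c_text + (1.0 - g) * c_fact
```

The gated context `c` would then replace the simple concatenation of the two contexts when predicting the next summary token.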

The paper's implications point towards improvements in practical summarization systems, especially in sensitive domains where factual accuracy is paramount. Furthermore, the methodology delineated in the research opens avenues for augmenting s2s models with factual guidance systems, potentially benefiting other areas of natural language processing where maintaining factual integrity is essential.

Future work could integrate mechanisms such as copying or coverage into the framework to further improve its applicability and robustness. Developing automatic faithfulness metrics would be another consequential pursuit, as such metrics could provide more granular insight into the reliability of summarization systems.

In summary, this research contributes meaningfully to the field of neural summarization by foregrounding the importance of faithfulness and offering an innovative, effective approach to mitigate factual inaccuracies in generated summaries. This work sets a significant precedent for the development of future abstractive summarization models that prioritize factual accuracy alongside informativeness.

Authors (4)
  1. Ziqiang Cao (34 papers)
  2. Furu Wei (291 papers)
  3. Wenjie Li (183 papers)
  4. Sujian Li (83 papers)
Citations (363)