AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
"AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation," introduces a sophisticated framework leveraging Abstract Meaning Representations (AMRs) to address specific shortcomings in factuality evaluation of abstractive summarization systems. This research targets a pervasive issue in abstractive summarization: the generation of factually inconsistent summaries. Prior methods commonly employ entailment-based approaches, generating perturbed summaries that frequently lack coherence or do not adequately cover various types of factual errors. AMRFact posits a novel solution by utilizing AMR-based perturbations to improve the quality and error-type coverage of factually inconsistent summary generation.
Methodology
AMRFact employs a systematic approach for generating negative samples with a focus on high coherence and comprehensive error-type coverage:
- AMR Parsing and Manipulation: Factually consistent reference summaries are first parsed into AMR graphs, which are then perturbed with controlled factual inconsistencies to produce negative examples (sketched after this list).
- Negative Sample Generation: Because the perturbations operate on the semantic graph rather than on surface strings, the resulting factually inconsistent summaries remain coherent without giving up error-type coverage, a trade-off that limited earlier string-replacement-based methods.
- Filtering with NegFilter: NegFilter validates the generated candidates, applying natural language inference and BARTScore checks to discard samples that do not qualify as valid, high-quality negative examples (also sketched below).
- Model Training: A RoBERTa-based model is fine-tuned on the balanced dataset of positive and filtered negative samples to judge summary factuality (also sketched below).
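To make the pipeline concrete, below is a minimal sketch of the parse-perturb-generate loop, assuming the open-source amrlib and penman Python packages (with amrlib's pretrained parsing and generation models installed). The agent-patient swap shown is one illustrative error type, not the paper's full perturbation taxonomy.

```python
# Sketch: parse a consistent summary to AMR, inject an agent-patient swap,
# then generate the perturbed text back into natural language.
import amrlib
import penman

def swap_agent_patient(amr_string):
    """Return an AMR string with :ARG0 and :ARG1 exchanged on one predicate."""
    graph = penman.decode(amr_string)
    for node in {s for s, r, _ in graph.triples if r in (':ARG0', ':ARG1')}:
        roles = {r for s, r, _ in graph.triples if s == node}
        if ':ARG0' in roles and ':ARG1' in roles:
            swapped = []
            for s, r, t in graph.triples:
                if s == node and r == ':ARG0':
                    swapped.append((s, ':ARG1', t))
                elif s == node and r == ':ARG1':
                    swapped.append((s, ':ARG0', t))
                else:
                    swapped.append((s, r, t))
            return penman.encode(penman.Graph(swapped, top=graph.top))
    return amr_string  # no predicate with both roles; leave unchanged

stog = amrlib.load_stog_model()   # sentence -> AMR parser
gtos = amrlib.load_gtos_model()   # AMR -> sentence generator

summary = "The court fined the company for violating safety rules."
amr_graph = stog.parse_sents([summary])[0]
negative_graph = swap_agent_patient(amr_graph)
negative_summary, _ = gtos.generate([negative_graph])
print(negative_summary[0])  # e.g. a summary in which the company fines the court
```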
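The filtering step could plausibly be implemented as below, assuming the transformers package with the roberta-large-mnli checkpoint and the BARTScorer class from the BARTScore repository; the thresholds and the exact acceptance rule are illustrative placeholders rather than the paper's tuned NegFilter settings.

```python
# Sketch: accept a perturbed summary only if (a) the original summary no longer
# entails it and (b) it stays fluent under BARTScore. Thresholds are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from bart_score import BARTScorer  # class from https://github.com/neulab/BARTScore

nli_tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli").eval()
bart_scorer = BARTScorer(device="cpu", checkpoint="facebook/bart-large-cnn")

def entailment_prob(premise, hypothesis):
    """P(entailment) of `hypothesis` given `premise` under roberta-large-mnli."""
    enc = nli_tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**enc).logits
    # Label order for this checkpoint: 0=contradiction, 1=neutral, 2=entailment.
    return torch.softmax(logits, dim=-1)[0, 2].item()

def is_valid_negative(original_summary, perturbed_summary,
                      entail_thresh=0.5, fluency_floor=-4.0):
    """Keep the candidate only if the perturbation truly changed the meaning
    (low entailment) while the text remains coherent (BARTScore above a floor)."""
    changed = entailment_prob(original_summary, perturbed_summary) < entail_thresh
    fluent = bart_scorer.score([original_summary], [perturbed_summary],
                               batch_size=1)[0] > fluency_floor
    return changed and fluent
```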
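Finally, the training step might look roughly like the sketch below, which fine-tunes a RoBERTa classifier on document-summary pairs with the Hugging Face Trainer; the hyperparameters and the toy examples are placeholders, not the paper's configuration.

```python
# Sketch: fine-tune a RoBERTa classifier on (document, summary) pairs with
# binary factuality labels (1 = consistent, 0 = inconsistent).
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

# In practice this holds the balanced set of positives and filtered negatives.
examples = [
    {"document": "The court fined the company.", "summary": "The company was fined.", "label": 1},
    {"document": "The court fined the company.", "summary": "The company fined the court.", "label": 0},
]

def encode(batch):
    return tokenizer(batch["document"], batch["summary"],
                     truncation=True, max_length=512, padding="max_length")

train_data = Dataset.from_list(examples).map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="factuality-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=8,
                           learning_rate=1e-5),
    train_dataset=train_data,
)
trainer.train()
```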
Experimental Results
The researchers evaluated AMRFact on the AggreFact-FtSota benchmark and report substantial improvements over existing systems. In particular, AMRFact achieves state-of-the-art performance on the CNN/Daily Mail split, exceeding the previous best system's balanced accuracy by 2.1%. The experiments underscore AMRFact's effectiveness at detecting factual inconsistencies in summaries produced by a range of summarization systems.
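Balanced accuracy is the mean of per-class recall, which keeps the skewed label distribution of factuality benchmarks from inflating scores. A small illustration with scikit-learn and made-up predictions:

```python
# Balanced accuracy = mean of per-class recall; the labels below are made up
# solely to illustrate the metric, not taken from the benchmark.
from sklearn.metrics import balanced_accuracy_score

y_true = [1, 1, 1, 1, 0, 0]   # 1 = factually consistent, 0 = inconsistent
y_pred = [1, 1, 1, 0, 0, 1]
print(balanced_accuracy_score(y_true, y_pred))  # (3/4 + 1/2) / 2 = 0.625
```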
Implications and Future Directions
AMRFact's contribution is significant both practically and theoretically. Practically, it offers a more reliable way to generate the synthetic data used to train factuality evaluators, which can help reduce factually inconsistent summaries in deployed systems. Theoretically, it demonstrates the utility of graph-based semantic representations for complex natural language understanding tasks.
Future research could extend AMRFact to multilingual datasets, validating its applicability across diverse linguistic contexts. Additionally, integrating AMR-based perturbation frameworks with large language models (LLMs) could enhance the robustness and factual alignment of generated content.
Conclusion
This paper presents a principled approach to summarization factuality evaluation built on AMR-driven negative sample generation. By addressing the coherence and error-type coverage issues that limit existing systems, AMRFact establishes a strong foundation for factuality evaluation metrics and paves the way for further advances in natural language generation and understanding.