Contrastive Error Attribution for Finetuned Language Models (2212.10722v2)

Published 21 Dec 2022 in cs.CL

Abstract: Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in Natural Language Generation (NLG) tasks. Consequently, identifying and removing these examples is a key open challenge in creating reliable NLG systems. In this work, we introduce a framework to identify and remove low-quality training instances that lead to undesirable outputs, such as faithfulness errors in text summarization. We show that existing approaches for error tracing, such as gradient-based influence measures, do not perform reliably for detecting faithfulness errors in NLG datasets. We overcome the drawbacks of existing error tracing methods through a new, contrast-based estimate that compares undesired generations to human-corrected outputs. Our proposed method can achieve a mean average precision of 0.93 at detecting known data errors across synthetic tasks with known ground truth, substantially outperforming existing approaches. Using this approach and re-training models on cleaned data leads to a 70% reduction in entity hallucinations on the NYT dataset and a 55% reduction in semantic errors on the E2E dataset.
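The abstract describes the contrast-based estimate only in words. As a rough illustration of the idea, the sketch below scores a training example by whether a gradient step on it would lower the loss on an observed bad generation relative to its human-corrected version. It assumes a HuggingFace-style seq2seq model (e.g., BART, which the paper fine-tunes); all function names and the exact scoring rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): contrast-based error attribution.
# A training example is suspect if following its gradient makes the model
# prefer the undesired output over the human-corrected one.
import torch

def seq_loss(model, src_ids, tgt_ids):
    # Token-level NLL of a target given a source, HuggingFace-style API.
    return model(input_ids=src_ids, labels=tgt_ids).loss

def flat_grad(loss, params):
    # Gradient of `loss` w.r.t. `params`, flattened into one vector.
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def contrast_score(model, train_ex, bad_ex, fixed_ex):
    """Higher score => training example is more implicated in the error.

    Each argument after `model` is a (src_ids, tgt_ids) pair: one training
    instance, one observed faithfulness error, and its human correction.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    # Contrast direction: loss on the bad output minus loss on the fix.
    g_contrast = flat_grad(
        seq_loss(model, *bad_ex) - seq_loss(model, *fixed_ex), params
    )
    g_train = flat_grad(seq_loss(model, *train_ex), params)
    # A gradient step on the training example moves parameters along
    # -g_train; a positive dot product with g_contrast means that step
    # lowers the contrast loss, i.e., makes the bad output more likely
    # relative to the corrected one.
    return torch.dot(g_train, g_contrast).item()
```

In practice, full-parameter gradient dot products are expensive, so influence-style scoring of this kind is usually restricted to a subset of parameters or to low-dimensional gradient projections. Ranking examples by such a score and retraining without the highest-scoring ones corresponds to the identify-and-remove loop described in the abstract.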

Authors (3)
  1. Faisal Ladhak (31 papers)
  2. Esin Durmus (38 papers)
  3. Tatsunori Hashimoto (80 papers)
Citations (7)