
Fine-grained Hallucination Detection and Editing for Language Models (2401.06855v4)

Published 12 Jan 2024 in cs.CL

Abstract: Large language models (LMs) are prone to generate factual errors, which are often called hallucinations. In this paper, we introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms, each requiring varying degrees of careful assessments to verify factuality. We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench, that includes about one thousand fine-grained human judgments on three LM outputs across various domains. Our analysis reveals that ChatGPT and Llama2-Chat (70B, 7B) exhibit diverse types of hallucinations in the majority of their outputs in information-seeking scenarios. We train FAVA, a retrieval-augmented LM by carefully creating synthetic data to detect and correct fine-grained hallucinations. On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT and GPT-4 on fine-grained hallucination detection, and edits suggested by FAVA improve the factuality of LM-generated text.

Introduction

Large language models (LMs) have become adept at generating fluent and coherent language. Despite this progress, they exhibit a critical drawback: they often produce text containing factually incorrect information, known as "hallucinations." The research community has responded by developing detection and correction mechanisms, but these tend to be coarse-grained, reducing the problem to a binary judgment of factual or non-factual. This work addresses those limitations by detecting and correcting hallucinations at a much finer granularity.

Taxonomy of Hallucinations

For a nuanced understanding of hallucinations, the paper proposes a novel taxonomy that categorizes factual errors in LM generations into six distinct types. The taxonomy targets scenarios where responses must be grounded in world knowledge; it covers commonly recognized entity-level errors while also highlighting underexplored areas such as unverifiable statements and invented concepts. The six categories are: entity errors, relation errors, statements that directly contradict known facts, fabrications about non-existent entities or concepts, subjective claims or personal biases presented as facts, and statements that cannot be verified against world knowledge. A minimal sketch of how these types might be represented is given below.
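To make the taxonomy concrete, the sketch below represents the six error types as an enum and attaches a type and a suggested edit to a character span in a model response. The names (ErrorType, SpanAnnotation, suggested_edit) and the span-based representation are illustrative assumptions, not the paper's data format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    """Six fine-grained hallucination types from the proposed taxonomy."""
    ENTITY = "entity"                # wrong entity in an otherwise correct statement
    RELATION = "relation"            # wrong relation between otherwise correct entities
    CONTRADICTORY = "contradictory"  # whole statement contradicts known facts
    INVENTED = "invented"            # fabricated, non-existent entity or concept
    SUBJECTIVE = "subjective"        # opinion or bias presented as fact
    UNVERIFIABLE = "unverifiable"    # cannot be checked against world knowledge


@dataclass
class SpanAnnotation:
    """A single fine-grained hallucination judgment on an LM response."""
    start: int                             # character offset where the erroneous span begins
    end: int                               # character offset where it ends (exclusive)
    error_type: ErrorType
    suggested_edit: Optional[str] = None   # replacement text, or None to delete the span


# Example: marking an entity-level error and its suggested correction.
response = "The Eiffel Tower was completed in 1899 in Paris."
annotation = SpanAnnotation(start=34, end=38,
                            error_type=ErrorType.ENTITY,
                            suggested_edit="1889")
```

Under this representation, replacing response[annotation.start:annotation.end] with annotation.suggested_edit yields the corrected sentence.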

Fine-grained Hallucination Detection

The researchers designed a task to accompany their taxonomy: detecting the specific hallucination type for each factual error in an LM's response. To support this task, they constructed a fine-grained hallucination detection benchmark featuring human-annotated responses from widely used models across different domains. Their analysis reveals that both ChatGPT and Llama2-Chat exhibit hallucinations in the majority of their information-seeking outputs, underscoring the need for more precise detection methods. A sketch of how such detection might be scored per error type follows.
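As an illustration of how fine-grained detection could be scored, the sketch below computes per-type precision and recall from gold and predicted (sentence index, error type) labels. Sentence-level matching and these particular metrics are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
from typing import Iterable, Tuple

# A label pairs a sentence index with an error type name, e.g. (0, "entity").
Label = Tuple[int, str]


def per_type_precision_recall(gold: Iterable[Label], pred: Iterable[Label]):
    """Compute per-error-type precision and recall for fine-grained detection."""
    gold_set, pred_set = set(gold), set(pred)
    types = {t for _, t in gold_set | pred_set}
    scores = {}
    for t in types:
        g = {x for x in gold_set if x[1] == t}
        p = {x for x in pred_set if x[1] == t}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        scores[t] = {"precision": precision, "recall": recall}
    return scores


gold = [(0, "entity"), (2, "unverifiable")]
pred = [(0, "entity"), (1, "relation")]
print(per_type_precision_recall(gold, pred))
```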

The Fava Model

To address the challenge, the researchers introduce FAVA, a retrieval-augmented LM. Unlike prior systems, FAVA is trained on synthetic data carefully constructed to reflect the fine-grained taxonomy, and it not only detects hallucinations but also suggests corrections at the span level. Automatic and human evaluations show FAVA to be significantly more effective at detecting and editing hallucinations than existing systems such as ChatGPT, though the researchers acknowledge there is still considerable room for improvement. A high-level sketch of a retrieval-augmented detect-and-edit loop is given after this paragraph.
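The sketch below shows one way a retrieval-augmented detect-and-edit loop could be wired together: retrieve evidence, ask an editor model to tag errors and propose replacements, then post-process the tagged output. The retrieve and editor_model callables, the prompt wording, and the <mark>/<delete> tag convention are hypothetical placeholders, not FAVA's actual interface or output format.

```python
import re

# Hypothetical components for illustration: `retrieve(query, k)` returns a list
# of evidence passages and `editor_model(prompt)` is any instruction-tuned LM
# callable; neither reflects FAVA's actual API or training setup.

def detect_and_edit(passage: str, retrieve, editor_model) -> str:
    """Sketch of a retrieval-augmented detect-and-edit loop.

    1. Retrieve evidence passages relevant to the input.
    2. Ask the editor model to tag factual errors by type, mark replacements,
       and flag spans to delete.
    3. Post-process the tagged output into a corrected passage.
    """
    evidence = "\n".join(retrieve(passage, k=5))
    prompt = (
        "Using the evidence, tag factual errors in the passage with their type "
        "(entity, relation, contradictory, invented, subjective, unverifiable). "
        "Wrap text to remove in <delete>...</delete> and suggested replacements "
        "in <mark>...</mark>.\n\n"
        f"Evidence:\n{evidence}\n\nPassage:\n{passage}\n\nEdited passage:"
    )
    tagged = editor_model(prompt)

    # Drop deleted spans, keep marked replacements, then strip remaining tags.
    edited = re.sub(r"<delete>.*?</delete>", "", tagged, flags=re.DOTALL)
    edited = re.sub(r"<mark>(.*?)</mark>", r"\1", edited, flags=re.DOTALL)
    edited = re.sub(r"</?\w+>", "", edited)
    return edited.strip()
```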

FAVA represents a strategic advance in the effort to improve the reliability and factuality of LM outputs. The supporting materials, including code, data, and a demonstration, have been made publicly available.

As the tools and methodologies for refining these systems continue to evolve, fine-grained hallucination detection not only strengthens current applications but also opens new avenues for deployments in domains where factual accuracy is paramount.

Authors (7)
  1. Abhika Mishra
  2. Akari Asai
  3. Vidhisha Balachandran
  4. Yizhong Wang
  5. Graham Neubig
  6. Yulia Tsvetkov
  7. Hannaneh Hajishirzi