SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency

Published 3 Nov 2023 in cs.CL | (2311.01740v2)

Abstract: Hallucination detection is a critical step toward understanding the trustworthiness of modern LMs. To achieve this goal, we re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations resulting from 1) question-level and 2) model-level, which cannot be effectively identified through self-consistency check alone. Building upon this discovery, we propose a novel sampling-based method, i.e., semantic-aware cross-check consistency (SAC3) that expands on the principle of self-consistency checking. Our SAC3 approach incorporates additional mechanisms to detect both question-level and model-level hallucinations by leveraging advances including semantically equivalent question perturbation and cross-model response consistency checking. Through extensive and systematic empirical analysis, we demonstrate that SAC3 outperforms the state of the art in detecting both non-factual and factual statements across multiple question-answering and open-domain generation benchmarks.

Summary

The paper demonstrates that self-consistency alone fails to ensure factual accuracy and introduces SAC3, which integrates semantic perturbations and cross-model evaluations.
The SAC3 framework significantly improves hallucination detection, achieving a 99.4% AUROC in classification tasks and over 7% improvement in open-domain generation tasks.
The method incurs moderate computational overhead but leverages parallel checking strategies to maintain efficiency in high-stakes, fact-sensitive applications.

SAC3: Reliable Hallucination Detection in Black-Box LLMs

Introduction

The paper "SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency" (2311.01740) addresses hallucination in language models (LMs). Hallucinations, meaning confidently stated but incorrect outputs, undermine reliability in applications where factual accuracy is paramount. The paper critiques self-consistency methods for hallucination detection and introduces semantic-aware cross-check consistency (SAC$^3$), which extends self-consistency checks with semantically equivalent question perturbation and cross-model evaluation.

Limitations of Self-Consistency

Current approaches to hallucination detection often rely on self-consistency checks, which assume that LMs produce consistent outputs only for factual information. However, the paper shows that consistency does not imply factuality (Figure 1). Two phenomena explain this: question-level hallucinations, where incorrect answers are consistently generated due to the phrasing of the question, and model-level hallucinations, where different LMs yield varied outputs for the same query. These observations show that self-consistency alone is not sufficient for assessing factuality.

Figure 1: Key observation: solely checking the self-consistency of LLMs is not sufficient for deciding factuality. Left: generated responses to the same question may be consistent but non-factual. Right: generated responses may be inconsistent with the original answer that is factually correct.
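
To make the limitation concrete, the following is a minimal sketch of a self-consistency check, not the paper's implementation. It assumes the sampled answers have already been collected, and it uses a hypothetical `exact_match` helper in place of a real semantic comparison; both function names are illustrative.

```python
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def exact_match(a: str, b: str) -> bool:
    """Hypothetical agreement test; a real checker would compare meaning."""
    return normalize(a) == normalize(b)


def self_consistency_score(
    original: str,
    samples: List[str],
    agree: Callable[[str, str], bool] = exact_match,
) -> float:
    """Fraction of sampled answers that disagree with the original answer."""
    if not samples:
        raise ValueError("at least one sampled answer is required")
    disagreements = sum(1 for s in samples if not agree(original, s))
    return disagreements / len(samples)


# A consistently repeated answer scores 0.0 whether or not it is true,
# which is the blind spot illustrated on the left of Figure 1.
print(self_consistency_score("Yes", ["yes", "Yes ", "YES"]))  # 0.0
```

A score of 0.0 here only says the model repeats itself; it carries no information about whether "Yes" is the correct answer.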

SAC$^3$ Methodology

The SAC$^3$ method enhances hallucination detection through two primary modules: question-level cross-checking and model-level cross-checking (Figure 2).

Semantic Question Perturbation: SAC$^3$ generates semantically equivalent variations of a given question and assesses how consistent the responses are across these variations. This module targets question-level hallucinations: an answer that is consistent for one phrasing but changes when the question is rephrased is flagged as suspect.
Cross-Model Consistency Check: This component compares the responses of multiple LMs to the same question. Discrepancies between LMs help expose model-specific hallucinations, since a smaller model is sometimes factually correct where a larger model hallucinates.

Figure 2: Overview of the proposed semantic-aware cross-check consistency (SAC$^3$) method.
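
The two modules can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the models are plain callables, the paraphrases are supplied by the caller, the `agree` function stands in for a semantic consistency check, and the weighted average used for `combined` (including the `verifier_weight` parameter) is an assumption made here for illustration.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]        # prompt -> answer
Agree = Callable[[str, str], bool]  # (answer_a, answer_b) -> same meaning?


def disagreement_rate(reference: str, answers: List[str], agree: Agree) -> float:
    """Fraction of answers that disagree with the reference answer."""
    if not answers:
        raise ValueError("at least one answer is required")
    return sum(1 for a in answers if not agree(reference, a)) / len(answers)


def cross_check_scores(
    question: str,
    paraphrases: List[str],
    target: Model,
    verifier: Model,
    agree: Agree,
    verifier_weight: float = 1.0,
) -> Dict[str, float]:
    """Compare the target model's answer against three kinds of cross-checks."""
    reference = target(question)
    # Question-level: same model, semantically equivalent rephrasings.
    question_level = disagreement_rate(
        reference, [target(p) for p in paraphrases], agree
    )
    # Model-level: a different model, same question.
    model_level = disagreement_rate(reference, [verifier(question)], agree)
    # Both perturbations at once: a different model on the rephrasings.
    cross = disagreement_rate(
        reference, [verifier(p) for p in paraphrases], agree
    )
    # Assumed aggregation: weighted mean of the three disagreement rates.
    combined = (question_level + verifier_weight * (model_level + cross)) / (
        1.0 + 2.0 * verifier_weight
    )
    return {
        "question_level": question_level,
        "model_level": model_level,
        "cross": cross,
        "combined": combined,
    }


# Stub models for demonstration only: the target is consistently "Yes",
# while the verifier answers "No" to every phrasing.
scores = cross_check_scores(
    question="Is 3691 a prime number?",
    paraphrases=["Is the number 3691 prime?", "Can 3691 be called a prime?"],
    target=lambda prompt: "Yes",
    verifier=lambda prompt: "No",
    agree=lambda a, b: a.strip().lower() == b.strip().lower(),
)
print(scores["question_level"], scores["model_level"])  # 0.0 1.0
```

In this toy run the target model is perfectly self-consistent, so only the cross-model comparison raises the score; the stubs do not say which model is right, only that the answer deserves scrutiny.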

Empirical Evaluation

SAC$^3$ was evaluated on multiple QA datasets and significantly outperformed existing self-consistency baselines. On classification QA tasks, SAC$^3$ achieved an AUROC of 99.4%, whereas self-consistency baselines stayed below 70%. The method also remained robust on imbalanced datasets, maintaining high detection accuracy even when most samples were hallucinated (Figure 3, Figure 4).

Figure 3: Impact of threshold on detection accuracy.

Figure 4: Histogram of hallucination score.
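
For readers unfamiliar with the metrics, the sketch below shows how AUROC and threshold-based accuracy can be computed from hallucination scores. It is a generic pairwise-ranking formulation written for illustration; the function names and the toy scores and labels are made up and are not results from the paper.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a hallucinated sample (label 1) receives a higher
    score than a factual sample (label 0); ties count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> float:
    """Accuracy when every score at or above the threshold is flagged."""
    if not labels:
        raise ValueError("at least one sample is required")
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Toy data for illustration only.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))                       # 0.75
print(accuracy_at_threshold(toy_scores, toy_labels, 0.5))  # 0.75
```

AUROC summarizes ranking quality over all thresholds, while the accuracy function corresponds to picking a single operating point of the kind varied in Figure 3.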

For open-domain generation tasks, SAC$^3$ improved detection rates by over 7%, indicating its substantial efficacy in diverse linguistic contexts. Its performance is notable considering the greater complexity and ambiguity inherent to these tasks compared to structured classification scenarios.

Computational Considerations

SAC$^3$ introduces a moderate computational overhead compared to self-consistency checks because of the additional semantic perturbations and cross-model evaluations. This overhead is offset by running the checks in parallel and by optimized prompting strategies that reduce latency and cost. The method's robustness and accuracy justify the additional computation, especially in high-stakes applications where misinformation poses significant risks.
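
Because the perturbed questions are independent of one another, the queries can be issued concurrently. The sketch below shows one way to do this with a thread pool from the Python standard library; it is an assumption about how parallel checking could be organized, not a description of the paper's setup, and `ask_in_parallel` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def ask_in_parallel(
    model: Callable[[str], str],
    prompts: List[str],
    max_workers: int = 8,
) -> List[str]:
    """Send every prompt to the model concurrently.

    Answers are returned in the same order as the prompts. Threads suit
    this workload because remote model calls spend most of their time
    waiting on the network.
    """
    if not prompts:
        return []
    workers = max(1, min(max_workers, len(prompts)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model, prompts))


# Stub model for demonstration only; a real callable would wrap an API request.
answers = ask_in_parallel(lambda prompt: prompt.upper(), ["a", "b", "c"])
print(answers)  # ['A', 'B', 'C']
```

With this arrangement the wall-clock latency of a batch of checks approaches that of the slowest single call, although the total number of model calls, and therefore the cost, is unchanged.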

Conclusion

The SAC$^3$ framework represents a significant advancement in hallucination detection for black-box LMs. By integrating semantic cross-checks and leveraging model diversity, it addresses the shortcomings of previous self-consistency approaches. As LMs continue to pervade critical domains, methods such as SAC$^3$ that enhance their reliability will become indispensable. Future work may explore integrating SAC$^3$ with adaptive sampling strategies and extending its applicability to broader NLP tasks.