Calibrated LLMs Must Hallucinate: An Overview
The paper "Calibrated LLMs Must Hallucinate" investigates the inherent propensity of LLMs (LMs) to generate false but plausible-sounding text. This phenomenon, termed "hallucination," is explored through a statistical lens, positing that hallucinations arise from fundamental characteristics of LMs rather than limitations of the transformer architecture or data quality.
Key Contributions
The authors present a statistical lower bound on hallucination rates for LMs that satisfy a calibration criterion. Calibration, in this context, means that the probabilities the model assigns to generated statements accurately reflect how often those statements are true. The paper argues that for "arbitrary" facts, those not systematically determined by patterns in the training data, hallucinations must occur with a frequency close to the fraction of facts that appear exactly once in the training corpus. This conclusion rests on the Good-Turing estimate, a classical statistical tool for estimating the probability mass of unseen events.
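As a rough schematic of the bound's shape (a simplification assumed here; the paper's formal statement includes explicit miscalibration and error terms):

```latex
% Schematic only: the paper's theorem carries additional terms and constants.
\[
  \underbrace{\Pr\bigl[\text{generated arbitrary fact is false}\bigr]}_{\text{hallucination rate}}
  \;\gtrsim\;
  \underbrace{\frac{\#\{\text{facts appearing exactly once in training}\}}{\#\{\text{training observations}\}}}_{\text{monofact rate (Good-Turing estimate of unseen-fact mass)}}
  \;-\;\text{miscalibration}.
\]
```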
The research finds that mitigating hallucinations on such facts may require post-training adjustments that compromise the calibration achieved during pretraining. Notably, the analysis suggests that systematic facts, such as well-represented publications and arithmetic truths, are not subject to this statistical lower bound, leaving room for specialized architectures or algorithms to address those cases specifically.
Implications and Speculative Developments
- Statistical Necessity of Hallucinations: The assertion that even an ideally trained LM must hallucinate certain arbitrary facts challenges the expectation of complete factual accuracy in generative text. This insight compels researchers to rethink how to balance predictive performance against factual integrity.
- Post-Training Strategies: The paper suggests that reducing hallucinations in practice often requires post-training intervention, which can undermine model calibration. This creates a trade-off between hallucination reduction and the retention of well-calibrated probabilities (a toy calibration check is sketched after this list).
- Focus on Systematic Facts: Since systematic facts are not subject to the statistical lower bound, targeted techniques may be explored to improve factuality in these areas without affecting hallucination rates on arbitrary facts. This might involve integrating external databases or specialized reasoning modules.
- Designing Adaptive Architectures: Understanding the distinct nature of hallucinations related to arbitrary versus systematic facts can inform the development of adaptive architectures that intelligently modulate the generation process based on the type of information being processed.
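To make the calibration side of this trade-off concrete, here is a minimal sketch of a histogram-binning calibration check in Python. It applies the generic expected-calibration-error style of comparison (predicted probability versus empirical frequency), not the paper's formal notion of calibration over generated facts; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Histogram-binning calibration check.

    probs:    model-assigned probabilities that each generated statement is true
    outcomes: 1 if the statement was actually true, 0 if it was a hallucination

    Returns the bin-size-weighted average gap between predicted probability and
    empirical accuracy. This is the generic ECE-style diagnostic, not the
    paper's formal calibration definition over generated facts.
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        # Include probabilities equal to 1.0 in the final bin.
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            gap = abs(probs[mask].mean() - outcomes[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy usage: outcomes drawn to match the predicted probabilities yield a small ECE.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = (rng.uniform(size=10_000) < p).astype(int)
print(round(expected_calibration_error(p, y), 3))
```

Under this toy framing, a post-training intervention that pushes probabilities toward 0 or 1 to suppress hallucinations would show up as a growing gap in a check like this.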
Numerical Insights
The paper provides concrete statistical bounds demonstrating that the generation of hallucinations is closely tied to the monofact rate: the proportion of facts observed exactly once in the training data. This relationship emphasizes the inherent tension between maintaining high calibration and minimizing hallucinations.
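As an illustration, the monofact rate is straightforward to compute from a corpus of extracted facts. The sketch below is an assumed, simplified rendering (treating facts as deduplicable strings); it is not code from the paper.

```python
from collections import Counter

def monofact_rate(facts):
    """Fraction of training observations whose fact appears exactly once.

    By a Good-Turing argument, this fraction also estimates the probability
    mass of facts effectively unseen by the model, which the paper ties to a
    lower bound on hallucination for calibrated LMs. Illustrative only; the
    formal bound includes additional terms.
    """
    counts = Counter(facts)
    singleton_observations = sum(1 for f in facts if counts[f] == 1)
    return singleton_observations / len(facts)

# Toy corpus of extracted "facts": four appear once, two appear twice.
corpus = ["fact_A", "fact_B", "fact_B", "fact_C", "fact_D", "fact_E", "fact_E", "fact_F"]
print(monofact_rate(corpus))  # 4 singleton observations out of 8 -> 0.5
```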
Future Outlook
Research on factual integrity in AI models should explore novel architectures that dynamically adjust their handling of arbitrary versus systematic facts. Furthermore, this paper's findings should inspire the development of more nuanced calibration metrics that better capture the semantic complexity of generated text.
This analysis also opens pathways for more detailed investigations into the role of prompts and conditional text generation as a means to further understand and control hallucinations in LMs.
In sum, the paper clarifies a vital aspect of LM behavior, providing statistical and theoretical foundations that explain why hallucinations are an inherent consequence of achieving a near-calibrated state in generative models. The insights yielded here are crucial for devising LMs that are both reliable and high-performing across diverse applications.