Do Language Models Know When They're Hallucinating References? (2305.18248v3)

Published 29 May 2023 in cs.CL and cs.AI

Abstract: State-of-the-art LLMs (LMs) are notoriously susceptible to generating hallucinated information. Such inaccurate outputs not only undermine the reliability of these models but also limit their use and raise serious concerns about misinformation and propaganda. In this work, we focus on hallucinated book and article references and present them as the "model organism" of LLM hallucination research, due to their frequent and easy-to-discern nature. We posit that if a LLM cites a particular reference in its output, then it should ideally possess sufficient information about its authors and content, among other relevant details. Using this basic insight, we illustrate that one can identify hallucinated references without ever consulting any external resources, by asking a set of direct or indirect queries to the LLM about the references. These queries can be considered as "consistency checks." Our findings highlight that while LMs, including GPT-4, often produce inconsistent author lists for hallucinated references, they also often accurately recall the authors of real references. In this sense, the LM can be said to "know" when it is hallucinating references. Furthermore, these findings show how hallucinated references can be dissected to shed light on their nature. Replication code and results can be found at https://github.com/microsoft/hallucinated-references.

Authors (4)
  1. Ayush Agrawal (17 papers)
  2. Mirac Suzgun (23 papers)
  3. Lester Mackey (79 papers)
  4. Adam Tauman Kalai (37 papers)
Citations (78)

Summary

Exploring Hallucinations in LLMs Through the Lens of Fabricated References

Introduction

LLMs (LMs), particularly when used for open-ended generation, show a well-documented tendency to produce so-called "hallucinated" content. Such hallucinations, which include the generation of non-existent references, pose substantial reliability problems and undermine the utility and trustworthiness of LM outputs. This paper examines the phenomenon through the specific case of hallucinated book and article references generated by state-of-the-art LMs. Using direct and indirect queries posed to the models themselves, the research assesses whether LMs can diagnose these hallucinations without external verification.

Hallucination Problem in LMs

The paper treats fabricated references as a salient and easily diagnosed form of LM hallucination, emphasizing both how common they are and their potential to mislead or harm. It frames them as open-domain hallucinations, which lack grounding in the model's training data, in contrast to closed-domain hallucinations, which misrepresent a specific document provided in context. It also distinguishes groundedness from correctness: hallucinations are defined as ungrounded, fabricated content, regardless of whether they happen to be factually accurate.

Methodology

A notable contribution of this work is its methodology for evaluating both the propensity to hallucinate references and the ability to detect such hallucinations. By combining direct queries (e.g., asking outright whether a generated reference exists) with indirect queries (e.g., asking for details such as the authors of a presumed reference), the paper takes a multifaceted approach to probing LMs' awareness of their own hallucinations. Because the models' exact training data is inaccessible, groundedness is evaluated approximately via search-engine results.

Direct Queries: These ask outright whether a given reference exists, using several phrasings to mitigate potential response biases.

Indirect Queries: A novel addition, these ask for details about a presumed reference (such as its authors) and look for inconsistencies across repeated answers, on the hypothesis that a model is unlikely to fabricate the same details consistently. Both query types are illustrated in the sketch below.
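The mechanics of both query types can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `query_lm` helper is a hypothetical stand-in for whatever LM API is under test, and the string-similarity consistency score simplifies the author-overlap metrics described in the paper.

```python
from difflib import SequenceMatcher


def query_lm(prompt: str) -> str:
    """Hypothetical stand-in for a call to the LM under test."""
    raise NotImplementedError


def direct_query(title: str) -> str:
    # Direct query: ask outright whether the reference exists.
    return query_lm(
        f'Does the following paper exist? Answer "yes" or "no".\nTitle: "{title}"'
    )


def indirect_consistency(title: str, n_samples: int = 3) -> float:
    # Indirect query: ask for the authors several times and measure how
    # consistent the answers are with one another. Fabricated references
    # tend to produce divergent author lists.
    answers = [
        query_lm(f'List the authors of the paper titled "{title}".')
        for _ in range(n_samples)
    ]
    pair_scores = [
        SequenceMatcher(None, a, b).ratio()
        for i, a in enumerate(answers)
        for b in answers[i + 1:]
    ]
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0
```

In practice the direct question would be asked with several phrasings to counteract response biases, and the consistency check could compare parsed author names rather than raw answer strings.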

These methods are then applied across several LMs to determine whether the references they generate are grounded or hallucinated, using exact-match queries on a search engine as an approximate ground truth.
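The grounding proxy itself is simple in outline. The sketch below assumes a hypothetical `search_hit_count` helper that returns the number of results a web-search API reports for a query string; it is an illustration of the idea, not the paper's verification code.

```python
def search_hit_count(query: str) -> int:
    """Hypothetical wrapper around a web-search API's result count."""
    raise NotImplementedError


def is_grounded(title: str) -> bool:
    # Quote the title so the search engine performs an exact-phrase match.
    # Zero hits is taken as evidence that the generated reference is
    # hallucinated rather than merely obscure.
    return search_hit_count(f'"{title}"') > 0
```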

Findings and Implications

The paper's quantitative analysis shows that hallucination rates vary across models, with newer models fabricating references less often. Both direct and indirect querying proved effective at detecting hallucinations, and combining the two (an ensemble of direct and indirect queries) improved detection accuracy further. This indicates that LMs carry enough internal information to identify many of their own hallucinations, and it suggests the phenomenon is primarily a generation issue rather than something inherent to the models' training or representations.
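As a rough illustration of how such an ensemble could be assembled, the sketch below reuses the hypothetical `direct_query` and `indirect_consistency` helpers from the earlier sketch; the equal weighting, crude yes/no parsing, and threshold are arbitrary choices for illustration, not values from the paper.

```python
def hallucination_score(title: str) -> float:
    # Direct signal: does the model itself deny that the reference exists?
    # (Crude parsing of the yes/no answer, for illustration only.)
    denies = direct_query(title).strip().lower().startswith("no")
    direct_signal = 1.0 if denies else 0.0
    # Indirect signal: how inconsistent are the repeated author answers?
    indirect_signal = 1.0 - indirect_consistency(title)
    # Equal weighting of the two signals is an arbitrary choice here.
    return 0.5 * direct_signal + 0.5 * indirect_signal


def flag_reference(title: str, threshold: float = 0.5) -> bool:
    # Flag the reference as likely hallucinated when the combined score
    # exceeds the (illustrative) threshold.
    return hallucination_score(title) >= threshold
```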

Future Directions

This research provides a framework for future work, for example in refining query methods or in studying hallucination in generative settings beyond fabricated references. By framing hallucination primarily as a generation issue, it calls for improvements to generation procedures rather than a sole focus on training data or model-architecture revisions.

Conclusion

By meticulously cataloging and analyzing the occurrence of hallucinated references within LMs and introducing a method for internal detection of such fabrications, this paper contributes significantly to the understanding of LMs' reliability and integrity. It represents a crucial step toward mitigating the impact of hallucinations on the practical deployment of LMs, with implications for the development of more trustworthy and robust generative models.
