Prompt-Induced Hallucination in LLMs: Analysis and Implications
The paper "Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in LLMs" by Makoto Sato provides a comprehensive investigation into the phenomenon of hallucinations in LLMs. It presents a novel methodological approach to induce and quantify hallucinations systematically, thereby offering insights into the cognitive dynamics and vulnerabilities within LLM architectures.
Overview of Methodology
The paper introduces Hallucination-Inducing Prompts (HIPs), designed to elicit hallucinated output by fusing semantically distant concepts such as the "periodic table of elements" and "tarot divination". These prompts challenge LLMs to reconcile incompatible domains, leading to breakdowns in coherence and factual accuracy. The paper pairs them with Hallucination-Quantifying Prompts (HQPs), which assess the plausibility, confidence, and coherence of the outputs the HIPs produce.
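As a concrete illustration of how such a prompt pair might be constructed, the sketch below builds a HIP that fuses two distant domains and an HQP that asks a judge model to rate the resulting output on plausibility, confidence, and coherence. The prompt wording and the 0-10 rating scales are illustrative assumptions, not the paper's exact templates.

```python
# Illustrative HIP/HQP templates; the exact wording used in the paper is not
# reproduced here, so these strings are assumptions for demonstration only.

def make_hip(domain_a: str, domain_b: str) -> str:
    """Hallucination-Inducing Prompt: force a fusion of two distant domains."""
    return (
        f"Explain, as established fact, the mechanism that unifies "
        f"{domain_a} with {domain_b}, citing the relevant principles."
    )

def make_hqp(response: str) -> str:
    """Hallucination-Quantifying Prompt: ask a judge model to rate an output."""
    return (
        "Rate the following text on three 0-10 scales: factual plausibility, "
        "expressed confidence, and internal coherence. Reply with three integers.\n\n"
        f"---\n{response}\n---"
    )

# Example using the concept pair cited in the paper.
hip = make_hip("the periodic table of elements", "tarot divination")
```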
The experimental framework involves controlled comparisons between concept-fused HIPs (HIPc) and null-fusion control prompts (HIPn) across multiple LLMs, with models categorized as reasoning-oriented or general-purpose. The methodology employs a Disposable Session Method: each prompt is applied in a fresh, stateless session, eliminating session-history effects and ensuring repeatability.
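A minimal sketch of what such a stateless evaluation loop could look like is given below; `query_model` is a hypothetical single-turn client wrapper, and the default trial count is an arbitrary assumption rather than a figure from the paper.

```python
# Sketch of the Disposable Session Method as described above: each trial sends
# only the current prompt in a fresh, stateless request, so no session history
# can carry over between trials.
from typing import Callable, List

def run_disposable_sessions(prompt: str,
                            query_model: Callable[[str], str],
                            n_trials: int = 10) -> List[str]:
    """Collect n_trials completions, one independent stateless session each."""
    outputs: List[str] = []
    for _ in range(n_trials):
        # Pass only the current prompt -- no accumulated message history,
        # so earlier trials cannot bias later ones.
        outputs.append(query_model(prompt))
    return outputs
```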
Key Findings and Analysis
Hallucination Susceptibility Across Models
The paper finds substantial variability in hallucination susceptibility across LLM architectures. Reasoning-oriented models such as ChatGPT-o3 and Gemini 2.5 Pro exhibited lower scores than general-purpose models; DeepSeek and DeepSeek-R1 in particular showed significantly higher hallucination scores, underscoring architectural differences that may influence susceptibility.
Semantic Fusion and Logical Coherence
The research shows that hallucination is triggered not merely by the presence of multiple concepts in a prompt, but by the unnatural fusion of incompatible ideas: HIPc prompts produced higher hallucination scores than HIPn prompts, confirming semantic fusion as the catalyst.
Interestingly, prompts that blend concepts while maintaining logical and technical consistency (TIPc) scored even lower than HIPn prompts, suggesting that logically grounded conceptual blends stabilize cognitive coherence in LLMs.
Implications and Future Directions
The findings highlight several implications for the development of LLMs and AI safety:
- Prompt Engineering: The paper positions prompt design as a diagnostic tool for probing model vulnerabilities, suggesting that prompts which test semantic coherence and detect epistemic instability could become a standard protocol for LLM evaluation and alignment (a minimal sketch of such a probe follows this list).
- Model-Specific Features: The varying susceptibility between model types implies that hallucination resistance may be rooted in architecture-specific features beyond instruction tuning.
- Cognitive Vulnerability: The notion of Prompt-Induced Hallucination (PIH) reveals a distinct failure mode in LLMs, where the system's inability to evaluate the semantic legitimacy of fused concepts leads to flawed reasoning.
- Theoretical Insights: By foregrounding the conceptual conflict driving hallucinations, the research opens avenues for further exploration into representational conflicts and their resolution mechanisms in artificial cognition.
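As referenced in the first bullet above, the sketch below outlines one way such a diagnostic probe could be organized: generate outputs for fused and control prompts, score both with an HQP-style judge, and compare the means. All function names are hypothetical placeholders, not APIs from the paper.

```python
# Minimal sketch of a fusion-sensitivity probe: score fused (HIP-style) prompts
# and non-fused controls with an HQP-style judge and report the gap. `generate`
# and `score` are hypothetical callables; the scoring scale is an assumption.
from statistics import mean
from typing import Callable, List

def hallucination_gap(fused_prompts: List[str],
                      control_prompts: List[str],
                      generate: Callable[[str], str],
                      score: Callable[[str], float]) -> float:
    """Mean judge score on fused prompts minus mean score on control prompts."""
    fused = [score(generate(p)) for p in fused_prompts]
    control = [score(generate(p)) for p in control_prompts]
    return mean(fused) - mean(control)
```

A large positive gap would flag a model whose outputs degrade specifically under semantic fusion rather than under topical complexity alone.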
Future research could pursue internal signal analysis, capturing metrics such as semantic entropy or attention patterns and correlating them with hallucination scores; such introspective analysis could reveal deeper principles governing information processing in LLMs. Automating the HIP/HQP framework could also enable large-scale profiling and more robust hallucination diagnostics.
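One possible starting point for such internal-signal analysis, assuming repeated sampling and a semantic-equivalence check, is a cluster-based semantic-entropy estimate like the sketch below. This is not the paper's method, only an illustration of a metric that could be correlated with HQP hallucination scores.

```python
# Hypothetical semantic-entropy estimate over repeatedly sampled answers.
# The equivalence predicate `same_meaning` (e.g., an NLI-based check) is a
# placeholder assumption, not part of the paper.
import math
from typing import Callable, List

def semantic_entropy(samples: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Greedily cluster answers by meaning and return entropy over clusters."""
    clusters: List[List[str]] = []
    for s in samples:
        for cluster in clusters:
            if same_meaning(s, cluster[0]):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    total = len(samples)
    probs = [len(c) / total for c in clusters]
    return -sum(p * math.log(p) for p in probs)
```

Higher entropy indicates that repeated stateless samples disagree in meaning, which could then be tested as a predictor of high HQP scores.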
Overall, this paper provides a crucial foundation for understanding and mitigating hallucinations in LLMs, paving the way for safer deployment and the potential development of introspective, self-regulating LLMs.