LLM-Induced Hallucinated Citations
- LLM-induced hallucinated citations are fabricated bibliographic references generated by large language models, categorized into total fabrication, partial attribute corruption, identifier hijacking, semantic, and placeholder hallucinations.
- Quantitative audits reveal that hallucination rates range from 0.21% to 1.91% across platforms such as arXiv, bioRxiv, SSRN, and PubMed Central, posing significant challenges to scholarly communication.
- Mitigation strategies such as retrieval-augmented generation, automated verification pipelines, and editorial integration are critical for reducing the propagation of these phantom citations.
LLM-induced hallucinated citations are bibliographic references fabricated by large language models (LLMs) that do not correspond to any authentic or verifiable publication. These phantom citations arise when an LLM is prompted to supply references and generates plausible-looking metadata for non-existent works, a phenomenon that is increasingly pervasive as LLMs are deployed in academic writing, preprint generation, and citation recommendation. The problem is no longer a matter of isolated anecdotes: it now disrupts scholarly communication and bibliometric integrity at field-wide scale.
1. Definition and Taxonomy of Hallucinated Citations
A hallucinated citation is any LLM-generated bibliographic record that refers, in whole or in part, to a non-existent publication. Hallucinated citations are classified by their primary mode of failure (Ansari, 5 Feb 2026):
- Total Fabrication (TF): All metadata fields (author, title, venue, identifier) are invented.
- Partial Attribute Corruption (PAC): Some fields (e.g., authors or titles) map to real works, but the rest are spurious.
- Identifier Hijacking (IH): Valid identifiers (e.g., DOI, arXiv ID) are present but point to a different publication.
- Semantic Hallucination (SH): Title is semantically plausible for the domain but not present in any real record.
- Placeholder Hallucination (PH): Unfilled template tokens (e.g., “Firstname Lastname”).
Compound modes are prevalent: in elite peer review settings, 100% of hallucinations exhibit both a primary and a secondary deception layer (e.g., fabricated metadata paired with a valid identifier, which lends false verifiability).
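The taxonomy above can be read as a decision procedure over field-level verification results. The following is a minimal sketch under stated assumptions: the `FieldChecks` structure, the `primary_mode` function, and especially the precedence order among modes are hypothetical and chosen for illustration; the source does not specify how overlapping cases (for example, an unmatched title alongside real authors) are resolved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    TF = "Total Fabrication"
    PAC = "Partial Attribute Corruption"
    IH = "Identifier Hijacking"
    SH = "Semantic Hallucination"
    PH = "Placeholder Hallucination"


@dataclass
class FieldChecks:
    """Hypothetical per-reference results from an upstream verifier."""
    has_placeholder: bool   # template tokens such as "Firstname Lastname" remain
    title_found: bool       # title matches a real record
    authors_found: bool     # author list matches a real record
    venue_found: bool       # venue matches a real record
    # None:  no identifier, or it does not resolve (treated as absent here)
    # True:  identifier resolves to the cited work
    # False: identifier resolves to a different work
    id_points_to_cited_work: Optional[bool] = None


def primary_mode(c: FieldChecks) -> Optional[Mode]:
    """Return the primary failure mode, or None if nothing looks fabricated.

    The precedence order below is an assumption made for illustration.
    """
    if c.has_placeholder:
        return Mode.PH
    if c.id_points_to_cited_work is False:
        return Mode.IH
    found = (c.title_found, c.authors_found, c.venue_found)
    if not any(found):
        return Mode.TF
    if not c.title_found:
        return Mode.SH
    if not all(found):
        return Mode.PAC
    return None
```

A compound hallucination would trigger more than one of these conditions at once; a fuller version could return every matching mode rather than only the first.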
2. Quantitative Prevalence and Real-World Impact
Recent large-scale audits expose the population-level scale of LLM-induced hallucinated citations:
| Corpus | Hallucination Rate (%) | Monthly Hallucinated Citations (Aug 2025) |
|---|---|---|
| arXiv | 0.39 | 3,353 |
| bioRxiv | 0.21 | 478 |
| SSRN | 1.91 | 767 |
| PubMed Central | 0.27 | 8,140 |
Summed over the year, this amounts to at least 146,932 hallucinated citations in 2025 across these major preprint and journal platforms. The counts treat reference titles that remain unmatched after a multi-stage pipeline of database lookup, string-similarity matching, and LLM-aided matching as conservative upper-bound estimates (Zhao et al., 8 May 2026). Rates are highest in the social sciences and in rapidly digitizing fields (e.g., SSRN ≈1.91%, arXiv subfields ≈0.60%).
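As a quick arithmetic check on the table, the August 2025 counts can be summed and compared with the reported annual figure. This is only a sanity check on the numbers quoted here, not a reconstruction of the audit's method; the variable names are illustrative.

```python
# Monthly hallucinated-citation counts for August 2025, copied from the table.
AUG_2025_COUNTS = {
    "arXiv": 3353,
    "bioRxiv": 478,
    "SSRN": 767,
    "PubMed Central": 8140,
}
REPORTED_ANNUAL_2025 = 146_932  # annual figure quoted in the text

monthly_total = sum(AUG_2025_COUNTS.values())
naive_annual = 12 * monthly_total  # assumes every month looked like August

print(monthly_total)                        # 12738
print(naive_annual)                         # 152856
print(naive_annual - REPORTED_ANNUAL_2025)  # 5924
```

Twelve times the August total exceeds the reported annual figure by 5,924. This suggests the annual number is not a flat extrapolation from August, and is consistent with month-by-month accumulation in which some months were lower. The text here does not spell out the aggregation.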
Hallucinated references are diffusely embedded, appearing in papers from authors at all career stages and institutions but with higher rates in manuscripts with linguistic signatures of AI assistance, smaller teams, and early-career author teams. Notably, 85% of hallucinated citations present in preprints remain in their published versions after peer review, indicating that conventional moderation and editorial safeguards detect only a minority (Zhao et al., 8 May 2026).
3. Causes: Training Data Redundancy and Prompt Conditioning
The propensity to hallucinate citations is not uniform, varying systematically with model architecture, training data redundancy, and prompt conditions:
- Citation Frequency Thresholds: Two empirical audits (Niimi, 29 Oct 2025, Niimi, 12 Nov 2025) show that LLMs tend to memorize metadata for highly cited papers (≥1,000 citations), with factual consistency (as measured by Sentence-BERT cosine similarity) saturating near 1.0 once log(citation count) ≈ 7. Below ~100 citations, LLM outputs are largely generative, yielding higher hallucination rates; above this threshold, recall accuracy increases rapidly.
- Prompt-Induced Hallucination: Citation fabrication is prompt-induced rather than intrinsic: when LLMs are not explicitly asked to provide references, the spontaneous hallucination rate drops to zero (Naser, 7 Feb 2026).
- Domain and Framing Effects: Hallucination rates vary by discipline and prompt framing (recent vs. seminal works), with recent/obscure topics producing higher error rates (74.1% under “recent” framing vs. 55.0% for “seminal”) (Naser, 7 Feb 2026).
Model capacity and training data overlap mediate these effects. Higher-capacity models show lower hallucination rates, but newer model versions do not always reduce the phenomenon, emphasizing the role of dataset curation and alignment (Naser, 7 Feb 2026).
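The citation-frequency thresholds quoted above are mutually consistent if the logarithm is read as a natural log, since ln(1,000) ≈ 6.91. The sketch below makes that explicit. The natural-log reading is an assumption (the text does not state the base), and the `regime` function with its "transitional" label is a hypothetical convenience, not a category from the cited audits.

```python
import math

MEMORIZED_MIN = 1_000  # from the text: highly cited papers (>= 1,000 citations)
GENERATIVE_MAX = 100   # from the text: below ~100 citations, output is largely generative


def regime(citations: int) -> str:
    """Bucket a paper by citation count; 'transitional' is a hypothetical label."""
    if citations >= MEMORIZED_MIN:
        return "memorized"
    if citations < GENERATIVE_MAX:
        return "generative"
    return "transitional"


# Natural log of the two thresholds (assumed base e).
for n in (GENERATIVE_MAX, MEMORIZED_MIN):
    print(n, round(math.log(n), 2))
# 100 4.61
# 1000 6.91
```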
4. Detection and Verification Methodologies
State-of-the-art detection and verification systems integrate multi-stage, multi-source checks:
- Database Verification: Pipelines like CheckIfExist and CiteAudit decompose citations into structured fields and validate existence, semantic alignment, and identifier consistency against CrossRef, Semantic Scholar, OpenAlex, and targeted scholar searches (Abbonato, 27 Jan 2026, Yuan et al., 26 Feb 2026).
- String Similarity and Confidence Scoring: Core textual similarity is measured via normalized Levenshtein distance, Jaro-Winkler, field-wise fuzzy matching, and composite weighted scores with thresholds for acceptance (commonly ≥0.85 for “existing”) (Abbonato, 27 Jan 2026, Naser, 7 Feb 2026, Yuan et al., 26 Feb 2026).
- Agentic Multi-Source Cascades: Agent-based pipelines segment tasks into claim extraction, memory caching, web retrieval, and expert judgment, achieving F1 scores exceeding 0.9 and supporting integration into editorial workflows (Yuan et al., 26 Feb 2026).
- Classifier Filters: Lightweight bibliographic string classifiers using surface features (title length, DOI format, “et al.” frequency) pre-screen plausible fabrications with AUC up to 0.876 (Naser, 7 Feb 2026).
- Network and Graph-Based Stress Testing: Failure is often structural, with LLMs producing citation graphs that omit >90% of canonical links (citation omission rate O≈91.9%) and exhibit node Jaccard similarity of only ≈0.028 against ground truth (Boudourides, 2 Mar 2026). Only network-level validation can reveal such systematic distortions.
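The string-similarity step described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the scoring used by any of the cited pipelines: the field weights in `WEIGHTS`, the normalization rules, and the sample records are assumptions made up for the example. Only the 0.85 acceptance threshold comes from the text.

```python
import re

# Hypothetical field weights; the cited pipelines do not publish these here.
WEIGHTS = {"title": 0.6, "authors": 0.25, "year": 0.15}
EXISTS_THRESHOLD = 0.85  # acceptance threshold quoted in the text


def normalize(s: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", s.lower()).split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def composite_score(cited: dict, record: dict) -> float:
    """Weighted field-wise similarity between a cited reference and a database record."""
    return sum(w * similarity(cited.get(f, ""), record.get(f, ""))
               for f, w in WEIGHTS.items())


def exists(cited: dict, record: dict) -> bool:
    return composite_score(cited, record) >= EXISTS_THRESHOLD


# Made-up records for illustration only.
cited = {"title": "A Survey of Widget Calibration", "authors": "Doe, J.", "year": "2021"}
match = {"title": "A survey of widget calibration.", "authors": "Doe J", "year": "2021"}
other = {"title": "Deep Learning for Protein Folding", "authors": "Roe, R.", "year": "2019"}
```

Here `exists(cited, match)` is true because the two records differ only in casing and punctuation, while `exists(cited, other)` is false. A production system would compare against the best candidate returned by a database query, and might add Jaro-Winkler for author names, which tolerates abbreviation better than raw edit distance.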
5. Consequences for Peer Review, Knowledge Equity, and Bibliometrics
LLM-induced citation hallucinations have widespread epistemic and sociometric consequences:
- Peer Review Vulnerabilities: Even premier conference panels fail to detect hallucinated citations, as fabricated records exploit reviewer heuristics: plausible titles, working DOIs, known author/venue patterns. Compound (layered) hallucinations pass superficial detection; 100% of such fake references in NeurIPS 2025 escaped detection (Ansari, 5 Feb 2026).
- Bibliometric Distortions: Hallucinated citations disproportionately assign credit to already prominent and male scholars, amplifying systemic inequities in academic recognition. Solo-authored and junior-authored papers are most likely to propagate hallucinated references (Zhao et al., 8 May 2026).
- Bibliographic Database Pollution: Hallucinated titles are appearing in bibliometric indices (e.g., Google Scholar “citation-only” entries), establishing phantom chains of attribution and polluting metadata (Zhao et al., 8 May 2026).
- Structural Knowledge Drift: At the network level, LLM-generated bibliographies can fundamentally alter the inferred structure of a field, creating or omitting central nodes and re-centering intellectual influence without explicit trace (Boudourides, 2 Mar 2026).
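Network-level distortion of the kind described above can be quantified by comparing an LLM-generated citation graph with a ground-truth graph. The sketch below assumes two plausible definitions: omission rate as the fraction of ground-truth edges missing from the generated graph, and node Jaccard as intersection over union of the node sets. The exact definitions used by Boudourides may differ, and the toy graphs are invented for illustration.

```python
def nodes_of(edges):
    """Collect every node that appears in a set of (citing, cited) edges."""
    return {n for edge in edges for n in edge}


def omission_rate(true_edges, generated_edges):
    """Fraction of ground-truth citation links absent from the generated graph."""
    true_edges = set(true_edges)
    if not true_edges:
        return 0.0
    return len(true_edges - set(generated_edges)) / len(true_edges)


def node_jaccard(true_edges, generated_edges):
    """Jaccard similarity between the node sets of the two graphs."""
    t, g = nodes_of(true_edges), nodes_of(generated_edges)
    if not t and not g:
        return 1.0
    return len(t & g) / len(t | g)


# Toy graphs: "X" stands for a hallucinated paper.
truth = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}
generated = {("A", "B"), ("A", "X")}

print(omission_rate(truth, generated))  # 0.75
print(node_jaccard(truth, generated))   # 0.4
```

In the toy case three of four true links are missing and only two of five distinct nodes are shared. Per-reference existence checks would flag "X" but would not reveal the omitted links.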
6. Mitigation Strategies and System-Level Countermeasures
Effective suppression of citation hallucinations now relies on architectural, algorithmic, and process interventions:
- Automated Verification at Submission: Four-stage pipelines parse references, check existence and metadata consistency, validate identifiers, and optionally test semantic plausibility, raising detection recall to ≈0.99 for synthetic hallucinations (Ansari, 5 Feb 2026).
- Retrieval-Augmented Generation (RAG): Citation grounding via live database injection, as recommended in hybrid systems and post-hoc citation-enhancement frameworks (CEG), suppresses hallucination by ensuring all references are verifiable and linked to context (Li et al., 2024, Arafat, 13 Dec 2025).
- Explicit Model Constraints: Training LLMs to emit “I don’t know” or to refuse to complete references absent high-confidence matches dramatically limits fabrication (Niimi, 29 Oct 2025, Niimi, 12 Nov 2025).
- Consensus and Repetition Heuristics: Accepting only citations produced by multiple independent models or repeated within the same prompt raises provenance accuracy to ≥95% (Naser, 7 Feb 2026).
- Fine-grained Citation Training: Sentence-level citation-aware SFT, as in LongCite-8B/9B, achieves higher citation F1 and suppresses hallucinated statements across long-context QA (Zhang et al., 2024).
- Internal State Monitoring: Field-specific neuron interventions and hidden-state clustering allow online detection and selective suppression of hallucination-prone activations within the transformer model itself (Chen et al., 20 Apr 2026, Mao et al., 18 Jan 2026).
- Editorial Integration and Audit Tools: Pre-submission audits and reviewer-assistive dashboards (e.g., CiteAudit) flag suspect citations for correction, while journal intake systems can automatically validate all references, relegating unresolved cases to human review (Yuan et al., 26 Feb 2026).
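The consensus heuristic from the list above can be sketched as a simple vote over normalized titles. This is an illustrative implementation under assumptions: the function names, the normalization rule, and the default of two agreeing models are hypothetical, and the sample titles are invented.

```python
import re
from collections import Counter


def title_key(title: str) -> str:
    """Case- and punctuation-insensitive key for comparing titles."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def consensus_titles(model_outputs, min_models=2):
    """Keep titles proposed by at least `min_models` independent models.

    `model_outputs` is a list with one list of title strings per model.
    Results keep the first-seen spelling, in first-seen order.
    """
    votes = Counter()
    first_form = {}
    for titles in model_outputs:
        seen = {}
        for t in titles:
            seen.setdefault(title_key(t), t)
        for key, original in seen.items():
            votes[key] += 1  # each model votes at most once per title
            first_form.setdefault(key, original)
    return [first_form[k] for k in first_form if votes[k] >= min_models]


# Invented outputs from three hypothetical models.
outputs = [
    ["Widget Calibration: A Survey", "Phantom Paper on Gadgets"],
    ["widget calibration, a survey", "Another Title"],
    ["Another Title!", "Widget Calibration: A Survey"],
]
print(consensus_titles(outputs))
# ['Widget Calibration: A Survey', 'Another Title']
```

Agreement between models is a heuristic rather than proof of existence, so titles that survive the vote would still go through database verification before being accepted.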
7. Limitations and Open Directions
Current verification pipelines may miss niche, non-DOI, or non-English publications, and methods that detect only non-existent titles cannot address the broader problem of misplaced or misattributed citations. Further, some systems have not yet been shown to generalize across diverse corpora or publication cultures, and the performance of internal neuron-suppression methods in general use remains underexplored (Zhao et al., 8 May 2026, Chen et al., 20 Apr 2026). Long-term mitigation will require sustained development of retrieval-grounded LLM architectures, integration of network-level validation, transparent provenance tracking, and scalable human-in-the-loop auditing for persistent edge cases.
In summary, LLM-induced hallucinated citations are a quantitatively significant, structurally complex, and sociometrically non-random threat to scientific integrity, arising from both limits in LLM pretraining data and process weaknesses in scholarly production. Systemic countermeasures that combine real-time reference validation, retrieval augmentation, internal signal monitoring, and policy reforms are necessary to contain their propagation in the academic literature (Zhao et al., 8 May 2026, Niimi, 29 Oct 2025, Naser, 7 Feb 2026, Ansari, 5 Feb 2026, Yuan et al., 26 Feb 2026, Arafat, 13 Dec 2025, Zhang et al., 2024).