HDM-2: Enterprise Hallucination Detection
- HDM-2 is a modular system that detects hallucinations in LLM outputs using fine-grained context and common-knowledge verification.
- It employs a Qwen-2.5-3B-Instruct backbone augmented with LoRA fine-tuning and lightweight classification heads for precise token- and sequence-level scoring.
- The system achieves state-of-the-art F1 performance on multiple benchmarks while offering explainable, single-pass validation for efficient enterprise deployment.
HDM-2 refers to a state-of-the-art system for hallucination detection in LLMs with a focus on enterprise deployment. Developed in the context of the “HalluciNot” framework, HDM-2 is designed for fine-grained, efficient detection and annotation of context-based and common-knowledge hallucinations. It provides a modular approach that integrates context verification, common-knowledge checking, enterprise customization, and explainability, achieving leading results on multiple benchmarks (Paudel et al., 9 Apr 2025).
1. Architectural Overview
HDM-2 operates as a post-hoc validation layer over arbitrary LLMs. Its input consists of a response $r$ generated by any LLM, the concatenated context $c$ (user prompt plus retrieved enterprise documents, if present), and a detection threshold $\tau$. The model employs a Qwen-2.5-3B-Instruct backbone augmented via LoRA fine-tuning and a set of lightweight classification heads.
Core modules include:
- Context-based hallucination detector: A multi-task LoRA adapter with classification heads predicts a global (sequence-level) score $h_s$ and token-level scores $h_w$, both measuring agreement with the context.
- Common-knowledge verification: A shallow classifier on frozen intermediate hidden states outputs a probability $p_{\text{CK}}(s)$ for each sentence $s$, indicating whether the statement is widely recognized as true.
- Enterprise knowledge detector (optional): A LoRA-based module trained on proprietary datasets to flag enterprise-specific but context-absent facts.
- Explanation generator: A LoRA-tuned head generates human-readable rationales for flagged spans.
The system outputs global and token-level hallucination scores, a set of flagged sentences $\mathcal{C}$, per-sentence common-knowledge validation, optional enterprise checks, and explanations. Modular adapters allow for component-specific loading in production, minimizing memory requirements.
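The inputs and outputs listed above can be pictured as plain data containers. The following is a minimal sketch, not the published interface: every class name, field name, and the default threshold of 0.5 are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DetectionRequest:
    """Inputs to the validator (all names here are illustrative assumptions)."""
    context: str             # user prompt plus any retrieved documents
    response: str            # text produced by the upstream LLM
    threshold: float = 0.5   # detection threshold; 0.5 is an arbitrary default


@dataclass
class SentenceFinding:
    """Per-sentence result for a flagged sentence."""
    text: str
    context_score: float                          # aggregated token-level score
    common_knowledge_prob: float                  # common-knowledge head output
    enterprise_supported: Optional[bool] = None   # None if the module is not loaded
    explanation: Optional[str] = None             # None if the generator is not loaded


@dataclass
class DetectionResult:
    """Global score, token-level scores, and the flagged sentences."""
    sequence_score: float
    token_scores: List[float]
    flagged: List[SentenceFinding] = field(default_factory=list)

    def is_hallucinated(self, threshold: float) -> bool:
        # Sequence-level decision: compare the global score with the threshold.
        return self.sequence_score >= threshold
```

Keeping the optional enterprise and explanation fields as `None` when their modules are not loaded mirrors the component-specific loading described above.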
2. Hallucination Taxonomy
HDM-2 formalizes four categories of LLM outputs in the enterprise setting (Paudel et al., 9 Apr 2025):
- Context-Based Hallucinations: Statements that are unsupported or contradicted by the input context $c$. These are the primary focus of HDM-2’s context module.
- Common Knowledge: Widely recognized facts not present in $c$ but expected to be known by a competent LLM. Allowed if verified by the CK head.
- Enterprise-Specific Knowledge: Proprietary knowledge absent from both $c$ and open-domain corpora but correct within the organizational context. Detected via continual pre-training on closed corpora.
- Innocuous Statements: Politeness or generic text without factual content, filtered out by rule-based heuristics to avoid false positives.
This taxonomy is embedded in the model’s operational workflow, ensuring that not only hallucinations but also permissible out-of-context facts or domain-specific statements are appropriately handled.
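One way to read the taxonomy is as a routing decision per statement. The sketch below is an illustration under stated assumptions: the order in which the checks are applied, the boolean inputs, and the extra `SUPPORTED` outcome (for statements the context backs) are all hypothetical and are not taken from the source.

```python
from enum import Enum


class Category(Enum):
    SUPPORTED = "supported_by_context"       # not part of the taxonomy; added for completeness
    CONTEXT_HALLUCINATION = "context_based_hallucination"
    COMMON_KNOWLEDGE = "common_knowledge"
    ENTERPRISE_KNOWLEDGE = "enterprise_specific_knowledge"
    INNOCUOUS = "innocuous_statement"


def categorize(has_factual_content: bool,
               context_supported: bool,
               ck_verified: bool,
               enterprise_verified: bool = False) -> Category:
    """Route one statement to a category (check order is an assumption)."""
    if not has_factual_content:
        # Politeness or generic text: filtered out before any scoring.
        return Category.INNOCUOUS
    if context_supported:
        return Category.SUPPORTED
    if ck_verified:
        # Out-of-context but widely recognized as true: permitted.
        return Category.COMMON_KNOWLEDGE
    if enterprise_verified:
        # Out-of-context but correct within the organization: permitted.
        return Category.ENTERPRISE_KNOWLEDGE
    return Category.CONTEXT_HALLUCINATION
```

For example, `categorize(True, False, True)` returns `Category.COMMON_KNOWLEDGE`, while a factual statement that fails every check is returned as `Category.CONTEXT_HALLUCINATION`.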
3. Mathematical Scoring and Sentence Selection
Hallucination detection uses explicit, multi-level scoring functions:
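- Context-based scoring:
  - Sequence: $h_s(c, r) \in [0, 1]$, a single score for the response $r$ given the context $c$.
  - Token: $h_w^{(i)}(c, r) \in [0, 1]$, for $i = 1, \dots, n$, computed by the token classification head from $z_i$, where $z_i$ is the $i$-th token’s hidden state.
  - Candidate spans: Sentences are mapped to token runs, and the set $\mathcal{C}$ is formed via programmable aggregation (e.g., max, average, proportion-above-threshold): $\mathcal{C} = \{\, s_j : \operatorname{agg}_{i \in s_j} h_w^{(i)}(c, r) \ge \tau \,\}$, where $\tau$ is the detection threshold.
- Common knowledge validation:
  - $p_{\text{CK}}(s) \in [0, 1]$ for each sentence $s$, using hidden states from a specified backbone layer $\ell$ (with a specific $\ell$ chosen for Qwen).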
Only sentences failing both context and common-knowledge checks are flagged as hallucinations.
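The aggregation and two-stage flagging rule described above can be sketched in a few lines. This is an illustrative sketch, assuming token scores in $[0, 1]$, sentences given as half-open token index ranges, and a separate common-knowledge threshold; the function names and the `>=` comparison are assumptions, not details from the source.

```python
from typing import Callable, List, Sequence, Tuple

Aggregator = Callable[[Sequence[float]], float]


def agg_max(scores: Sequence[float]) -> float:
    return max(scores)


def agg_mean(scores: Sequence[float]) -> float:
    return sum(scores) / len(scores)


def make_agg_proportion(token_threshold: float) -> Aggregator:
    """Build an aggregator returning the share of tokens at or above token_threshold."""
    def agg(scores: Sequence[float]) -> float:
        return sum(1 for s in scores if s >= token_threshold) / len(scores)
    return agg


def candidate_sentences(token_scores: Sequence[float],
                        sentence_spans: Sequence[Tuple[int, int]],
                        aggregate: Aggregator,
                        threshold: float) -> List[int]:
    """Indices of sentences whose aggregated token score reaches the threshold."""
    candidates = []
    for idx, (start, end) in enumerate(sentence_spans):
        window = token_scores[start:end]  # tokens belonging to this sentence
        if window and aggregate(window) >= threshold:
            candidates.append(idx)
    return candidates


def flag_hallucinations(candidates: Sequence[int],
                        ck_probs: Sequence[float],
                        ck_threshold: float) -> List[int]:
    """Keep only candidates that also fail the common-knowledge check."""
    return [i for i in candidates if ck_probs[i] < ck_threshold]


token_scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.3]
spans = [(0, 2), (2, 4), (4, 6)]          # three sentences, two tokens each
cands = candidate_sentences(token_scores, spans, agg_max, threshold=0.5)
flagged = flag_hallucinations(cands, ck_probs=[0.9, 0.2, 0.9], ck_threshold=0.5)
print(cands, flagged)  # [1] [1]
```

Swapping `agg_max` for `agg_mean` or `make_agg_proportion(0.5)` changes how strict the sentence-level decision is: max flags a sentence on a single high-scoring token, whereas the other two require more of the sentence to score high.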
4. Fine-Grained Annotation Mechanism
During supervised training, each response token is labeled:
- $0$: supported by the provided context,
- $1$: supported by general (common) knowledge,
- $2$: hallucinated.
The model’s token classification head is trained to minimize a token-level classification loss over these three labels.
Inference highlights the exact token spans judged hallucinated, facilitating audits, error analysis, and compliance review. Sentence-level judgments can be modulated via aggregation hyperparameters, enabling customizable calibration of the precision/recall trade-off.
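Given per-token labels in the scheme above, highlighting spans amounts to collecting maximal runs of tokens labeled $2$. A minimal sketch follows; the function name and the half-open `(start, end)` convention are illustrative choices.

```python
from typing import List, Tuple

HALLUCINATED = 2  # label value from the annotation scheme above


def hallucinated_spans(labels: List[int]) -> List[Tuple[int, int]]:
    """Return maximal runs of hallucinated tokens as half-open (start, end) ranges."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == HALLUCINATED and start is None:
            start = i                      # a new run begins
        elif label != HALLUCINATED and start is not None:
            spans.append((start, i))       # the current run ends before token i
            start = None
    if start is not None:
        spans.append((start, len(labels)))  # run extends to the last token
    return spans


print(hallucinated_spans([0, 0, 2, 2, 1, 2]))  # [(2, 4), (5, 6)]
```

Each returned range can be mapped back to character offsets through the tokenizer to highlight the text in an audit view.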
5. Training Data, Optimization, and Domain Adaptation
HDM-2 utilizes:
- Backbone: Qwen-2.5-3B-Instruct.
- Datasets: HDMBench (∼50K context-doc pairs with token-level human validation across sources such as RAGTruth, SQuAD, Red Pajama v2, and internal tickets), RAGTruth for context supervision, and True/False and TruthfulQA for common-knowledge supervision.
- Optimization: LoRA adapters plus dual classification heads, jointly minimizing the sequence-level and token-level classification losses.
The CK head is trained separately on frozen intermediate features. The enterprise detector is realized through continued LoRA pre-training and a shallow classifier. This architecture allows incremental adaptation to new domains by fine-tuning only the knowledge classifier, not the full model.
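A joint objective over the two heads can be illustrated with ordinary cross-entropy. The sketch below is an assumption-laden illustration, not the published objective: the choice of cross-entropy, the averaging over tokens, and the `token_weight` factor are all hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int, eps: float = 1e-12) -> float:
    """Negative log-probability of the target class (clamped to avoid log(0))."""
    return -math.log(max(probs[target], eps))


def joint_loss(seq_probs: Sequence[float],
               seq_label: int,
               token_probs: Sequence[Sequence[float]],
               token_labels: Sequence[int],
               token_weight: float = 1.0) -> float:
    """Sequence-level loss plus weighted mean token-level loss (weighting is assumed)."""
    seq_loss = cross_entropy(seq_probs, seq_label)
    token_loss = sum(
        cross_entropy(p, y) for p, y in zip(token_probs, token_labels)
    ) / len(token_labels)
    return seq_loss + token_weight * token_loss


# One response labeled hallucinated (class 1) with two tokens labeled 0 and 2.
loss = joint_loss(
    seq_probs=[0.5, 0.5], seq_label=1,
    token_probs=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], token_labels=[0, 2],
)
```

In this example the token head is perfectly confident and correct, so the total reduces to the sequence term, $-\ln 0.5 = \ln 2$.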
6. Empirical Performance
HDM-2 outperforms both black-box LLM prompts and prior hallucination detectors (including SelfCheckGPT and fine-tuned Llama-2) on multiple tasks (Paudel et al., 9 Apr 2025). Notable quantitative results:
| Method | QA F1 | Data2Txt F1 | Summarization F1 | Overall F1 |
|---|---|---|---|---|
| Prompt (GPT-3.5) | 30.8 | 77.4 | 37.1 | 52.9 |
| Prompt (GPT-4) | 45.6 | 78.3 | 47.6 | 63.4 |
| SelfCheckGPT (3.5) | 43.7 | 74.8 | 40.1 | 58.8 |
| Fine-tuned Llama-2-13B | 68.2 | 88.1 | 59.1 | 78.7 |
| HDM-1 (0.5B) | 80.7 | 83.6 | 59.7 | 78.9 |
| HDM-2 (3B) | 80.6 | 88.5 | 77.7 | 85.0 |
For common-knowledge detection, HDM-2 achieves F1 scores of 83.7 (TruthfulQA) and 73.6 (HDMBench), substantially ahead of vanilla Qwen and even GPT-4o. HDM-2 is optimized for single-pass inference and real-time deployment on single-GPU racks (A100 or similar).
7. Production Characteristics and Deployment
HDM-2 is designed for enterprise standards of inference efficiency, deployment flexibility, and explainability:
- Black-box LLM compatibility: Requires only final outputs and context, not internal states of the upstream LLM.
- Low resource footprint: Modular LoRA adapters for each head allow component-specific activation without full model commitment.
- Single-pass validation: CK and context hallucination checks are performed together in one forward pass, minimizing latency.
- Industrial fine-tuning: Internal documents can be used to continually re-train only the enterprise knowledge module.
- Explainability: Produces word-level scores and sentence-level rationales, enabling human-in-the-loop calibration and robust audit for risk and compliance.
A plausible implication is that HDM-2’s design represents a convergence of fine-grained, explainable error detection with the operational constraints of enterprise LLM deployment. It establishes a new state of the art on established hallucination benchmarks while maintaining a tractable parameter count and rapid inference, providing a practical standard for enterprise hallucination monitoring (Paudel et al., 9 Apr 2025).