HDM-2: Enterprise Hallucination Detection

Updated 19 March 2026

HDM-2 is a modular system that detects hallucinations in LLM outputs using fine-grained context and common-knowledge verification.
It employs a Qwen-2.5-3B-Instruct backbone augmented with LoRA fine-tuning and lightweight classification heads for precise token- and sequence-level scoring.
The system achieves state-of-the-art F1 performance on multiple benchmarks while offering explainable, single-pass validation for efficient enterprise deployment.

HDM-2 refers to a state-of-the-art system for hallucination detection in LLMs with a focus on enterprise deployment. Developed in the context of the “HalluciNot” framework, HDM-2 is designed for fine-grained, efficient detection and annotation of context-based and common-knowledge hallucinations. It provides a modular approach that integrates context verification, common-knowledge checking, enterprise customization, and explainability, achieving leading results on multiple benchmarks (Paudel et al., 9 Apr 2025).

1. Architectural Overview

HDM-2 operates as a post-hoc validation layer over arbitrary LLMs. The input consists of a response $r$ generated by any LLM, the concatenated context $c$ (user prompt plus retrieved enterprise documents, if present), and a detection threshold $t \in [0,1]$. The model employs a Qwen-2.5-3B-Instruct backbone augmented via LoRA fine-tuning and a set of lightweight classification heads.

Core modules include:

Context-based hallucination detector: A multi-task LoRA + classification head predicts global (sequence-level) and token-level scores, $h_s(c,r)$ and $\mathbf{h}_w(c,r)$, related to context agreement.
Common-knowledge verification: A frozen, shallow classifier on intermediate hidden layers outputs $h_k(s_j)$ for each sentence $s_j$, indicating whether the statement is widely recognized as true.
Enterprise knowledge detector (optional): A LoRA-based module trained on proprietary datasets to flag enterprise-specific but context-absent facts.
Explanation generator: A LoRA-tuned head generates human-readable rationales for flagged spans.

The system outputs global and token-level hallucination scores, a set of flagged sentences $G$, per-sentence common-knowledge validation, optional enterprise checks, and explanations. Modular adapters allow for component-specific loading in production, minimizing memory requirements.
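
The inputs and outputs listed above can be pictured as plain data containers. The following is a minimal sketch only: the class and field names (`DetectionRequest`, `SentenceReport`, `DetectionResult`) and the default threshold of 0.5 are hypothetical and are not taken from the HDM-2 source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DetectionRequest:
    """Hypothetical container for the inputs: response r, context c, threshold t."""
    response: str
    context: str
    threshold: float = 0.5  # illustrative default, not from the source

    def __post_init__(self) -> None:
        # The detection threshold t is defined on the closed interval [0, 1].
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must lie in [0, 1]")


@dataclass
class SentenceReport:
    """Hypothetical per-sentence record for a flagged sentence."""
    text: str
    context_score: float
    common_knowledge_score: float
    explanation: Optional[str] = None  # rationale from the explanation generator


@dataclass
class DetectionResult:
    """Hypothetical container for the outputs described in the text."""
    sequence_score: float                 # global score h_s(c, r)
    token_scores: List[float]             # token-level scores h_w(c, r)
    flagged: List[SentenceReport] = field(default_factory=list)  # the set G


request = DetectionRequest(response="Paris is in Spain.", context="Paris is in France.")
result = DetectionResult(sequence_score=0.9, token_scores=[0.1, 0.1, 0.2, 0.9])
```

Validating the threshold at construction time keeps an out-of-range $t$ from silently reaching the scoring stage.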

2. Hallucination Taxonomy

HDM-2 formalizes four categories of LLM outputs in the enterprise setting (Paudel et al., 9 Apr 2025):

Context-Based Hallucinations: Statements that are unsupported or contradicted by the input context $c$. These are the primary focus of HDM-2’s context module.
Common Knowledge: Widely recognized facts not present in $c$ but expected to be known by a competent LLM. Allowed if verified by the CK head via the score $h_k(s_j)$.
Enterprise-Specific Knowledge: Proprietary knowledge absent from both $c$ and open-domain corpora but correct within the organizational context. Detected via continual pre-training on closed corpora.
Innocuous Statements: Politeness or generic text without factual content, filtered out by rule-based heuristics to avoid false positives.

This taxonomy is embedded in the model’s operational workflow, ensuring that not only hallucinations but also permissible out-of-context facts or domain-specific statements are appropriately handled.

3. Mathematical Scoring and Sentence Selection

Hallucination detection uses explicit, multi-level scoring functions:

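Context-based scoring:
- Sequence: $h_s(c,r) \in [0,1]$, a single score for the whole response.
- Token: $\mathbf{h}_w(c,r) = \big(h_w^{(1)}, \dots, h_w^{(n)}\big)$ with $h_w^{(i)} \in [0,1]$ for $i = 1, \dots, n$ over the $n$ response tokens, where each $h_w^{(i)}$ is computed from the $i$-th token’s hidden state.
Candidate spans: Sentences $s_j$ are mapped to token runs, and the set $G$ is formed via programmable aggregation (e.g., max, average, proportion-above-threshold):

$G = \{\, s_j : \operatorname{agg}_{i \in s_j} h_w^{(i)} > t \,\}$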
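
The aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: the function names, the half-open `(start, end)` token ranges, the convention that a higher token score means "more likely hallucinated", and the inner `token_threshold` used by the proportion mode are all hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate(scores: Sequence[float], mode: str, token_threshold: float = 0.5) -> float:
    """Collapse the token scores of one sentence into a single sentence score."""
    if not scores:
        return 0.0
    if mode == "max":
        return max(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    if mode == "proportion":
        # Fraction of tokens whose own score exceeds token_threshold.
        return sum(s > token_threshold for s in scores) / len(scores)
    raise ValueError(f"unknown aggregation mode: {mode}")


def candidate_sentences(
    token_scores: Sequence[float],
    sentence_spans: Sequence[Tuple[int, int]],
    t: float,
    mode: str = "max",
) -> List[int]:
    """Return indices j of sentences whose aggregated score exceeds t."""
    return [
        j
        for j, (start, end) in enumerate(sentence_spans)
        if aggregate(token_scores[start:end], mode) > t
    ]


scores = [0.9, 0.1, 0.1, 0.1, 0.2, 0.1]
spans = [(0, 4), (4, 6)]  # two sentences, as half-open token ranges
by_max = candidate_sentences(scores, spans, t=0.5, mode="max")
by_mean = candidate_sentences(scores, spans, t=0.5, mode="mean")
```

In this example the first sentence contains one high-scoring token, so it is selected under `max` but not under `mean`. That difference is the recall/precision lever the aggregation choice provides.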

Common knowledge validation:
- $h_k(s_j) \in [0,1]$ for each sentence $s_j$, using hidden states from a specified intermediate layer of the Qwen backbone.

Only sentences failing both context and common-knowledge checks are flagged as hallucinations.
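
The two-stage rule can be written as a small decision function. This is a sketch under assumptions: it treats a higher context score as "less supported by the context" and a higher common-knowledge score as "more likely widely known to be true"; the `ck_threshold` parameter and the label strings are hypothetical.

```python
def classify_sentence(
    context_score: float,
    ck_score: float,
    t: float,
    ck_threshold: float = 0.5,
) -> str:
    """Label one sentence using the context check first, then the CK check."""
    if context_score <= t:
        # Passes the context check: the sentence agrees with the context c.
        return "supported_by_context"
    if ck_score >= ck_threshold:
        # Fails the context check but is verified as common knowledge.
        return "common_knowledge"
    # Fails both checks, so it is flagged.
    return "hallucination"


labels = [
    classify_sentence(0.2, 0.1, t=0.5),
    classify_sentence(0.9, 0.8, t=0.5),
    classify_sentence(0.9, 0.1, t=0.5),
]
```

Only the third case, which fails both checks, would be reported as a hallucination.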

4. Fine-Grained Annotation Mechanism

During supervised training, each response token is labeled:

Context label: supported by the provided context,
Common-knowledge label: supported by general (common) knowledge,
Hallucination label: hallucinated.

The model’s token classification head is trained with a token-level classification objective over these three labels.

Inference highlights exact spans (runs of tokens whose score in $\mathbf{h}_w(c,r)$ exceeds the threshold $t$), facilitating audits, error analysis, and compliance. Sentence-level judgments can be modulated via aggregation hyperparameters, enabling customizable calibration of recall/precision.
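
Span highlighting can be sketched as grouping consecutive above-threshold tokens into runs. This is an illustration only; the function name and the half-open `(start, end)` output convention are assumptions, as is the choice to treat a score strictly greater than $t$ as flagged.

```python
from typing import List, Optional, Sequence, Tuple


def flagged_spans(token_scores: Sequence[float], t: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive tokens scoring above t."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(token_scores):
        if score > t and start is None:
            start = i                      # a new flagged run begins
        elif score <= t and start is not None:
            spans.append((start, i))       # the current run ends before token i
            start = None
    if start is not None:
        spans.append((start, len(token_scores)))  # run extends to the last token
    return spans


highlighted = flagged_spans([0.1, 0.9, 0.8, 0.2, 0.7], t=0.5)
```

Here tokens 1 to 2 and token 4 form two separate spans, which could then be mapped back to character offsets for display to an auditor.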

5. Training Data, Optimization, and Domain Adaptation

HDM-2 utilizes:

Backbone: Qwen-2.5-3B-Instruct.
Datasets: HDMBench (∼50K context-doc pairs with token-level human validation across sources like RAGTruth, SQuAD, RedPajama v2, internal tickets), RAGTruth for context supervision, and True/False and TruthfulQA for common-knowledge supervision.
Optimization: LoRA adapters plus dual classification heads, jointly minimizing a combined loss over the sequence-level and token-level heads.

The CK head is trained separately on frozen intermediate features. The enterprise detector is realized through continued LoRA pre-training and a shallow classifier. This architecture allows incremental adaptation to new domains by fine-tuning only the knowledge classifier, not the full model.
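
For reference, the standard LoRA parameterization (a property of the general LoRA technique, not a detail taken from the HDM-2 source) keeps a pretrained weight matrix $W_0$ frozen and learns only a low-rank update. The rank is written $\rho$ here solely to avoid clashing with the response symbol $r$; the dimensions $d$, $k$ and the scaling factor $\alpha$ are generic symbols:

$$W = W_0 + \frac{\alpha}{\rho} B A, \qquad B \in \mathbb{R}^{d \times \rho},\quad A \in \mathbb{R}^{\rho \times k},\quad \rho \ll \min(d, k)$$

Because only $A$ and $B$ are trained, each adapter adds $\rho(d + k)$ parameters per adapted matrix instead of $dk$, which is what makes per-component adapters cheap to store and swap.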

6. Empirical Performance

HDM-2 outperforms both black-box LLM prompts and prior hallucination detectors (including SelfCheckGPT and fine-tuned Llama-2) on multiple tasks (Paudel et al., 9 Apr 2025). Notable quantitative results:

Method	QA F1	Data2Txt F1	Summarization F1	Overall F1
Prompt (GPT-3.5)	30.8	77.4	37.1	52.9
Prompt (GPT-4)	45.6	78.3	47.6	63.4
SelfCheckGPT (3.5)	43.7	74.8	40.1	58.8
Fine-tuned Llama-2-13B	68.2	88.1	59.1	78.7
HDM-1 (0.5B)	80.7	83.6	59.7	78.9
HDM-2 (3B)	80.6	88.5	77.7	85.0

For common-knowledge detection, HDM-2 achieves F1 scores of 83.7 (TruthfulQA) and 73.6 (HDMBench), substantially ahead of vanilla Qwen and even GPT-4o. HDM-2 is optimized for single-pass inference and real-time deployment on single-GPU racks (A100 or similar).
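
As a reminder of how the reported metric is defined, F1 is the harmonic mean of precision and recall. The sketch below computes it for binary hallucination labels; the function name and the toy labels are hypothetical, and the evaluation granularity used for the table above is not specified here.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels (1 = hallucinated)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for illustration only; these are not benchmark data.
precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

With two true positives, one false positive, and one false negative, precision, recall, and F1 all equal 2/3 in this toy case.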

7. Production Characteristics and Deployment

HDM-2 is designed for enterprise standards of inference efficiency, deployment flexibility, and explainability:

Black-box LLM compatibility: Requires only final outputs and context, not internal states of the upstream LLM.
Low resource footprint: Modular LoRA adapters for each head allow component-specific activation without full model commitment.
Single-pass validation: CK and context hallucination checks are performed together in one forward pass, minimizing latency.
Industrial fine-tuning: Internal documents can be used to continually re-train only the enterprise knowledge module.
Explainability: Produces word-level scores and sentence-level rationales, enabling human-in-the-loop calibration and robust audit for risk and compliance.

A plausible implication is that HDM-2’s design represents a convergence of fine-grained, explainable error detection with the operational constraints of enterprise LLM deployment. It establishes a new state of the art on established hallucination benchmarks while maintaining a tractable parameter count and rapid inference, providing a practical standard for enterprise hallucination monitoring (Paudel et al., 9 Apr 2025).

References (1)

HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification (2025)
