HDM-2: Enterprise Hallucination Detection

Updated 19 March 2026
  • HDM-2 is a modular system that detects hallucinations in LLM outputs using fine-grained context and common-knowledge verification.
  • It employs a Qwen-2.5-3B-Instruct backbone augmented with LoRA fine-tuning and lightweight classification heads for precise token- and sequence-level scoring.
  • The system achieves state-of-the-art F1 performance on multiple benchmarks while offering explainable, single-pass validation for efficient enterprise deployment.

HDM-2 refers to a state-of-the-art system for hallucination detection in LLMs with a focus on enterprise deployment. Developed in the context of the “HalluciNot” framework, HDM-2 is designed for fine-grained, efficient detection and annotation of context-based and common-knowledge hallucinations. It provides a modular approach that integrates context verification, common-knowledge checking, enterprise customization, and explainability, achieving leading results on multiple benchmarks (Paudel et al., 9 Apr 2025).

1. Architectural Overview

HDM-2 operates as a post-hoc validation layer over arbitrary LLMs. The input consists of a response $r$ generated by any LLM, the concatenated context $c$ (user prompt plus retrieved enterprise documents, if present), and a detection threshold $t \in [0,1]$. The model employs a Qwen-2.5-3B-Instruct backbone augmented via LoRA fine-tuning and a set of lightweight classification heads.

Core modules include:

  • Context-based hallucination detector: A multi-task LoRA + classification head predicts global (sequence-level) and token-level scores, $h_s(c,r)$ and $\mathbf{h}_w(c,r)$, that measure agreement with the context.
  • Common-knowledge verification: A shallow classifier on frozen intermediate hidden states outputs $h_k(s_j)$ for each sentence $s_j$, indicating whether the statement is widely recognized as true.
  • Enterprise knowledge detector (optional): A LoRA-based module trained on proprietary datasets to flag enterprise-specific but context-absent facts.
  • Explanation generator: A LoRA-tuned head generates human-readable rationales for flagged spans.

The system outputs global and token-level hallucination scores, a set of flagged sentences $G$, per-sentence common-knowledge validation, optional enterprise checks, and explanations. Modular adapters allow for component-specific loading in production, minimizing memory requirements.
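The inputs and outputs listed above can be pictured as a pair of simple records. The following is a minimal sketch only: every class and field name is hypothetical, and the default threshold of 0.5 is an assumption, not a value taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DetectionRequest:
    """Inputs described in the text; all field names are hypothetical."""
    response: str             # r: text produced by any upstream LLM
    context: str              # c: user prompt plus retrieved documents
    threshold: float = 0.5    # t in [0, 1]; this default is an assumption

    def __post_init__(self) -> None:
        # The text constrains t to the closed interval [0, 1].
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must lie in [0, 1]")


@dataclass
class DetectionResult:
    """Outputs described in the text; all field names are hypothetical."""
    sequence_score: float                 # h_s(c, r)
    token_scores: List[float]             # h_w(c, r), one score per token
    flagged_sentences: List[str]          # the set G
    common_knowledge: Dict[str, float] = field(default_factory=dict)  # h_k(s_j)
    enterprise_flags: Optional[Dict[str, bool]] = None  # optional module
    explanations: Dict[str, str] = field(default_factory=dict)
```

Keeping the enterprise field optional mirrors the description of that detector as an optional module that may not be loaded in every deployment.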

2. Hallucination Taxonomy

HDM-2 formalizes four categories of LLM outputs in the enterprise setting (Paudel et al., 9 Apr 2025):

  • Context-Based Hallucinations: Statements that are unsupported or contradicted by the input context $c$. Primary focus of HDM-2’s context module.
  • Common Knowledge: Widely recognized facts not present in $c$ but expected to be known by a competent LLM. Allowed if verified by the CK head ($h_k > t$).
  • Enterprise-Specific Knowledge: Proprietary knowledge absent from both $c$ and open-domain corpora but correct within organizational context. Detected via continual pre-training on closed corpora.
  • Innocuous Statements: Politeness or generic text without factual content, filtered out by rule-based heuristics to avoid false positives.

This taxonomy is embedded in the model’s operational workflow, ensuring that not only hallucinations but also permissible out-of-context facts or domain-specific statements are appropriately handled.
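One way to read the taxonomy operationally is as a routing function over per-sentence signals. The sketch below is an illustration under assumptions: the order of the checks, the function and enum names, the boolean inputs for the innocuous filter and the enterprise detector, and the extra `CONTEXT_SUPPORTED` outcome for sentences that pass the context check are all hypothetical and not specified by the source.

```python
from enum import Enum


class Category(Enum):
    INNOCUOUS = "innocuous"
    CONTEXT_SUPPORTED = "context_supported"   # assumed complement of the taxonomy
    COMMON_KNOWLEDGE = "common_knowledge"
    ENTERPRISE_KNOWLEDGE = "enterprise_knowledge"
    HALLUCINATION = "hallucination"


def categorize(context_score: float, ck_score: float, t: float,
               is_innocuous: bool = False,
               enterprise_ok: bool = False) -> Category:
    """Route one sentence through the checks (check order is an assumption)."""
    if is_innocuous:                 # rule-based filter for content-free text
        return Category.INNOCUOUS
    if context_score <= t:           # context detector does not fire
        return Category.CONTEXT_SUPPORTED
    if ck_score > t:                 # CK head accepts the statement (h_k > t)
        return Category.COMMON_KNOWLEDGE
    if enterprise_ok:                # optional enterprise detector accepts it
        return Category.ENTERPRISE_KNOWLEDGE
    return Category.HALLUCINATION
```

For example, a sentence with a high context score and a low common-knowledge score falls through every permissive branch and is labeled a hallucination.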

3. Mathematical Scoring and Sentence Selection

Hallucination detection uses explicit, multi-level scoring functions:

  • Context-based scoring:
    • Sequence: $h_s(c,r) = \sigma(\mathbf{w}_\mathrm{seq}^\top \mathrm{CLS}(c,r))$
    • Token: $h_w^i(c,r) = \sigma(\mathbf{w}_\mathrm{tok}^\top \mathbf{h}_i)$, for $i = 1 \ldots n$, where $\mathbf{h}_i$ is the $i$-th token’s hidden state.
  • Candidate spans: Sentences $s_j$ are mapped to token runs, and the set $G$ is formed via programmable aggregation $f$ (e.g., max, average, proportion-above-threshold):

$$G = \{s_j\ |\ f(h_w^{i},\ldots,h_w^{i+m-1}) > t \}$$

  • Common knowledge validation:
    • $h_k(s_j) = \sigma(\mathbf{w}_k^\top \mathbf{h}_{L^*,s_j} + b_k)$, using hidden states from a specified backbone layer ($L^* = 25$ for Qwen).

Only sentences failing both context and common-knowledge checks are flagged as hallucinations.
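The selection rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names, the representation of sentences as half-open token index ranges, and the toy scores are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Aggregator = Callable[[Sequence[float]], float]


def mean(scores: Sequence[float]) -> float:
    return sum(scores) / len(scores)


def proportion_above(cutoff: float) -> Aggregator:
    """Build an aggregator: fraction of tokens whose score exceeds cutoff."""
    def aggregate(scores: Sequence[float]) -> float:
        return sum(s > cutoff for s in scores) / len(scores)
    return aggregate


def select_candidates(token_scores: Sequence[float],
                      sentence_spans: Sequence[Tuple[int, int]],
                      t: float,
                      aggregate: Aggregator = max) -> List[int]:
    """Return indices j of sentences s_j with f(h_w^i..h_w^{i+m-1}) > t."""
    flagged = []
    for j, (start, end) in enumerate(sentence_spans):
        window = token_scores[start:end]   # token run for sentence j
        if window and aggregate(window) > t:
            flagged.append(j)
    return flagged


def flag_hallucinations(candidates: Sequence[int],
                        ck_scores: Sequence[float],
                        t: float) -> List[int]:
    """Keep only candidates that also fail the common-knowledge check."""
    return [j for j in candidates if ck_scores[j] <= t]


# Toy data (made up): three sentences of two tokens each.
token_scores = [0.1, 0.2, 0.9, 0.8, 0.6, 0.3]
spans = [(0, 2), (2, 4), (4, 6)]
candidates = select_candidates(token_scores, spans, t=0.5)
final = flag_hallucinations(candidates, ck_scores=[0.9, 0.2, 0.7], t=0.5)
```

With `max` as the aggregator, the second and third toy sentences become candidates, but only the second is flagged because the third passes the common-knowledge check. Swapping in `mean` or `proportion_above(0.5)` makes the candidate step stricter, which is the precision/recall lever that the aggregation choice provides.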

4. Fine-Grained Annotation Mechanism

During supervised training, each response token is labeled:

  • $0$: supported by the provided context,
  • $1$: supported by general (common) knowledge,
  • $2$: hallucinated.

The model’s token classification head is trained to optimize

$$\mathcal{L}_\text{tok} = \sum_i \operatorname{CE}(h_w^i, \mathbf{1}\{y_i = 2\})$$

Inference highlights exact spans ($h_w^i > t$), facilitating audits, error analysis, and compliance. Sentence-level judgments can be modulated via aggregation hyperparameters, enabling customizable calibration of recall/precision.
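As a rough illustration of the two steps described here, the sketch below computes the token-level objective with binary targets $\mathbf{1}\{y_i = 2\}$ and then extracts highlighted spans at inference time. It is a sketch under assumptions: cross-entropy is taken to be binary cross-entropy on the sigmoid outputs, and the function names and the clamping constant are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def token_loss(probs: Sequence[float], labels: Sequence[int],
               eps: float = 1e-12) -> float:
    """Sum of binary cross-entropy terms with target 1 iff label == 2."""
    total = 0.0
    for p, y in zip(probs, labels):
        target = 1.0 if y == 2 else 0.0       # labels 0 and 1 are both "not hallucinated"
        p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
        total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total


def highlight_spans(probs: Sequence[float], t: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) token ranges where h_w^i > t."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if p > t and start is None:
            start = i                          # a flagged run begins
        elif p <= t and start is not None:
            spans.append((start, i))           # the run ended at token i - 1
            start = None
    if start is not None:                      # run extends to the last token
        spans.append((start, len(probs)))
    return spans
```

Note that labels 0 (context-supported) and 1 (common knowledge) collapse to the same target in this loss, so the three-way annotation only affects the token head through the hallucinated class.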

5. Training Data, Optimization, and Domain Adaptation

HDM-2 utilizes:

  • Backbone: Qwen-2.5-3B-Instruct.
  • Datasets: HDMBench (∼50K context-doc pairs with token-level human validation across sources like RAGTruth, SQuAD, Red Pajama v2, internal tickets), RAGTruth for context supervision, and True/False and TruthfulQA for common-knowledge supervision.
  • Optimization: LoRA adapters plus dual classification heads, jointly minimizing

$$\mathcal{L} = \lambda_s\,\mathcal{L}_\text{seq} + \lambda_w\,\mathcal{L}_\text{tok}$$

The CK head is trained separately on frozen intermediate features. The enterprise detector is realized through continued LoRA pre-training and a shallow classifier. This architecture allows incremental adaptation to new domains by fine-tuning only the knowledge classifier, not the full model.
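The two training regimes can be sketched on toy data. The code below is a minimal sketch, assuming the CK head behaves like a logistic-regression probe fitted by plain gradient descent on fixed feature vectors; the function names, learning rate, epoch count, and toy features are all hypothetical, and the source does not state the values of $\lambda_s$ and $\lambda_w$.

```python
import math
from typing import List, Sequence, Tuple


def joint_loss(l_seq: float, l_tok: float,
               lambda_s: float, lambda_w: float) -> float:
    """Weighted sum of sequence- and token-level losses."""
    return lambda_s * l_seq + lambda_w * l_tok


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """h_k = sigmoid(w^T h + b) for one frozen feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(features: Sequence[Sequence[float]], labels: Sequence[int],
                lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit only (w, b); the features themselves are never updated."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = probe_score(w, b, x) - y     # gradient of BCE w.r.t. the logit
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy "frozen hidden states" (made up): label 1 means common knowledge.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
w, b = train_probe(feats, [1, 1, 0, 0])
```

Because only `w` and `b` change, adapting to a new domain in this setup means refitting a handful of parameters, which is consistent with the text's point that the full model does not need to be fine-tuned.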

6. Empirical Performance

HDM-2 outperforms both black-box LLM prompts and prior hallucination detectors (including SelfCheckGPT and fine-tuned Llama-2) on multiple tasks (Paudel et al., 9 Apr 2025). Notable quantitative results:

| Method | QA F1 | Data2Txt F1 | Summarization F1 | Overall F1 |
|---|---|---|---|---|
| Prompt (GPT-3.5) | 30.8 | 77.4 | 37.1 | 52.9 |
| Prompt (GPT-4) | 45.6 | 78.3 | 47.6 | 63.4 |
| SelfCheckGPT (3.5) | 43.7 | 74.8 | 40.1 | 58.8 |
| Fine-tuned Llama-2-13B | 68.2 | 88.1 | 59.1 | 78.7 |
| HDM-1 (0.5B) | 80.7 | 83.6 | 59.7 | 78.9 |
| HDM-2 (3B) | 80.6 | 88.5 | 77.7 | 85.0 |

For common-knowledge detection, HDM-2 achieves F1 scores of 83.7 (TruthfulQA) and 73.6 (HDMBench), substantially ahead of vanilla Qwen and even GPT-4o. HDM-2 is optimized for single-pass inference and real-time deployment on single-GPU racks (A100 or similar).

7. Production Characteristics and Deployment

HDM-2 is designed for enterprise standards of inference efficiency, deployment flexibility, and explainability:

  • Black-box LLM compatibility: Requires only final outputs and context, not internal states of the upstream LLM.
  • Low resource footprint: Modular LoRA adapters for each head allow component-specific activation without full model commitment.
  • Single-pass validation: CK and context hallucination checks are performed together in one forward pass, minimizing latency.
  • Industrial fine-tuning: Internal documents can be used to continually re-train only the enterprise knowledge module.
  • Explainability: Produces word-level scores and sentence-level rationales, enabling human-in-the-loop calibration and robust audit for risk and compliance.

A plausible implication is that HDM-2’s design brings fine-grained, explainable error detection together with the operational constraints of enterprise LLM deployment. It sets a new state of the art on established hallucination benchmarks while keeping a tractable parameter count (3B) and rapid inference, which makes it a practical reference point for enterprise hallucination monitoring (Paudel et al., 9 Apr 2025).
