Self-Reinforcing Calibration for Hallucination Control

Updated 19 September 2025
  • The paper introduces a novel dual-proxy framework that dynamically calibrates model outputs to suppress hallucinations during inference.
  • It leverages lightweight proxy models (FAP and HDP) to steer logit adjustments and achieve high factual consistency rates, such as 99.2% on TruthfulQA.
  • Its plug-and-play design enables immediate deployment in high-stakes environments without needing modifications to core model weights.

Dynamic Self-reinforcing Calibration for Hallucination Suppression (DSCC-HS) is a proactive framework for reducing hallucinations in LLMs and large vision-language models (LVLMs) by means of dynamic, self-correcting mechanisms applied at inference time. DSCC-HS is characterized by interventions that adaptively alter the decoding process through plug-and-play modules grounded in formal proxy models, with the specific aim of steering the model away from hallucinatory content and toward factual, grounded output. The approach is inspired by dual-process cognitive theory, leveraging both rapid "intuitive" hallucination detection and analytical factual alignment within a feedback-driven calibration loop.

1. Framework and Core Principles

DSCC-HS operates as a two-phase system comprising (1) the construction and training of compact proxy models, the Factual Alignment Proxy (FAP) and the Hallucination Detection Proxy (HDP), and (2) dynamic calibration during inference using real-time steering vectors derived from these proxies. The FAP is optimized to favor factual alignment, while the HDP is specialized to predict hallucinatory tendencies. The difference between FAP and HDP logits forms a steering vector injected into the target model's decoding step:

$$g^{(t)} = l_{\mathrm{FAP}}^{(t)} - l_{\mathrm{HDP}}^{(t)}$$

This vector is projected into the target model's vocabulary space and added to the native logits:

$$l_{\mathrm{adjusted}}^{(t)} = l_{\mathrm{target}}^{(t)} + \hat{G}^{(t)}$$

where $\hat{G}_i^{(t)} = g_i^{(t)}$ if token $i$ is present in both vocabularies, and zero otherwise. Sampling is then conducted from the softmax of the adjusted logits at each generation step. This paradigm requires no internal modification or fine-tuning of the target model.
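A minimal sketch of this steering step, assuming toy PyTorch tensors and a hypothetical `proxy_to_target` token-id mapping between the shared proxy and target vocabularies (the names and shapes are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def steer_logits(l_target, l_fap, l_hdp, proxy_to_target):
    """Adjust target-model logits with the FAP-HDP steering vector.

    l_target: (V_target,) logits from the target model at step t
    l_fap, l_hdp: (V_proxy,) logits from the two proxies at step t
    proxy_to_target: dict mapping proxy token ids -> target token ids
                     for tokens present in both vocabularies
    """
    g = l_fap - l_hdp                      # steering vector g^(t)
    g_hat = torch.zeros_like(l_target)     # projected vector \hat{G}^(t)
    for p_id, t_id in proxy_to_target.items():
        g_hat[t_id] = g[p_id]              # copy only shared-vocabulary entries
    return l_target + g_hat                # adjusted logits

# Toy usage: sample the next token from the adjusted distribution.
l_target = torch.randn(32000)
l_fap, l_hdp = torch.randn(8000), torch.randn(8000)
proxy_to_target = {i: i for i in range(8000)}   # illustrative identity mapping
adjusted = steer_logits(l_target, l_fap, l_hdp, proxy_to_target)
next_token = torch.multinomial(F.softmax(adjusted, dim=-1), num_samples=1)
```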

The training of FAP and HDP is governed by a contrastive logit-space optimization objective:

$$L_k = \|l_{\mathrm{base}} - l_{\mathrm{FAP}}\|^2 - \|l_{\mathrm{base}} - l_{\mathrm{HDP}}\|^2$$

where the base model is used as a reference; minimizing $L_k$ promotes the separation of factual and hallucinatory directions in the representation space.
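As a sketch, the objective can be written directly in PyTorch; the squared Euclidean norm over logit vectors follows the formula above, while the batched tensor shapes and mean reduction are assumptions:

```python
import torch

def contrastive_logit_loss(l_base, l_fap, l_hdp):
    """Contrastive logit-space objective L_k.

    Minimizing this loss pulls FAP logits toward the base model's logits
    and pushes HDP logits away from them; all tensors have shape (batch, vocab).
    """
    pull = torch.sum((l_base - l_fap) ** 2, dim=-1)   # ||l_base - l_FAP||^2
    push = torch.sum((l_base - l_hdp) ** 2, dim=-1)   # ||l_base - l_HDP||^2
    return (pull - push).mean()                        # assumed mean over the batch
```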

2. Cognitive and Algorithmic Inspiration

The design of DSCC-HS directly draws from dual-process cognitive theory, mapping the two proxy models onto “System 1” (intuitive, rapid detection—HDP) and “System 2” (analytical, deliberative factual alignment—FAP). The real-time logit steering is formally analogous to the continuous revision of intuitive judgments by reflective processes in human cognition. This dual mechanism enables DSCC-HS to detect and correct hallucinations as they emerge during the sequential decoding process, rather than acting only post-hoc.

3. Technical Implementation and Plug-and-play Nature

Implementation of DSCC-HS proceeds in two sequential phases:

  1. Proxy Model Construction and Training: A lightweight base LLM is adapted into two adversarial proxies using data augmentation techniques (question paraphrasing, answer perturbation, etc.) and contrastive objectives. LoRA-based adaptations (rank $r=8$, scaling factor $\alpha=16$) are used for efficient parameter updates on the query and value projection matrices; a configuration sketch follows this list.
  2. Inference-time Dynamic Steering: At each autoregressive decoding step in the target LLM, both proxies produce logits, form the steering difference, and reweight the generation logits of the target model via vocabulary-aligned projection. This intervention is entirely external and does not require modifications to the core model weights, enabling immediate deployment and system-level scalability.
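A minimal configuration sketch for the Phase 1 proxy adaptation, assuming the Hugging Face `transformers` and `peft` libraries and a placeholder base checkpoint; the rank, scaling factor, and query/value targets follow the values stated above, while the module names and model choice are assumptions:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base checkpoint; the paper's actual proxy backbone may differ.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA adapters on the query/value projections, as described in Phase 1.
lora_cfg = LoraConfig(
    r=8,                                   # LoRA rank
    lora_alpha=16,                         # scaling factor alpha
    target_modules=["q_proj", "v_proj"],   # query and value matrices
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Two separately adapted copies would then be trained with the contrastive
# objective L_k: one toward factual alignment (FAP), one toward
# hallucination prediction (HDP).
fap_proxy = get_peft_model(base, lora_cfg)
```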

4. Benchmark Performance and Empirical Validation

DSCC-HS demonstrates strong empirical performance on benchmark tasks characterized by hallucination risk:

  • TruthfulQA: Achieves a 99.2% Factual Consistency Rate (FCR) and a hallucination score of 0.8, outperforming strong baselines such as ITI and DoLa.
  • BioGEN: Attains a FActScore of 46.50 (the highest reported) with an Incorrectness score of 11.49 on long-form biomedical generation, indicating substantial robustness in both short-form and long-form factual tasks.

These results establish the framework’s superior ability to steer LLMs away from misleading or spurious generations, especially in settings demanding high factual fidelity.

5. Relation to Other Hallucination Mitigation Approaches

DSCC-HS represents a specific instantiation of dynamic self-reinforcing calibration, distinguished by explicit proxy-model opposition and logit-space manipulation. Its key distinguishing feature is the plug-and-play, external steering strategy grounded in dual-proxy opposition, whereas most alternative dynamic hallucination mitigation approaches manipulate internal representations or attention, or rely on self-feedback for calibration.

6. Broader Implications and Future Directions

DSCC-HS demonstrates that proactive, dynamic inference-time interventions can systematically enhance factual consistency, offering a blueprint for future hallucination suppression methods. The plug-and-play nature facilitates deployment into high-stakes environments (e.g., scientific, technical, or biomedical domains) where model retraining or heavy fine-tuning is infeasible.

Extensions of the DSCC-HS paradigm can encompass:

  • Domain-specific proxy specialization for context-sensitive hallucination assessment.
  • Generalization of dynamic calibration to multimodal domains (LVLMs), potentially integrating visual grounding proxies analogous to FAP/HDP.
  • Integration with dynamic attention or semantic dispersion-drift diagnostics (as in D²HScore (Ding et al., 15 Sep 2025)) for layer-wise introspective monitoring and self-reinforcing response adjustment.
  • Application in decentralized or distributed agent environments where proactive factual steering is required without central retraining.

Counterfactual interventions, dynamic weighting, and cognitive-inspired arbitration mechanisms may further elevate DSCC-HS and related frameworks as central strategies in ensuring reliability and factual trustworthiness in generative AI models.
