SpikeScore: Hallucination Detection in LLMs
- SpikeScore is a quantitative metric that identifies hallucinations in large language models by analyzing abrupt second-order differences in uncertainty scores over multi-turn dialogues.
- It computes the maximum absolute curvature in the sequence of scores, distinguishing factual responses from hallucinated ones with statistical guarantees and high AUROC.
- Empirical evaluations show SpikeScore’s robust performance across domains and models, outperforming traditional detection probes in noisy and adversarial scenarios.
SpikeScore is a quantitative metric for hallucination detection in LLMs under cross-domain generalization constraints. By measuring abrupt curvature in the trajectory of uncertainty or truthfulness scores during multi-turn self-dialogue, SpikeScore provides a domain-agnostic signal to differentiate hallucinated from factual LLM continuations. Empirical and theoretical analyses demonstrate that SpikeScore achieves state-of-the-art cross-domain separability, outperforming traditional detection probes and recent generalization-focused baselines (Deng et al., 27 Jan 2026).
1. Formal Definition and Mathematical Formulation
Given a sequence of per-turn uncertainty or truthfulness scores $s_1, \dots, s_K$, as assigned by a probe over a $K$-turn LLM-generated dialogue, SpikeScore is formally defined via the discrete second-order difference operator:

$$\Delta^2 s_t = s_{t+1} - 2 s_t + s_{t-1}, \qquad t = 2, \dots, K-1.$$

The SpikeScore for the sequence is

$$\mathrm{SpikeScore}(s_{1:K}) = \max_{2 \le t \le K-1} \left| \Delta^2 s_t \right|,$$

i.e., the maximum absolute "curvature" of the score series. Large SpikeScore values correspond to abrupt rises and falls in the probe's confidence, which are characteristic signatures of hallucination-initiated multi-turn continuations (Deng et al., 27 Jan 2026).
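In code, the metric reduces to a maximum over discrete second differences. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
def spike_score(scores):
    """Maximum absolute discrete second difference of a per-turn score series."""
    if len(scores) < 3:
        raise ValueError("need at least three per-turn scores")
    return max(
        abs(scores[t + 1] - 2 * scores[t] + scores[t - 1])
        for t in range(1, len(scores) - 1)
    )
```

A smooth score trajectory yields a value near zero, while a single abrupt dip or jump dominates the maximum.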
2. Multi-Turn Self-Dialogue Simulation Protocol
SpikeScore relies on simulated multi-turn self-dialogue by the LLM. For an input question $q$ and the LLM's initial answer $a_0$, responses are recursively elicited as follows:

$$a_t = \mathrm{LLM}(q, a_0, p_1, a_1, \dots, p_t), \qquad t = 1, \dots, K,$$

where prompts $p_t$ are sampled from a library spanning diverse types (e.g., Encouraging, Analytical, Stepwise) and are sequenced to increase "strength" over turns. At each step, a probe assigns a score $s_t$. Empirical score plots reveal that hallucinated continuations induce prominent spikes in the score trajectory, while factual dialogues exhibit smoother profiles, a phenomenon formalized by SpikeScore (Deng et al., 27 Jan 2026).
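A weak-to-strong prompt schedule can be sketched as follows. The prompt types and wordings below are hypothetical stand-ins; the paper's actual library spans eight types at five strength levels:

```python
import random

# Illustrative prompt library: type -> variants ordered weak to strong.
# (Hypothetical wordings; not the paper's actual prompts.)
PROMPT_LIBRARY = {
    "Encouraging": ["Take another look at your answer.",
                    "Are you certain? Reconsider your answer very carefully."],
    "Analytical":  ["Check your reasoning.",
                    "Rigorously verify every claim in your answer."],
    "Stepwise":    ["Walk through it again.",
                    "Re-derive the answer step by step, justifying each move."],
}

def sample_prompt(rng, turn, total_turns):
    """Pick a random prompt type; strength escalates as the dialogue progresses."""
    variants = PROMPT_LIBRARY[rng.choice(sorted(PROMPT_LIBRARY))]
    level = min(len(variants) - 1, turn * len(variants) // total_turns)
    return variants[level]
```

Early turns draw weak nudges; later turns draw the strongest variant of a randomly chosen type.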
3. Theoretical Separability Guarantees
Under mild moment conditions, the SpikeScore statistic offers provable probabilistic separability between hallucination and non-hallucination distributions in mixtures of test domains. Denote the SpikeScore by $X$ for hallucination samples and $Y$ for non-hallucination samples, with means $\mu_X > \mu_Y$ and standard deviations $\sigma_X, \sigma_Y$:
- $\mu_X - \mu_Y > 0$ (mean gap)
- Coefficient of variation of the factual distribution bounded ($\mathrm{CV}(Y) = \sigma_Y / \mu_Y$)

Cantelli's inequality establishes that

$$\Pr(X > Y) \;\ge\; \frac{(\mu_X - \mu_Y)^2}{(\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2},$$

treating $X$ and $Y$ as independent; instantiating this bound with the moment ratios reported in Section 6 gives the paper's quantitative separability guarantee. This result indicates that, with high probability, SpikeScore ranks a hallucinated chain above a factual chain across domains (Deng et al., 27 Jan 2026).
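The bound can be sanity-checked numerically. A sketch assuming Gaussian score distributions and illustrative moments (both assumptions for demonstration only, not values from the paper):

```python
import random

def cantelli_lower_bound(mu_x, mu_y, sd_x, sd_y):
    """Cantelli lower bound on P(X > Y) for independent X, Y with these moments."""
    gap = mu_x - mu_y
    return gap ** 2 / (gap ** 2 + sd_x ** 2 + sd_y ** 2)

def empirical_p_x_gt_y(mu_x, mu_y, sd_x, sd_y, n=100_000, seed=0):
    """Monte Carlo estimate of P(X > Y) under a Gaussian assumption."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mu_x, sd_x) > rng.gauss(mu_y, sd_y) for _ in range(n))
    return wins / n
```

With illustrative moments such as $\mu_X = 0.6$, $\sigma_X = 0.25$ (hallucinated) and $\mu_Y = 0.2$, $\sigma_Y = 0.1$ (factual), the Cantelli bound is roughly 0.69, comfortably below the empirical ranking probability, as the inequality requires.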
4. Experimental Evaluation
SpikeScore’s efficacy is validated on diverse datasets and models:
| Dataset | Domain Type |
|---|---|
| TriviaQA | Knowledge QA |
| CommonsenseQA | Commonsense MCQ |
| Belebele | Multilingual RC |
| CoQA | Conversational QA |
| MATH | Competition Math |
| SVAMP | Word Problems |

| Model | Configuration |
|---|---|
| Llama-3.2-3B-Instruct | temp=0.2, top-p=0.9 |
| Llama-3.1-8B-Instruct | temp=0.2, top-p=0.9 |
| Qwen3-8B-Instruct | reasoning mode, temp=0.6, top-p=0.95 |
| Qwen3-14B-Instruct | same as Qwen3-8B |
Protocol: Training is performed on a single dataset, with leave-one-out cross-domain evaluation over the remaining five ($K = 20$ turns). Hallucination detection performance is measured by AUROC. SpikeScore, when paired with SAPLMA as the probe, achieves mean AUROC up to $0.787$ (Qwen3-14B), outperforming both training-free baselines (mean 0.70) and cross-domain specialized methods (mean 0.74) (Deng et al., 27 Jan 2026).
SpikeScore maintains robust performance under noisy retrieval-augmented generation (RAG) scenarios (TriviaQA, RAGTruth), reaching AUROC up to 0.87. Its backbone-agnostic design yields consistent cross-domain gains with probes including Perplexity, Reasoning Score, ICS, SEP, and SAPLMA.
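AUROC here is the probability that a randomly drawn hallucinated chain receives a higher SpikeScore than a randomly drawn factual chain. A minimal pairwise implementation (illustrative; evaluation code is not from the paper):

```python
def auroc(hallucinated_scores, factual_scores):
    """AUROC as a pairwise ranking probability: the chance a hallucinated
    chain's SpikeScore exceeds a factual chain's, ties counted as one half."""
    pairs = [(h, f) for h in hallucinated_scores for f in factual_scores]
    wins = sum(1.0 if h > f else 0.5 if h == f else 0.0 for h, f in pairs)
    return wins / len(pairs)
```

An AUROC of 1.0 means every hallucinated chain outranks every factual one; 0.5 is chance.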
5. Algorithmic Description and Parameterization
The SpikeScore detection algorithm proceeds as follows:
- Initialize dialogue history $H \leftarrow (q, a_0)$.
- For $t$ in $1, \dots, K$:
  - Sample prompt $p_t$ (weak-to-strong schedule).
  - Obtain response $a_t = \mathrm{LLM}(H, p_t)$ and score $s_t = \mathrm{probe}(a_t)$.
  - Append $(p_t, a_t)$ to $H$.
- Compute $\Delta^2 s_t = s_{t+1} - 2 s_t + s_{t-1}$ for $t = 2, \dots, K-1$.
- Compute $\mathrm{SpikeScore} = \max_t |\Delta^2 s_t|$.
- Output $1$ (hallucination) if $\mathrm{SpikeScore} > \tau$; else $0$.
Key parameters:
- $K$ dialogue turns (detectability saturates at 15–20; longer chains may introduce drift)
- $\tau$ (threshold) tuned on validation split (typical range 0.1–0.3)
- Prompt library: five strength levels across eight prompt types (Encouraging→Reflective).
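The steps above can be sketched end to end. The `llm` and `probe` callables below are placeholders for a real model and probe, and `tau=0.2` is an illustrative threshold from the stated tuning range:

```python
def detect_hallucination(question, answer, llm, probe, prompts, tau=0.2):
    """Threshold the SpikeScore of a simulated multi-turn self-dialogue.

    llm(history, prompt) -> next response  (stand-in for a real model call)
    probe(response)      -> per-turn uncertainty/truthfulness score
    prompts              -> K follow-up prompts, ordered weak to strong
    """
    history = [(question, answer)]
    scores = []
    for p in prompts:
        response = llm(history, p)
        scores.append(probe(response))
        history.append((p, response))
    # Maximum absolute second difference of the score trajectory.
    spike = max(
        abs(scores[t + 1] - 2 * scores[t] + scores[t - 1])
        for t in range(1, len(scores) - 1)
    )
    return int(spike > tau)
```

Swapping in a real model and probe only changes the two callables; the detection logic itself is a dozen lines.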
6. Statistical Phenomena and Empirical Properties
Empirical findings substantiating SpikeScore’s cross-domain robustness include:
- Mean SpikeScore for hallucinations exceeds the non-hallucination mean (Observation 1).
- Standard deviation ratio for SpikeScore between hallucination and non-hallucination chains is at most 2.5 (Observation 2).
- Coefficient of variation of factual SpikeScore is consistently bounded across leave-one-out settings (Observation 3) (Deng et al., 27 Jan 2026).
These properties underpin the observed cross-domain generalization, as the underlying spike phenomenon persists across question genres, languages, and model variants.
7. Limitations and Prospective Directions
Documented limitations of SpikeScore include:
- Dialogue-length Sensitivity: Factual continuations may yield spurious spikes at dialogue lengths outside the recommended range, necessitating budgets of 15–20 turns.
- Prompt-Design Dependence: Although moderately robust to prompt seeds, adversarial or out-of-library prompt regimes present an open challenge.
- Robustness to Alignment Attacks: Mild “polite” instructions have little effect, while aggressive instruction tuning or adversarial alignment may attenuate the spike signal.
- Threshold Calibration: While $\tau$ is generally stable, its automatic adjustment for zero-shot domains requires further research.
Possible extensions:
- Application to structured prediction pipelines (e.g., tool use, chain-of-thought prompting).
- Exploration of higher-order temporal differences or alternative sequence curvature statistics.
- Integration with calibrated confidence estimators for deployment monitoring.
- Adaptation to multilingual or multimodal LLMs, to accommodate modality-specific hallucination signatures.
SpikeScore operationalizes multi-turn instability as a domain-invariant hallmark of hallucination, providing theoretically justified and empirically validated detection in cross-domain settings (Deng et al., 27 Jan 2026).