Textual Veracity Distortion (TVD)
- Textual Veracity Distortion is the measurable gap between supported and deceptive text, defined through statistical and probabilistic models.
- It leverages generative stress tests, including noise injection and reverse denoising, to assess semantic instability and factual alignment.
- Hybrid calibration combines generative and discriminative approaches to flag and mitigate confidently false or hallucinatory claims.
Textual Veracity Distortion (TVD) refers to distortion in the truthfulness of textual outputs, manifesting as a divergence between supported (factually accurate) claims and unsupported or deceptive (hallucinatory or intentionally false) ones. The phenomenon arises both in LLMs, where it is typically conceptualized as the hallucination of plausible but incorrect assertions, and in human-produced language, as studied in settings where speakers deliberately attempt deception. TVD is quantifiable through statistical and probabilistic metrics that compare the distributional properties of truthful and false text at both the utterance and the corpus level. It is closely tied to fact verification, misinformation detection, and the robustness of both human and algorithmic truth discernment.
1. Formalizations and Operational Definitions
TVD’s mathematical formalization differs slightly depending on context. In the human conversational deception setting, TVD for a text snippet $x$ is quantified via the log-likelihood ratio

$$\mathrm{TVD}(x) = \log \frac{p_\theta(\text{deceptive} \mid x)}{p_\theta(\text{truthful} \mid x)},$$

where $p_\theta$ is a probabilistic classifier (typically an LLM). Positive values indicate that the model prefers labeling $x$ as deceptive; negative values indicate a preference for truthful (Hazra et al., 2023).
At the population level, divergences such as the Kullback–Leibler divergence

$$D_{\mathrm{KL}}(P_{\text{dec}} \,\|\, P_{\text{truth}}) = \sum_{x} P_{\text{dec}}(x) \log \frac{P_{\text{dec}}(x)}{P_{\text{truth}}(x)}$$

capture distributional gaps between the deceptive and truthful corpora. In automated systems, TVD is indexed by a claim’s position relative to a generative model’s fact-supported “manifold.” Claims mapping off this manifold yield high “semantic energy” when subjected to generative corruption and reconstruction protocols (Gautam et al., 11 Feb 2026).
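The two quantities above can be sketched in a few lines. This is a minimal illustration, assuming the classifier exposes a single probability that a snippet is deceptive and that the corpus-level distributions are discrete (for example, normalized feature histograms). The function names and the direction of the KL divergence are illustrative assumptions, not taken from the cited work.

```python
import math
from typing import Sequence


def tvd_log_ratio(p_deceptive: float, eps: float = 1e-12) -> float:
    """Log-likelihood ratio log(p_deceptive / p_truthful) for a binary classifier.

    Positive values favor the 'deceptive' label, negative values 'truthful'.
    """
    # Clamp away from 0 and 1 so the logarithm stays finite.
    p = min(max(p_deceptive, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

A classifier output of 0.5 gives a ratio of zero (no preference), and identical distributions give a divergence of zero; both values grow as the deceptive and truthful sides separate.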
2. Mechanisms and Psycholinguistic Manifestations
In human settings, as exemplified by “To Tell The Truth: Language of Deception and LLMs,” TVD surfaces as systematic differences in language features between truthful and deceptive utterances. Key cues include:
- Entailment vs. Contradiction: Truthful replies entail facts in an affidavit (ground truth), while deceptive ones contradict those facts.
- Ambiguity: Strategic vagueness, stalling, or random off-topic responses as evasive maneuvers.
- Overconfidence: Excessive certainty, often in fine-grained or numerical details, sometimes betrays fabricated answers.
- Half-truths: Partially correct yet deliberately incomplete or misleading statements.
Such phenomena are operationalized in bottleneck LLMs, which extract linguistic controls (e.g., “Entail,” “Ambig,” “Overconf,” “HalfTruth”) for downstream deception discrimination (Hazra et al., 2023). Empirical evidence from the T4TEXT corpus demonstrates that these textual aspects, in the absence of nonverbal cues, are predictive of deception at rates nearly on par with human judges.
3. Generative LLMs: Non-Equilibrium and Thermodynamic View
Automated TVD is analyzed through the lens of non-equilibrium thermodynamics on learned textual manifolds (Gautam et al., 11 Feb 2026):
- Truth Manifold $\mathcal{M}$: In the high-dimensional space of sentence embeddings $\mathbb{R}^d$, a diffusion model trained on factual data induces a low-dimensional manifold $\mathcal{M}$. Supported claims reside near $\mathcal{M}$ (“stable attractors”).
- Factual Stability: For $x \in \mathcal{M}$, small perturbations (noise) are corrected by the diffusion score $\nabla_x \log p_t(x)$, returning the embedding close to its starting point.
- Hallucinatory Instability: For unsupported/hallucinatory claims $x'$ lying off $\mathcal{M}$, noising and denoising via diffusion induce semantic drift toward the truth manifold, often yielding reconstructions semantically divergent from the input.
This perspective underpins procedures that stress-test claims by injecting noise (forward diffusion) and reconstructing (reverse denoising), then scoring the alignment between the input and reconstruction to index TVD.
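The stability argument can be illustrated with a deliberately simplified toy, which is not the cited method. It assumes a 2-D embedding space, takes the unit circle as a stand-in for the truth manifold, and replaces the learned denoiser with an idealized projection onto that circle. All names are hypothetical.

```python
import math
import random


def project_to_circle(x: float, y: float) -> tuple:
    """Idealized 'denoiser': map a point to the nearest point on the unit circle."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 1.0, 0.0  # arbitrary choice at the degenerate origin
    return x / r, y / r


def drift_after_stress(point, sigma: float = 0.1, trials: int = 200, seed: int = 0) -> float:
    """Mean distance between a point and its noised-then-denoised reconstruction."""
    rng = random.Random(seed)
    x, y = point
    total = 0.0
    for _ in range(trials):
        # Forward step: add isotropic Gaussian noise.
        nx = x + rng.gauss(0.0, sigma)
        ny = y + rng.gauss(0.0, sigma)
        # Reverse step: pull the noisy point back onto the manifold.
        rx, ry = project_to_circle(nx, ny)
        total += math.hypot(rx - x, ry - y)
    return total / trials


on_manifold = drift_after_stress((1.0, 0.0))   # starts on the circle
off_manifold = drift_after_stress((3.0, 0.0))  # starts far from the circle
```

In this toy, the on-manifold point is reconstructed near where it started, while the off-manifold point is dragged a distance of roughly 2 toward the circle. That gap between input and reconstruction is the kind of drift a stress test scores.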
4. Algorithmic Quantification: Generative Stress Test and Semantic Energy
The Generative Stress Test (GST) operationalizes TVD measurement for LLM outputs as follows (Gautam et al., 11 Feb 2026):
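- Embed the input claim $c$ as $x_0$.
- Corrupt $x_0$ with Gaussian noise at a focal timestep $t$:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

- Reverse-sample via the denoising diffusion model to reconstruct $\hat{x}_0$ and decode the output $\hat{c}$.
- Measure semantic contradiction between $c$ and $\hat{c}$ with a pretrained NLI cross-encoder. Define Semantic Energy:

$$E_{\text{sem}}(c) = P_{\text{NLI}}(\text{contradiction} \mid c, \hat{c})$$

High $E_{\text{sem}}$ indicates semantic/factual instability, which is evidence of TVD.
- Fuse with the discriminative confidence score $S_{\text{disc}}$ of a zero-shot NLI classifier for hybrid calibration:

$$S_{\text{hybrid}} = \lambda\, E_{\text{sem}} + (1 - \lambda)\, S_{\text{disc}}$$

A threshold $\tau$ on $S_{\text{hybrid}}$ predicts hallucination or high TVD.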
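The steps above can be sketched as a small pipeline. This is a hedged outline rather than the authors' implementation: `embed`, `denoise`, `decode`, and `contradiction_prob` are hypothetical callables standing in for the sentence encoder, the reverse diffusion sampler, the decoder, and the NLI cross-encoder. The DDPM-style corruption, the default `alpha_bar=0.5`, and the convex-combination form of the hybrid score are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def forward_noise(x0: Sequence[float], alpha_bar: float, rng: random.Random) -> Vector:
    """DDPM-style corruption: sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, I)."""
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [keep * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def semantic_energy(
    claim: str,
    embed: Callable[[str], Vector],                   # hypothetical sentence encoder
    denoise: Callable[[Vector], Vector],              # hypothetical reverse sampler
    decode: Callable[[Vector], str],                  # hypothetical embedding-to-text decoder
    contradiction_prob: Callable[[str, str], float],  # hypothetical NLI scorer
    alpha_bar: float = 0.5,                           # assumed noise level
    seed: int = 0,
) -> float:
    """Noise the claim embedding, reconstruct it, and score contradiction."""
    rng = random.Random(seed)
    x0 = embed(claim)
    xt = forward_noise(x0, alpha_bar, rng)
    reconstruction = decode(denoise(xt))
    return contradiction_prob(claim, reconstruction)


def hybrid_score(energy: float, disc_score: float, lam: float = 0.5) -> float:
    """Assumed convex combination of generative and discriminative scores."""
    return lam * energy + (1.0 - lam) * disc_score


def is_flagged(score: float, threshold: float) -> bool:
    """Flag a claim when its score exceeds a caller-chosen threshold."""
    return score > threshold
```

Passing the model components as callables keeps the sketch independent of any particular diffusion or NLI library. The threshold is left to the caller because it would need to be tuned on held-out data.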
5. Empirical Assessment and Benchmarking
Evaluation of TVD detection methods is typically carried out over benchmarks measuring both in-domain and out-of-domain robustness:
| Method (AUROC) | FEVER (In-Domain) | HOVER (Out-of-Domain) |
|---|---|---|
| Raw MSE | 0.541 | 0.589 |
| DiffuTruth (Semantic) | 0.640 | 0.566 |
| NLI Baseline | 0.710 | 0.525 |
| Hybrid Calibration | 0.725 | 0.566 |
- DiffuTruth Hybrid Calibration achieves an AUROC of 0.725 on FEVER, outperforming the discriminative NLI baseline (0.710) by 1.5 points.
- Generalization: under distribution shift to HOVER, the table's figures give a relative AUROC drop of 11.6% for the DiffuTruth semantic score (0.640 to 0.566) and 21.9% for Hybrid Calibration (0.725 to 0.566), versus 26.1% for the NLI baseline (0.710 to 0.525).
- Statistical significance: reported for Hybrid vs. NLI on FEVER and for DiffuTruth vs. NLI on HOVER (Gautam et al., 11 Feb 2026).
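The relative AUROC drops under distribution shift can be recomputed directly from the table. The snippet below is a small arithmetic check; the dictionary simply transcribes the table's values, and the helper name is illustrative.

```python
# (FEVER in-domain AUROC, HOVER out-of-domain AUROC), transcribed from the table.
auroc = {
    "Raw MSE": (0.541, 0.589),
    "DiffuTruth (Semantic)": (0.640, 0.566),
    "NLI Baseline": (0.710, 0.525),
    "Hybrid Calibration": (0.725, 0.566),
}


def relative_drop_pct(in_domain: float, out_domain: float) -> float:
    """Relative AUROC loss, in percent, when moving out of domain."""
    return 100.0 * (in_domain - out_domain) / in_domain


drops = {name: round(relative_drop_pct(a, b), 1) for name, (a, b) in auroc.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:+.1f}%")
```

With these values, the 11.6% figure corresponds to the DiffuTruth semantic score and the 26.1% figure to the NLI baseline. Hybrid Calibration comes out at 21.9%, and Raw MSE shows a negative drop, meaning its AUROC is higher on HOVER than on FEVER.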
In the human deception domain, the best-performing sequential bottleneck LLM comes close to human accuracy (39.3% vs. 41.3% for judges; Acc@2: 77.3%) and often identifies the deceiver correctly in sessions where human judges are systematically misled (Hazra et al., 2023).
6. Practical Synthesis: Mitigation, Guidelines, and Applications
- Quantification: TVD quantifies the semantic instability of a claim when forced through generative corruption and correction, revealing unsupported or distorted assertions that evade surface-level confidence or fluency metrics.
- Mitigation: Employ the Semantic Energy metric as an unsupervised detection signal; combine with discriminative verifiers using Hybrid Calibration to counter “confident falsehoods.”
- Implementation: Fine-tune diffusion models on true statements in the target domain for robust GST; set the noise level high enough to disrupt lexical copying yet low enough to preserve core meaning (approx. 50% corruption optimal); use equal hybrid weighting (a weight of 0.5 on each score) to balance generative and discriminative strengths.
- Deployment: Precompute representations, leverage efficient NLI systems, and use progressive distillation to accelerate generative sampling. Continuous energy scores can support end-user decision making, with high-energy claims flagged for human review (Gautam et al., 11 Feb 2026).
A plausible implication is that, in social and journalistic contexts, such algorithmic detection can augment human veracity assessment: purely textual cues (ambiguity, overprecision, half-truths) carry substantial signal, and LLM-based models sometimes succeed in sessions where human judges are misled (Hazra et al., 2023).
7. Broader Impact and Limitations
TVD is central to the deployment of LLMs in high-stakes environments, given the persistent risk of hallucinated or deceptive content. Benchmarking reveals that standard discriminative methods are prone to overconfidence on unsupported claims, while the generative thermodynamic framework (DiffuTruth) provides improved sensitivity to instability. However, diffusion-model-based approaches depend on the quality and domain fit of the training data and on the calibration of noise levels and energy thresholds. In human-centric deception studies, LLMs rival humans on pure text, but their restriction to linguistic cues limits generalization to settings where multimodal deception strategies are prevalent.
Ongoing research pursues integration of algorithmic TVD flags into human workflows, transparency of detection rationales (“glass box” vs. “black box”), and adaptation to diverse textual genres and adversarial manipulation. The underlying theory and empirical findings provide a basis for joint human–AI veracity assessment across information platforms (Gautam et al., 11 Feb 2026, Hazra et al., 2023).