Unfaithful Perception Rate
- Unfaithful perception rate is a measure of how often outputs, reasoning steps, or perceptual judgments deviate from ground truth, with definitions varying across fields like information theory, machine learning, and psychophysics.
- It is quantified using divergence metrics (e.g., total variation, KL, Wasserstein) and assessed via protocols such as rate–distortion optimization, calibrated hallucination measures, and controlled miscalibration in models.
- The metric informs practical applications by benchmarking model hallucinations, guiding algorithm design, and improving trust and interpretability in human–AI and causal reasoning settings.
Unfaithful perception rate quantifies the frequency or proportion of model outputs, reasoning steps, communication acts, or perceptual judgments that do not accurately reflect the available evidence, ground truth, or perceptual input. The term spans information theory (e.g., the rate–distortion–perception tradeoff), machine learning (hallucination in LLMs/VLMs), psychophysics (conditioned hallucinations), causal cognition (disagreement in causal structure), and human–AI or human–environment settings. Across domains, unfaithful perception is operationalized as the complement of a rigorously defined faithfulness or perceptual fidelity metric. This article surveys its formalizations, measurement protocols, computational models, empirical findings, and the implications for trust, system design, and interpretability.
1. Formal Definitions Across Domains
Within information theory and communication, the unfaithful perception rate is typically defined via a constraint on the divergence between the marginal distribution of reconstructions and that of the true source. Let $X$ denote the source, $\hat{X}$ its reconstruction, and $D(\cdot,\cdot)$ a divergence (often total variation, an $f$-divergence, or Wasserstein). For distortion threshold $\Delta$ and perception threshold $P$, the rate–distortion–perception function is
$$R(\Delta, P) \;=\; \min_{p_{\hat{X}\mid X}} I(X; \hat{X}) \quad \text{s.t.}\;\; \mathbb{E}[d(X, \hat{X})] \le \Delta,\;\; D(p_X, p_{\hat{X}}) \le P,$$
with the unfaithful perception rate indexed by $P$ (the maximum permissible divergence) (Blau et al., 2019, Serra et al., 2023, Chen et al., 2022).
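As an illustration, the following sketch evaluates $R(\Delta, P)$ by brute force for a Bernoulli source under Hamming distortion and a total-variation perception constraint; the source parameter, thresholds, and grid resolution are illustrative assumptions, and the cited works use far more efficient parametric algorithms.

```python
import numpy as np

def h(p):
    """Binary entropy in bits, with values clipped away from 0 and 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def rdp_binary(p1, delta, perc, n=201):
    """Grid-search approximation of R(delta, perc) for a Bernoulli(p1) source
    under Hamming distortion and a total-variation perception constraint."""
    p0 = 1.0 - p1
    a = np.linspace(0, 1, n)                      # P(xhat = 1 | x = 0)
    b = np.linspace(0, 1, n)                      # P(xhat = 0 | x = 1)
    A, B = np.meshgrid(a, b, indexing="ij")
    distortion = p0 * A + p1 * B                  # P(xhat != x)
    q1 = p0 * A + p1 * (1 - B)                    # reconstruction marginal P(xhat = 1)
    tv = np.abs(q1 - p1)                          # TV(p_X, p_Xhat) for binary marginals
    mi = h(q1) - (p0 * h(A) + p1 * h(B))          # I(X; Xhat) in bits
    feasible = (distortion <= delta) & (tv <= perc)
    return mi[feasible].min() if feasible.any() else np.inf

# Skewed source: without a perception constraint the distortion budget is met at
# zero rate; forcing the reconstruction marginal to match the source costs rate.
print(rdp_binary(p1=0.15, delta=0.15, perc=1.0))   # ~0 bits (perception unconstrained)
print(rdp_binary(p1=0.15, delta=0.15, perc=0.0))   # > 0 bits (coarse grid upper bound)
```

For the skewed source used here, the distortion budget is met at zero rate when the perception constraint is inactive, whereas requiring the reconstruction marginal to match the source forces a strictly positive rate; this gap is the rate penalty attached to faithful perception.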
In LLM evaluation, unfaithful perception rate generalizes the hallucination rate. For generated facts evaluated against a reference dataset $\mathcal{D}$, with monofact rate $\mathrm{MF}$ (the probability that a fact appears exactly once in $\mathcal{D}$) and calibration error $\mathrm{CE}$ (total variation or KL divergence between model output and reference frequencies), Kalai & Vempala prove a lower bound of the form
$$\Pr[\text{hallucination}] \;\ge\; \mathrm{MF} - \mathrm{CE},$$
where $\Pr[\text{hallucination}]$ is the probability of an unfaithful (hallucinated) generation (Miao et al., 11 Feb 2025).
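A minimal sketch of the quantities in this bound, assuming string-valued atomic facts and an empirical total-variation calibration error (the corpora below are synthetic placeholders):

```python
from collections import Counter

def monofact_rate(reference_facts):
    """Fraction of fact occurrences in the reference corpus whose fact
    appears exactly once (a Good-Turing-style singleton estimate)."""
    counts = Counter(reference_facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(reference_facts)

def tv_calibration_error(model_facts, reference_facts):
    """Total-variation distance between the model's empirical fact
    frequencies and the reference fact frequencies."""
    p, q = Counter(model_facts), Counter(reference_facts)
    n_p, n_q = len(model_facts), len(reference_facts)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p[f] / n_p - q[f] / n_q) for f in support)

# Synthetic corpora of atomic facts (strings are placeholders, not real data).
reference = ["fact-common-A"] * 5 + ["fact-common-B"] * 3 + ["fact-rare-1", "fact-rare-2"]
calibrated = list(reference)                                  # frequencies match exactly
upweighted = ["fact-common-A"] * 8 + ["fact-common-B"] * 2    # deliberately miscalibrated

mf = monofact_rate(reference)
for name, generated in [("calibrated", calibrated), ("upweighted", upweighted)]:
    ce = tv_calibration_error(generated, reference)
    print(f"{name}: MF = {mf:.2f}, CE = {ce:.2f}, bound = {max(0.0, mf - ce):.2f}")
```

With a perfectly calibrated toy model the bound equals the monofact rate, while deliberately upweighting frequent facts inflates the calibration error and relaxes the bound, mirroring the intervention analyzed in (Miao et al., 11 Feb 2025).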
Faithfulness in multimodal models is measured as the fraction of atomic or chain-level statements visually supported by an image, yielding an unfaithful perception rate as $1$ minus the proportion of faithful objects, steps, or atomic facts (Li et al., 11 Nov 2025, Uppaal et al., 13 Dec 2025, Jing et al., 2023).
Psychophysically, as in conditioned hallucinations, the unfaithful perception rate is the conditional probability of a subject reporting a percept ("voice") where none was present: $\Pr[\text{report present} \mid \text{no stimulus}]$ (Benrimoh et al., 2023).
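A minimal sketch of this estimator, assuming binary per-trial records (the trials below are hypothetical):

```python
# Conditioned-hallucination (CH) rate: the fraction of no-stimulus trials on
# which the subject reported hearing the target percept.
# Each trial is a hypothetical (stimulus_present, reported_present) pair.
trials = [
    (True, True), (True, True), (True, False),
    (False, False), (False, True), (False, False),
    (False, True), (False, False), (False, False),
]

no_stimulus_reports = [reported for present, reported in trials if not present]
ch_rate = sum(no_stimulus_reports) / len(no_stimulus_reports)  # P(report | no stimulus)
print(f"conditioned hallucination rate = {ch_rate:.2f}")       # 2 of 6 no-stimulus trials
```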
In causal reasoning, if decision-makers hold structural causal models (SCMs) $G_1, \dots, G_n$ over a shared set of variables, then the unfaithful perception rate is the fraction of pairs $(G_a, G_b)$ that differ on at least one directed edge; equivalently, the indicator of edge disagreement averaged over all $\binom{n}{2}$ pairs (Alvarez et al., 24 Jan 2024).
2. Mathematical Characterizations and Metrics
Information theory systematically characterizes the trade-off between rate, distortion, and perception:
$$R(\Delta, P) \;=\; \min_{p_{\hat{X}\mid X}:\; \mathbb{E}[d(X,\hat{X})] \le \Delta,\; D(p_X, p_{\hat{X}}) \le P} I(X; \hat{X}),$$
where $D$ may be total variation, an $f$-divergence, or Wasserstein (Blau et al., 2019, Serra et al., 27 Aug 2024). The gap $R(\Delta, P) - R(\Delta, \infty)$ quantifies the rate penalty relative to unconstrained distortion coding.
In LLM frameworks, the unfaithful perception rate employs bin-wise calibration measures and monofact statistics:
$$\Pr[\text{hallucination}] \;\ge\; \widehat{\mathrm{MF}} - \widehat{\mathrm{CE}},$$
where $\widehat{\mathrm{MF}}$ is the empirical monofact rate and $\widehat{\mathrm{CE}}$ is a calibration error (often binned KL divergence or TV distance). Intervention via selective upweighting reduces the hallucination rate by increasing $\widehat{\mathrm{CE}}$ at fixed $\widehat{\mathrm{MF}}$ (Miao et al., 11 Feb 2025).
In multimodal and VLM contexts, atomic-level and sentence-level unfaithful perception rates are computed as
$$\mathrm{UPR}_{\text{atomic}} \;=\; 1 - \frac{\#\{\text{atomic facts supported by the image}\}}{\#\{\text{atomic facts}\}},$$
or, for reasoning chains,
$$\mathrm{UPR}_{\text{chain}} \;=\; 1 - \frac{\#\{\text{faithful reasoning steps}\}}{\#\{\text{reasoning steps}\}},$$
with variants for chain-level and sentence-level aggregation (Uppaal et al., 13 Dec 2025, Li et al., 11 Nov 2025, Jing et al., 2023).
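A minimal sketch of these aggregations, assuming binary faithfulness judgments per atomic fact or reasoning step (the judgments below are hypothetical; in practice they would come from an entailment model or human annotator):

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    """One atomic fact or reasoning step with a binary faithfulness label."""
    text: str
    faithful: bool

def upr(judgments):
    """Unfaithful perception rate: 1 minus the proportion of faithful units."""
    return 1.0 - sum(j.faithful for j in judgments) / len(judgments)

# Hypothetical atomic facts extracted from a single image caption.
atomic = [
    Judgment("a red car is parked on the street", True),
    Judgment("two people stand next to the car", True),
    Judgment("the car has a roof rack", False),   # not supported by the image
]
# Hypothetical steps of one reasoning chain; the chain counts as unfaithful
# if any of its steps is unsupported.
chain = [Judgment("step 1", True), Judgment("step 2", False), Judgment("step 3", True)]
chain_unfaithful = any(not step.faithful for step in chain)

print(f"atomic-level UPR = {upr(atomic):.2f}")               # 1 - 2/3 = 0.33
print(f"chain-level UPR  = {float(chain_unfaithful):.2f}")   # 1.00 for this chain
```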
For causal graphs, the metric is
$$\mathrm{UPR} \;=\; \frac{1}{\binom{n}{2}} \sum_{a < b} \mathbb{1}\!\left[G_a \text{ and } G_b \text{ disagree on at least one directed edge}\right]$$
(Alvarez et al., 24 Jan 2024).
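A minimal sketch of this pairwise disagreement rate, assuming each stakeholder's SCM is supplied as a binary adjacency matrix over a shared variable set (the three toy graphs are illustrative):

```python
import itertools
import numpy as np

def pairwise_edge_disagreement(graphs):
    """Fraction of graph pairs that disagree on at least one directed edge.
    Each graph is a binary adjacency matrix with A[i, j] = 1 iff i -> j."""
    pairs = list(itertools.combinations(graphs, 2))
    disagreeing = sum(1 for a, b in pairs if not np.array_equal(a, b))
    return disagreeing / len(pairs)

# Three hypothetical stakeholder SCMs over the same three variables.
g1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
g2 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])   # identical to g1
g3 = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])   # extra edge 0 -> 2
print(pairwise_edge_disagreement([g1, g2, g3]))    # 2 of 3 pairs disagree, ~0.67
```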
3. Robust Measurement Protocols
Protocols precisely specify what is labeled "unfaithful" or "hallucinated" depending on the context:
- In information theory, algorithms compute $R(\Delta, P)$ via parametric Lagrangian minimization with alternating or relaxed Blahut–Arimoto–style iterations, yielding globally optimal solutions under convexity for a prescribed perception constraint (a simplified Lagrangian sketch follows this list) (Serra et al., 27 Aug 2024, Serra et al., 2023).
- LLM hallucination rates are measured by (a) sampling training data to control empirical monofact rates, (b) training models under controlled miscalibration, and (c) validating predicted hallucination rates with bin-wise metric computations (Miao et al., 11 Feb 2025).
- Faithfulness in VLMs is operationalized via LLM or VEM-based pipelines for sub-sentence identification, atomic fact extraction, and fact–image entailment. Sentence- and atomic-level UPRs are reported globally, by fact type (entities, colors, relations), and by prompt structure (Jing et al., 2023).
- In explicit chain-of-thought tasks, unfaithful recovery is annotated as present when the model produces the correct answer via an unsupported, hallucinatory, or inconsistent chain (no explicit correction of an injected error) (Yee et al., 23 May 2024).
- Causal disagreement rates are measured over a population of SCMs, quantifying pairwise or reference-based divergence in graph structure (Alvarez et al., 24 Jan 2024).
- Human studies employ forced-choice or evidence-localization (e.g., in image forensics), reporting unfaithful perception as the false negative rate in detecting alterations (Schetinger et al., 2015).
- Conditioned hallucination in psychophysics uses the proportion of false-positive reports in no-stimulus trials; model-based inferences (e.g., HGF) relate unfaithful perception rate to latent parameters (prior/sensory weighting) (Benrimoh et al., 2023).
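To make the first protocol concrete, the sketch below performs a direct Lagrangian relaxation of the rate–distortion–perception problem for a small discrete source, minimizing $I(X;\hat{X}) + \lambda\,\mathbb{E}[d(X,\hat{X})] + \mu\,\mathrm{TV}(p_X, p_{\hat{X}})$ over the test channel with a generic optimizer. The source distribution, multipliers, and solver are illustrative assumptions; the cited works use specialized alternating Blahut–Arimoto-style iterations with convergence guarantees rather than this direct minimization.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import rel_entr, softmax

px = np.array([0.7, 0.2, 0.1])        # assumed source distribution (illustrative)
n = len(px)
d = 1.0 - np.eye(n)                    # Hamming distortion matrix

def lagrangian(logits, lam, mu):
    """I(X;Xhat) + lam*E[d(X,Xhat)] + mu*TV(p_X, p_Xhat), with the test
    channel p(xhat|x) parameterized row-wise by a softmax over logits."""
    Q = softmax(logits.reshape(n, n), axis=1)
    joint = px[:, None] * Q
    q_xhat = joint.sum(axis=0)                          # reconstruction marginal
    mi = rel_entr(joint, np.outer(px, q_xhat)).sum()    # mutual information (nats)
    distortion = (joint * d).sum()
    tv = 0.5 * np.abs(px - q_xhat).sum()
    return mi + lam * distortion + mu * tv

for mu in (0.0, 5.0):                  # no vs. strong perception penalty
    res = minimize(lagrangian, np.zeros(n * n), args=(2.0, mu), method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000, "fatol": 1e-10})
    Q = softmax(res.x.reshape(n, n), axis=1)
    q_xhat = (px[:, None] * Q).sum(axis=0)
    print(f"mu = {mu}: reconstruction marginal = {np.round(q_xhat, 3)}")
```

Increasing $\mu$ pulls the reconstruction marginal toward the source marginal, i.e., toward zero unfaithful perception, at the cost of rate or distortion.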
4. Empirical Rates, Determinants, and Bounds
Empirically, unfaithful perception rates span domains and model types:
- In lossy coding, enforcing perfect perceptual fidelity increases the minimal rate relative to unconstrained distortion coding; equivalently, at a fixed rate, perfect perceptual quality costs at most a twofold (3 dB) increase in MSE distortion (Blau et al., 2019).
- LLMs: Hallucination rates approach the monofact rate at perfect calibration; reductions of up to 40% are achieved via targeted miscalibration (Miao et al., 11 Feb 2025). Empirical values for modern SOTA models are 6% for GPT-5 (summarization, VeriGray), 14–46% (VLMs, visual atomic facts), and up to 60% (LLM in-context under high monofact rate) (Ding et al., 24 Oct 2025, Jing et al., 2023).
- FaithAct and other visual reasoning systems report relative gains in perceptual faithfulness up to 26%—translating into lowered UPR—without loss of task accuracy (Li et al., 11 Nov 2025, Uppaal et al., 13 Dec 2025).
- In chain-of-thought error recovery, unfaithful rates for GPT-4 reach 35–40% on copying errors and remain non-trivial even with explicit error-detection prompts (Yee et al., 23 May 2024).
- Human image forensics show UPR = 0.535 (53.5% of altered digital images were missed), independent of confidence or attention (Schetinger et al., 2015).
- Conditioned hallucination (CH) rates correlate with prior-overweighting in perception models (HGF) and with clinically assessed hallucination proneness (Benrimoh et al., 2023).
- Empirical rate surfaces (e.g., $R(\Delta, P)$) are monotonic and convex in both distortion and perception tolerance; "free" zero-rate communication arises when sufficient decoder side information is present (Chai et al., 2023).
5. Applications and Implications
Unfaithful perception rate is an operational measure for:
- Benchmarking: Global and fine-grained quantification of hallucination and error modes in LLMs, VLMs, and reasoning chains.
- Algorithm design: Explicit optimization of perceptual fidelity in information-theoretic coding, informed by convexity and parametric algorithms (Serra et al., 2023).
- Intervention analysis: Tuning model calibration or training procedures (e.g., selective upweighting, self-reflection and regeneration in VLMs, error-detecting prompts) to manage UPR and its tradeoff against coverage and accuracy (Miao et al., 11 Feb 2025, Uppaal et al., 13 Dec 2025, Li et al., 11 Nov 2025).
- Fairness, trust, and interpretability: In causal settings, high UPR signals stakeholder disagreement over causal structure, critically affecting fairness analyses. In chain-of-thought reasoning, persistent unfaithfulness undermines the transparency and reliability of intermediate steps—even when final answers are correct (Alvarez et al., 24 Jan 2024, Yee et al., 23 May 2024).
- Perception science: CH rates in clinical and non-clinical populations link behavioral unfaithful perception to mechanistic model parameters (e.g., prior/sensory weighting, decision noise), supporting theoretical models of hallucination (Benrimoh et al., 2023).
6. Open Challenges and Future Directions
Core methodological challenges include:
- Disambiguating annotation boundaries: Recent frameworks (VeriGray) make explicit the "Out-Dependent" zone, disentangling hallucinations from cases requiring external knowledge or ambiguous references. This is crucial for reducing annotation noise and improving detector evaluation (Ding et al., 24 Oct 2025).
- Consistency across domains: Definitions and thresholds for faithfulness and unfaithfulness (and therefore UPR) depend on rigorous operationalizations—factual entailment vs. perceptual grounding vs. statistical divergence—necessitating careful specification and agreement across benchmarks (Jing et al., 2023, Uppaal et al., 13 Dec 2025).
- Algorithmic efficiency: Numerical evaluation of $R(\Delta, P)$ under general $f$-divergence constraints is nontrivial, demanding sophisticated alternating minimization and root-finding schemes; convergence guarantees and computational overhead depend on the regularity of the divergence (Serra et al., 27 Aug 2024, Serra et al., 2023).
- Human–AI interaction: High UPR persists in humans for subtle digital forgeries; scaling human–AI ensembles or hybrid workflows introduces additional layers of perceptual (un)faithfulness (Schetinger et al., 2015, Alvarez et al., 24 Jan 2024).
- Uncertainty and abstention: Selective prediction and classifier abstention (on Out-Dependent or ambiguous cases) reduce hallucination risk but pose coverage–fidelity trade-offs, as illustrated in the sketch after this list.
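A minimal sketch of this coverage–fidelity trade-off, assuming synthetic detector confidences and unfaithfulness labels (a real system would obtain both from a hallucination detector and an annotation pipeline):

```python
import numpy as np

# Predictions below a confidence threshold are withheld; we report coverage
# (fraction answered) and the UPR restricted to answered cases.
rng = np.random.default_rng(0)
n = 1000
confidence = rng.uniform(size=n)
# Synthetic ground truth: higher-confidence outputs are less often unfaithful.
unfaithful = rng.uniform(size=n) < (0.4 * (1.0 - confidence))

for tau in (0.0, 0.5, 0.8):
    kept = confidence >= tau
    coverage = kept.mean()
    upr_kept = unfaithful[kept].mean() if kept.any() else float("nan")
    print(f"threshold={tau:.1f}  coverage={coverage:.2f}  UPR on answered={upr_kept:.2f}")
```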
Ongoing work seeks to generalize perceptual faithfulness constraints to more complex knowledge domains, leverage explicit machine annotation and external retrieval, and integrate perceptual constraints directly into model training. The unfaithful perception rate serves as both a theoretical lever and a practical diagnostic for advancing the reliability and interpretability of autonomous and human-in-the-loop systems.