Representation-based Broad Hallucination Detectors Fail to Generalize Out of Distribution

Published 19 Sep 2025 in cs.LG and cs.AI | (2509.19372v1)

Abstract: We critically assess the efficacy of the current state of the art (SOTA) in hallucination detection and find that its performance on the RAGTruth dataset is largely driven by a spurious correlation with data. After controlling for this effect, the state-of-the-art method performs no better than supervised linear probes, while requiring extensive hyperparameter tuning across datasets. Out-of-distribution generalization is currently out of reach: all of the analyzed methods perform close to random. We propose a set of guidelines for hallucination detection and its evaluation.