Cost-Effective Hallucination Detection for LLMs (2407.21424v2)

Published 31 Jul 2024 in cs.CL, cs.AI, cs.LG, and stat.ML

Abstract: LLMs can be prone to hallucinations - generating unreliable outputs that are unfaithful to their inputs, contradict external facts, or are internally inconsistent. In this work, we address several challenges for post-hoc hallucination detection in production settings. Our pipeline for hallucination detection entails: first, producing a confidence score representing the likelihood that a generated answer is a hallucination; second, calibrating the score conditional on attributes of the inputs and candidate response; and finally, performing detection by thresholding the calibrated score. We benchmark a variety of state-of-the-art scoring methods on different datasets, encompassing question answering, fact checking, and summarization tasks. We employ diverse LLMs to ensure a comprehensive assessment of performance. We show that calibrating individual scoring methods is critical for ensuring risk-aware downstream decision making. Based on the finding that no individual score performs best in all situations, we propose a multi-scoring framework, which combines different scores and achieves top performance across all datasets. We further introduce cost-effective multi-scoring, which can match or even outperform more expensive detection methods while significantly reducing computational overhead.
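
To make the three-stage pipeline concrete, here is a minimal sketch of scoring, calibration, and thresholding, plus a simple multi-score combiner. It is not the paper's implementation: the synthetic scores and labels, the unconditional isotonic calibrator, and the logistic combiner are all illustrative assumptions (the paper calibrates conditionally on input attributes and benchmarks several scoring methods).

```python
# Minimal sketch of a post-hoc hallucination-detection pipeline,
# assuming raw per-response scores are already available
# (e.g., from self-consistency or NLI-based detectors).
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stage 1 (assumed done): each column is a raw score from one detector.
raw_scores = rng.uniform(size=(500, 3))
# Hypothetical binary labels from an annotated set (1 = hallucination).
labels = (raw_scores.mean(axis=1) + rng.normal(scale=0.2, size=500) > 0.6).astype(int)

# Stage 2: calibrate each raw score so it can be read as a probability.
# The paper calibrates conditionally on input/response attributes;
# this sketch calibrates unconditionally for brevity.
calibrated = np.empty_like(raw_scores)
for j in range(raw_scores.shape[1]):
    iso = IsotonicRegression(out_of_bounds="clip")
    calibrated[:, j] = iso.fit_transform(raw_scores[:, j], labels)

# Multi-scoring: combine calibrated scores with a logistic model
# (one plausible combiner, not necessarily the paper's choice).
combiner = LogisticRegression().fit(calibrated, labels)

# Stage 3: threshold the combined, calibrated score for detection.
threshold = 0.5  # would be tuned to a downstream risk tolerance
p_hallucination = combiner.predict_proba(calibrated)[:, 1]
flags = p_hallucination >= threshold
print(f"flagged {flags.sum()} of {len(flags)} responses as likely hallucinations")
```

In practice, the calibrators and combiner would be fit on a held-out annotated split and then applied to new responses, with the threshold chosen to match the application's tolerance for false positives versus missed hallucinations.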

Authors (6)
  1. Simon Valentin (5 papers)
  2. Jinmiao Fu (8 papers)
  3. Gianluca Detommaso (10 papers)
  4. Shaoyuan Xu (12 papers)
  5. Giovanni Zappella (28 papers)
  6. Bryan Wang (25 papers)
Citations (1)