MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs (2402.11756v3)

Published 19 Feb 2024 in cs.CL and cs.LG

Abstract: Generative LLMs are widely utilized for their excellence in various tasks. However, their tendency to produce inaccurate or misleading outputs poses a potential risk, particularly in high-stakes environments. Therefore, estimating the correctness of generative LLM outputs is an important task for enhanced reliability. Uncertainty Estimation (UE) in generative LLMs is an evolving domain, where SOTA probability-based methods commonly employ length-normalized scoring. In this work, we propose Meaning-Aware Response Scoring (MARS) as an alternative to length-normalized scoring for UE methods. MARS is a novel scoring function that considers the semantic contribution of each token in the generated sequence in the context of the question. We demonstrate that integrating MARS into UE methods results in a universal and significant improvement in UE performance. We conduct experiments using three distinct closed-book question-answering datasets across five popular pre-trained LLMs. Lastly, we validate the efficacy of MARS on a Medical QA dataset. Code can be found at https://github.com/Ybakman/LLM_Uncertainity.
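
To make the contrast in the abstract concrete, the sketch below compares length-normalized scoring (every token's log-probability counts equally) with a scoring function that weights each token by an importance value. It is an illustration only, not the authors' implementation: the abstract does not give the MARS formula, so the function names, the example probabilities, and the hand-picked `weights` are all assumptions. In MARS the per-token importance would come from a model of each token's semantic contribution in the context of the question.

```python
import math


def length_normalized_score(token_probs):
    """Geometric mean of the token probabilities: exp(mean(log p))."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def importance_weighted_score(token_probs, weights):
    """Weighted geometric mean of the token probabilities.

    `weights` are hypothetical per-token importance values (an assumption
    for this sketch); they are normalized here so that they sum to 1.
    """
    if len(token_probs) != len(weights):
        raise ValueError("token_probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    log_score = sum((w / total) * math.log(p)
                    for p, w in zip(token_probs, weights))
    return math.exp(log_score)


# Hypothetical three-token answer: the middle token carries the meaning
# and is also the one the model is least sure about.
probs = [0.9, 0.5, 0.9]
weights = [0.1, 0.8, 0.1]

uniform = length_normalized_score(probs)
weighted = importance_weighted_score(probs, weights)
```

With uniform weights the two functions agree, so length-normalized scoring is the special case in which every token is treated as equally important. In the example, concentrating the weight on the low-probability middle token yields a lower score than the length-normalized one, which is the kind of behavior a meaning-aware score aims for.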

Authors (6)
  1. Yavuz Faruk Bakman (7 papers)
  2. Duygu Nur Yaldiz (9 papers)
  3. Baturalp Buyukates (26 papers)
  4. Chenyang Tao (29 papers)
  5. Dimitrios Dimitriadis (32 papers)
  6. Salman Avestimehr (116 papers)
Citations (5)