TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness (2402.12545v2)

Published 19 Feb 2024 in cs.CL

Abstract: LLMs have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLM outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies because no contextual or ground-truth information is available. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can integrate seamlessly with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpasses existing reference-free metrics, and achieves results on par with reference-based metrics.
