HALTT4LLM is a project that aims to create a common metric for measuring large language models' (LLMs) progress in eliminating hallucinations, a major obstacle to their widespread adoption.
The strategy is to present each model with multiple-choice trivia questions, including real questions, fake ones, and 'none of the above' items, and measure how well it avoids hallucinated answers.
Key terms:
HALTT4LLM: Hallucination Trivia Test for Large Language Models, a project to create a common metric for evaluating LLMs' progress in eliminating hallucinations
Hallucinations: Incorrect or nonsensical answers generated by LLMs, a major issue preventing their widespread adoption
Trivia questions: Multiple-choice questions used to test LLMs, including real, fake, and 'none of the above' questions
Scoring: +2 for a correct answer, -1 for an uncertain ('I don't know') answer, and 0 for an incorrect answer
Dataset: Three sets of trivia questions (hq_trivia_questions.json, fake_trivia_questions.json, and none_of_the_above_questions.json), all sharing the same JSON format
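To make the scoring scheme concrete, here is a minimal Python sketch of how a grader over these question files might work. The field names ("question", "choices", "answer") are assumptions for illustration; the actual schema of the project's JSON files may differ, and the sample data below is invented.

```python
import json

# Hypothetical sample mirroring the assumed JSON schema of the question files;
# the real files (e.g. hq_trivia_questions.json) may use different field names.
SAMPLE = json.loads("""
[
  {"question": "What is the capital of France?",
   "choices": ["Paris", "Lyon", "Marseille", "I don't know", "None of the above"],
   "answer": "Paris"}
]
""")

def score_answer(given, correct):
    """Scoring scheme as described above:
    +2 for a correct answer, -1 for 'I don't know', 0 for any other answer."""
    if given == correct:
        return 2
    if given.strip().lower() == "i don't know":
        return -1
    return 0

def grade(questions, model_answers):
    """Total score for one model's answers across a question set."""
    return sum(score_answer(a, q["answer"])
               for q, a in zip(questions, model_answers))
```

For example, `grade(SAMPLE, ["Paris"])` yields 2, while an "I don't know" response would yield -1 under this scheme.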