
Med-HALT: Medical Domain Hallucination Test for Large Language Models (2307.15343v2)

Published 28 Jul 2023 in cs.CL, cs.AI, cs.LG, and stat.ML

Abstract: This research paper focuses on the challenges posed by hallucinations in LLMs, particularly in the context of the medical domain. Hallucination, wherein these models generate plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT provides a diverse multinational dataset derived from medical examinations across various countries and includes multiple innovative testing modalities. Med-HALT includes two categories of tests, reasoning and memory-based hallucination tests, designed to assess LLMs' problem-solving and information retrieval abilities. Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, LlaMa-2, MPT, and Falcon, revealing significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility. Through this work, we aim to contribute to the development of safer and more reliable LLMs in healthcare. Our benchmark can be found at medhalt.github.io
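The reasoning-based tests described above evaluate an LLM's problem-solving on exam-style questions. As a rough illustration only, the sketch below scores a model on multiple-choice items; the item format, the `ask_model` callable, and the stub model are all hypothetical stand-ins, not the paper's actual evaluation protocol or scoring scheme (which are defined in the Med-HALT dataset and repository).

```python
# Hedged sketch: scoring a multiple-choice reasoning test.
# The item schema and `ask_model` interface are assumptions for illustration,
# not Med-HALT's real format.

def score(items, ask_model):
    """Return the fraction of items the model answers correctly."""
    correct = 0
    for item in items:
        prediction = ask_model(item["question"], item["options"])
        if prediction == item["answer"]:
            correct += 1
    return correct / len(items) if items else 0.0

# Toy run with a single hypothetical item and a stub "model"
# that always picks the first option.
items = [
    {
        "question": "Which vitamin deficiency causes scurvy?",
        "options": ["A. Vitamin C", "B. Vitamin D", "C. Vitamin K"],
        "answer": "A. Vitamin C",
    }
]
stub_model = lambda question, options: options[0]
print(score(items, stub_model))  # 1.0
```

A real harness would additionally need prompt templating and answer-extraction logic, since free-form LLM output rarely matches an option string exactly.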

Authors (3)
  1. Ankit Pal (11 papers)
  2. Logesh Kumar Umapathi (4 papers)
  3. Malaikannan Sankarasubbu (13 papers)
Citations (84)