CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification (2405.00253v3)

Published 30 Apr 2024 in cs.CL and cs.SE

Abstract: LLMs have made significant progress in code generation, offering developers groundbreaking automated programming support. However, LLMs often generate code that is syntactically correct and even semantically plausible, yet may not execute as expected or fulfill specified requirements. This phenomenon of hallucinations in the code domain has not been systematically explored. To advance the community's understanding of and research on this issue, we introduce the concept of code hallucinations and propose a classification method for code hallucinations based on execution verification. We categorize code hallucinations into four main types: mapping, naming, resource, and logic hallucinations, with each category further divided into subcategories to understand and address the unique challenges LLMs face in code generation at finer granularity. Additionally, we present a dynamic detection algorithm called CodeHalu designed to detect and quantify code hallucinations. We also introduce the CodeHaluEval benchmark, which includes 8,883 samples from 699 tasks, to systematically and quantitatively evaluate code hallucinations. By evaluating 17 popular LLMs on this benchmark, we reveal significant differences in their accuracy and reliability in code generation, offering detailed insights for further improving the code generation capabilities of LLMs. The CodeHalu benchmark and code are publicly available at https://github.com/yuchen814/CodeHalu.
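The abstract's core idea, detecting hallucinations by actually executing generated code rather than inspecting it statically, can be illustrated with a minimal sketch. The snippet below runs a generated function against a test case and maps runtime failures onto the four categories named in the abstract. The exception-to-category mapping, the `solve` entry-point convention, and the function names are illustrative assumptions for this sketch, not the paper's actual CodeHalu algorithm.

```python
# Sketch of execution-based hallucination detection: execute generated
# code against a test case and classify failures. The category mapping
# below is an assumed, simplified stand-in for the paper's taxonomy.
import traceback

# Hypothetical mapping from Python exception types to the four
# hallucination categories (mapping, naming, resource, logic).
CATEGORY_BY_EXCEPTION = {
    NameError: "naming",
    AttributeError: "naming",
    TypeError: "mapping",
    ValueError: "mapping",
    IndexError: "resource",
    KeyError: "resource",
    MemoryError: "resource",
    RecursionError: "resource",
}

def classify_generated_code(code: str, test_input, expected_output):
    """Execute `code` (assumed to define `solve`) and classify the outcome."""
    namespace = {}
    try:
        exec(code, namespace)                      # run the generated snippet
        result = namespace["solve"](test_input)    # invoke its entry point
    except Exception as exc:
        category = CATEGORY_BY_EXCEPTION.get(type(exc), "logic")
        detail = traceback.format_exception_only(type(exc), exc)[0].strip()
        return {"passed": False, "hallucination": category, "detail": detail}
    # Code ran to completion but gave the wrong answer: a logic hallucination.
    if result != expected_output:
        return {"passed": False, "hallucination": "logic",
                "detail": f"expected {expected_output!r}, got {result!r}"}
    return {"passed": True, "hallucination": None, "detail": ""}

# Example: a snippet with a naming hallucination (calls an undefined helper).
buggy = "def solve(x):\n    return helperr(x) + 1\n"
print(classify_generated_code(buggy, 2, 3))
```

In practice a real harness would also sandbox execution and enforce time and memory limits, since untrusted generated code can loop forever or exhaust resources; `exec` on raw model output is used here only to keep the sketch self-contained.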

Authors (9)
  1. Yuchen Tian (12 papers)
  2. Weixiang Yan (11 papers)
  3. Qian Yang (146 papers)
  4. Qian Chen (264 papers)
  5. Wen Wang (144 papers)
  6. Ziyang Luo (35 papers)
  7. Lei Ma (195 papers)
  8. Xuandong Zhao (47 papers)
  9. Dawn Song (229 papers)
Citations (2)