A Survey of Automatic Hallucination Evaluation on Natural Language Generation (2404.12041v3)
Abstract: The proliferation of LLMs has introduced a critical challenge: accurately evaluating hallucinations to ensure model reliability. While Automatic Hallucination Evaluation (AHE) has become essential, the field suffers from methodological fragmentation, hindering both theoretical understanding and practical advancement. This survey addresses that gap through a comprehensive analysis of 74 evaluation methods, revealing that 74% specifically target LLMs, a paradigm shift that demands new evaluation frameworks. We formulate a unified evaluation pipeline encompassing datasets and benchmarks, evidence collection strategies, and comparison mechanisms, systematically documenting the evolution from pre-LLM to post-LLM methodologies. Beyond this taxonomy, we identify fundamental limitations of current approaches and their implications for real-world deployment. To guide future research, we delineate key challenges and propose strategic directions, including enhanced interpretability mechanisms and the integration of application-specific evaluation criteria, ultimately providing a roadmap toward more robust and practical hallucination evaluation systems.