
Reliable measurement of hallucination in long-form settings

Develop a reliable method to measure hallucination in long-form generation, where hallucination is defined as correctness of facts with respect to a model’s internal knowledge, distinct from factuality assessed against external world knowledge.


Background

The work focuses on factuality, i.e., correctness of facts with respect to external world knowledge, using SAFE (Search-Augmented Factuality Evaluator) to split a long-form response into individual facts and verify each one via Google Search.

The authors note that this is different from hallucination—correctness relative to a model’s internal knowledge—and state that a reliable approach for measuring hallucination in long-form outputs remains unsettled.
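To make the distinction concrete, the sketch below contrasts the two checks under stated assumptions: a factuality check in the spirit of SAFE scores an extracted fact against externally retrieved evidence, while a hallucination check would score the same fact against the model's own stated knowledge. The helpers `search_evidence`, `judge_support`, `query_model_knowledge`, and `judge_consistency` are hypothetical placeholders, not the paper's SAFE implementation.

```python
# Hedged sketch: contrasting a factuality check with a hallucination check
# for a single extracted fact. All helpers are hypothetical placeholders.

from typing import Callable, List


def factuality_check(
    fact: str,
    search_evidence: Callable[[str], List[str]],      # e.g., a search wrapper
    judge_support: Callable[[str, List[str]], bool],  # judge fact vs. evidence
) -> bool:
    """Factuality: is the fact supported by external world knowledge?"""
    evidence = search_evidence(fact)
    return judge_support(fact, evidence)


def hallucination_check(
    fact: str,
    query_model_knowledge: Callable[[str], str],   # elicit the model's own belief
    judge_consistency: Callable[[str, str], bool],  # compare fact vs. that belief
) -> bool:
    """Hallucination: does the fact contradict the model's internal knowledge?

    This is the step the paper flags as unsettled: there is no agreed-on way
    to reliably elicit and compare against internal knowledge in long-form
    settings.
    """
    model_belief = query_model_knowledge(fact)
    return not judge_consistency(fact, model_belief)
```

The difficulty sits in `query_model_knowledge`: eliciting what the model internally "knows" (for example, by re-asking or probing the model) is itself unreliable, which is why measuring hallucination in long-form outputs remains the open problem stated above.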

References

Furthermore, our work considers factuality (i.e., correctness of facts with respect to world knowledge), and so it is still unclear how to reliably measure hallucination (i.e., correctness of facts with respect to a model's internal knowledge) in long-form settings.

Long-form factuality in large language models (2403.18802 - Wei et al., 27 Mar 2024) in Conclusion (sec:conclusion)