
HALO: An Ontology for Representing and Categorizing Hallucinations in Large Language Models (2312.05209v2)

Published 8 Dec 2023 in cs.AI and cs.CL

Abstract: Recent progress in generative AI, including LLMs like ChatGPT, has opened up significant opportunities in fields ranging from natural language processing to knowledge discovery and data mining. However, there is also a growing awareness that these models can be prone to problems such as making information up, or 'hallucinations', and faulty reasoning on seemingly simple problems. Because of the popularity of models like ChatGPT, both academic scholars and citizen scientists have documented hallucinations of several different types and severity. Despite this body of work, a formal model for describing and representing these hallucinations (with relevant metadata) at a fine-grained level is still lacking. In this paper, we address this gap by presenting the Hallucination Ontology, or HALO, a formal, extensible ontology written in OWL that currently offers support for six different types of hallucinations known to arise in LLMs, along with support for provenance and experimental metadata. We also collect and publish a dataset containing hallucinations that we inductively gathered across multiple independent Web sources, and show that HALO can be successfully used to model this dataset and answer competency questions.


Summary

  • The paper presents HALO, a structured ontology that standardizes the categorization of hallucinations in large language models.
  • It divides the framework into two modules—hallucination and metadata—to capture diverse error types and essential experimental details.
  • Validation with real-world datasets shows HALO’s effectiveness in modeling hallucination instances and enabling comparative analysis across AI systems.

Introduction

The proliferation of generative AI systems, particularly LLMs, has led to remarkable breakthroughs across numerous applications. Alongside these advances, however, a critical issue has emerged: these models can exhibit faulty reasoning or generate fabricated information, a phenomenon often referred to as "hallucinations". The paper introduces the Hallucination Ontology (HALO), a formal framework for representing and categorizing hallucination instances in generative models, giving researchers a standardized tool for systematic analysis and documentation.

Hallucination Challenges in AI

Hallucinations in AI raise substantial concerns, such as the potential for misinformation and the risk of undue reliance on the output of these systems. Despite widespread documentation of hallucinations across various platforms and studies, there has been a lack of formalized vocabulary or ontology to describe these occurrences systematically. This absence hinders empirical research and analysis, as data on hallucinations are often scattered and inconsistently described.

HALO fills this gap, offering a structured, openly licensed ontology written in OWL and adhering to the FAIR principles. The ontology supports six known types of hallucinations, with the flexibility to expand as new hallucination patterns emerge. It emphasizes extensibility and the inclusion of metadata, such as provenance and experimental details, facilitating comparisons between different models and hallucination instances.
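
To make this concrete, here is a minimal sketch in Python (using rdflib) of how an OWL ontology in HALO's style might declare hallucination categories as an extensible class hierarchy. The namespace and class names below are illustrative assumptions, not the published HALO terms.

from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

# Placeholder namespace; the real HALO IRI may differ.
HALO = Namespace("https://example.org/halo#")

g = Graph()
g.bind("halo", HALO)

# A root class, plus hypothetical subtypes standing in for the six categories.
g.add((HALO.Hallucination, RDF.type, OWL.Class))
for subtype in ("FactualHallucination", "ReasoningHallucination"):
    g.add((HALO[subtype], RDF.type, OWL.Class))
    g.add((HALO[subtype], RDFS.subClassOf, HALO.Hallucination))

print(g.serialize(format="turtle"))

New categories can be accommodated by declaring further subclasses, which is the extensibility property the paper emphasizes.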

HALO's Design and Features

HALO consists of two primary modules, the Hallucination Module and the Metadata Module, which separate the diverse categories of hallucinations from the more standard concepts used to capture experimental data. This division allows the ontology to adapt and expand as further research uncovers additional categories or subtypes of hallucinations.

The ontology connects hallucination instances to external classes and aligns with the latest findings in AI research, aiming for a broad scope and interoperability with published vocabularies. Additionally, HALO supports metadata representation, crucial for cross-analysis and understanding the context of each hallucination, including details such as the specific LLM that generated the error, the date of occurrence, and the source of detection.
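
As a hedged sketch of what such a record could look like, the following rdflib snippet attaches that kind of metadata to a single hallucination instance; every property name here (generatedBy, recordedOn, detectedVia) is a hypothetical stand-in for HALO's actual vocabulary.

from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

HALO = Namespace("https://example.org/halo#")  # illustrative namespace
EX = Namespace("https://example.org/data#")    # sample instance data

g = Graph()
# One hallucination instance with model, date, and detection-source metadata.
g.add((EX.h1, RDF.type, HALO.FactualHallucination))
g.add((EX.h1, HALO.generatedBy, Literal("ChatGPT")))
g.add((EX.h1, HALO.recordedOn, Literal("2023-05-17", datatype=XSD.date)))
g.add((EX.h1, HALO.detectedVia, Literal("social media post")))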

Evaluation and Implications

Using a dataset compiled from multiple independent web sources, HALO was tested on its ability to model hallucination instances and answer competency questions about them. The results showed that HALO models the dataset successfully and answers these questions accurately, demonstrating its practical utility.
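
Competency questions over an OWL ontology are typically posed as SPARQL queries. The sketch below, again using illustrative halo: terms and made-up sample data rather than the paper's actual vocabulary and dataset, answers one such question: which models produced factual hallucinations, and how many?

from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix halo: <https://example.org/halo#> .
@prefix ex:   <https://example.org/data#> .
ex:h1 a halo:FactualHallucination ; halo:generatedBy "ChatGPT" .
ex:h2 a halo:FactualHallucination ; halo:generatedBy "ChatGPT" .
""", format="turtle")

query = """
PREFIX halo: <https://example.org/halo#>
SELECT ?model (COUNT(?h) AS ?n)
WHERE { ?h a halo:FactualHallucination ; halo:generatedBy ?model . }
GROUP BY ?model
"""
for row in g.query(query):
    print(row.model, row.n)  # e.g. ChatGPT 2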

The development of HALO is a step toward comprehensively understanding and mitigating hallucinations in AI, with the potential to inform future improvements in generative models. By enabling standardized documentation and analysis, researchers can systematically study hallucinations and contribute to the ongoing refinement of AI systems.
