SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection (2404.03732v1)
Abstract: We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with LLMs to build classifiers for hallucination detection, and extends that work through the incorporation of context-specific definitions of task, role, and target concept, and the automated generation of examples for use in a few-shot prompting approach. The resulting system achieved the fourth-best and sixth-best performance in the model-agnostic and model-aware tracks of Task 6, respectively, and evaluation using the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on GitHub.
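As a rough illustration of the zero-shot classification approach summarised above, the sketch below prompts an LLM with a task definition, an annotator role, and a target-concept definition, and asks for a binary hallucination label. The prompt wording, model choice, and function names are illustrative assumptions and are not taken from the paper.

```python
# Minimal zero-shot hallucination-detection sketch.
# Assumptions: OpenAI chat-completions API, "gpt-4" as the model, and prompt
# wording invented for illustration; this is not the paper's exact implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are an expert annotator for the SemEval-2024 Task 6 (SHROOM) "
    "hallucination-detection task. A hallucination is model output that is "
    "fluent but unsupported by, or contradictory to, the given source or reference."
)

def classify_zero_shot(task: str, source: str, target: str, hypothesis: str) -> str:
    """Ask the LLM for a binary label: 'Hallucination' or 'Not Hallucination'."""
    user_prompt = (
        f"Task: {task}\n"
        f"Source: {source}\n"
        f"Reference (target): {target}\n"
        f"Model output (hypothesis): {hypothesis}\n\n"
        "Does the model output contain a hallucination? "
        "Answer with exactly one label: Hallucination or Not Hallucination."
    )
    response = client.chat.completions.create(
        model="gpt-4",        # assumed model choice
        temperature=0.0,      # deterministic single-sample classification
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    answer = response.choices[0].message.content.strip().lower()
    return "Not Hallucination" if "not hallucination" in answer else "Hallucination"
```

A few-shot variant of this sketch would prepend automatically generated labelled examples to the user prompt; per the abstract, the authors found the zero-shot setting more accurate than that few-shot setting.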