Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering (2308.16622v1)
Published 31 Aug 2023 in cs.AI, cs.CL, and cs.DB
Abstract: As the field of LLMs evolves at an accelerated pace, the need to assess and monitor their performance becomes critical. We introduce a benchmarking framework focused on knowledge graph engineering (KGE), accompanied by three challenges addressing syntax and error correction, fact extraction, and dataset generation. We show that, while being useful tools, LLMs are not yet fit to assist in knowledge graph generation with zero-shot prompting. Consequently, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses, as well as statistical data and visualization tools, to support tracking of prompt engineering and model performance.
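To make the idea of automatic evaluation concrete, below is a minimal sketch (not the authors' LLM-KG-Bench code) of how a scoring step for a syntax-oriented KGE challenge could look: an LLM response containing a Turtle document is parsed, and the task is scored by whether the output is syntactically valid RDF. The helper name `score_turtle_response` and the use of `rdflib` are assumptions for illustration only.

```python
# Hypothetical sketch of an automatic evaluation step for a Turtle-syntax task.
# Assumes the rdflib package is installed; this is not the paper's actual framework code.
from rdflib import Graph


def score_turtle_response(llm_response: str) -> dict:
    """Return a simple evaluation record for an LLM-produced Turtle document."""
    result = {"parseable": False, "triple_count": 0, "error": None}
    try:
        graph = Graph()
        graph.parse(data=llm_response, format="turtle")  # raises on syntax errors
        result["parseable"] = True
        result["triple_count"] = len(graph)  # number of triples parsed
    except Exception as exc:  # record the parser error for later statistics
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    sample = '@prefix ex: <http://example.org/> . ex:Alice ex:knows ex:Bob .'
    print(score_turtle_response(sample))
    # {'parseable': True, 'triple_count': 1, 'error': None}
```

Records like this can then be aggregated across models and prompts to produce the kind of statistics and visualizations the abstract describes.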
- Lars-Peter Meyer (7 papers)
- Johannes Frey (9 papers)
- Kurt Junghanns (3 papers)
- Felix Brei (5 papers)
- Kirill Bulert (3 papers)
- Sabine Gründer-Fahrer (1 paper)
- Michael Martin (12 papers)