
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era (2412.18702v2)

Published 24 Dec 2024 in cs.CL, cs.AI, and cs.DB

Abstract: Retrieval from graph data is crucial for augmenting large language models (LLMs) with both open-domain knowledge and private enterprise data, and it is also a key component in the recent GraphRAG system (Edge et al., 2024). Despite decades of research on knowledge graphs and knowledge base question answering, leading LLM frameworks (e.g. Langchain and LlamaIndex) have only minimal support for retrieval from modern encyclopedic knowledge graphs like Wikidata. In this paper, we analyze the root cause and suggest that modern RDF knowledge graphs (e.g. Wikidata, Freebase) are less efficient for LLMs due to overly large schemas that far exceed the typical LLM context window, use of resource identifiers, overlapping relation types and lack of normalization. As a solution, we propose property graph views on top of the underlying RDF graph that can be efficiently queried by LLMs using Cypher. We instantiated this idea on Wikidata and introduced CypherBench, the first benchmark with 11 large-scale, multi-domain property graphs with 7.8 million entities and over 10,000 questions. To achieve this, we tackled several key challenges, including developing an RDF-to-property graph conversion engine, creating a systematic pipeline for text-to-Cypher task generation, and designing new evaluation metrics.

Summary

  • The paper introduces CypherBench, a benchmark enabling precise LLM-based retrieval from large modern knowledge graphs by addressing RDF inefficiencies using property graph views and the Cypher language.
  • A key contribution is an RDF-to-property graph conversion engine that transforms RDF data into clean, schema-enriched property graphs for efficient LLM querying.
  • Experimental findings show that even state-of-the-art LLMs such as Claude 3.5 Sonnet reach only 61.58% execution accuracy on CypherBench, revealing the difficulty of precise retrieval over full-scale KGs.

An Overview of CypherBench for Modern Knowledge Graph Retrieval in the LLM Era

The paper introduces CypherBench, a benchmark designed to facilitate efficient and precise retrieval over modern knowledge graphs (KGs) in the era of LLMs. Despite significant advancements in the integration of KGs with LLMs, current systems such as Langchain and LlamaIndex offer minimal support for retrieval from expansive encyclopedic KGs like Wikidata. This is primarily due to the inefficiencies posed by the Resource Description Framework (RDF) used in these knowledge graphs, which includes large schemas that exceed typical LLM context windows, ambiguous resource identifiers, and a lack of data normalization.

Proposed Solution: Property Graph Views

To mitigate the aforementioned challenges, the paper proposes property graph views that LLMs can query efficiently through the Cypher query language. The authors implemented this solution on Wikidata, culminating in CypherBench — the first benchmark comprising 11 large-scale, multi-domain property graphs totaling 7.8 million entities. The benchmark includes over 10,000 questions spanning these graphs, with consistency across domains ensured by an RDF-to-property-graph conversion engine.
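The core intuition can be illustrated with a small sketch: a property graph schema is compact enough to be serialized directly into an LLM prompt, unlike a full RDF ontology with thousands of relation types. The schema below and the rendering function are hypothetical illustrations, not artifacts from the paper.

```python
# Hypothetical property-graph schema (not from the paper) small enough
# to fit in an LLM prompt, in contrast to a full RDF ontology.
schema = {
    "nodes": {
        "Player": ["name", "height_m", "date_of_birth"],
        "Team": ["name", "founded_year"],
    },
    "relationships": [
        ("Player", "PLAYS_FOR", "Team"),
    ],
}

def render_schema(schema: dict) -> str:
    """Serialize the schema into a compact text block for an LLM prompt."""
    lines = ["Node labels and properties:"]
    for label, props in schema["nodes"].items():
        lines.append(f"  (:{label} {{{', '.join(props)}}})")
    lines.append("Relationship types:")
    for src, rel, dst in schema["relationships"]:
        lines.append(f"  (:{src})-[:{rel}]->(:{dst})")
    return "\n".join(lines)

prompt_schema = render_schema(schema)
```

Given such a schema in context, an LLM can plausibly emit a Cypher query like `MATCH (p:Player)-[:PLAYS_FOR]->(t:Team {name: "FC Barcelona"}) RETURN p.name` without ever seeing opaque RDF resource identifiers.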

Technical Contribution and Methodology

1. RDF-to-Property Graph Conversion:

A core innovation is the RDF-to-property graph conversion engine, which uses SPARQL queries to extract RDF triples and transform them into property graphs. The engine performs crucial functions, including datatype conversion and unit standardization, to produce clean, schema-enriched property graphs that are well suited to LLM querying.
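The flavor of this conversion step can be sketched as follows. The entity ID, unit table, and function below are illustrative assumptions, not the paper's actual engine: Wikidata-style literals carrying units are converted into typed, standardized properties on a property-graph node.

```python
# Illustrative sketch (not the paper's engine): convert an RDF-style entity
# with unit-annotated string literals into a typed property-graph node.

# Hypothetical unit table for standardizing lengths to metres.
UNIT_TO_METERS = {"centimetre": 0.01, "metre": 1.0}

def convert_entity(qid: str, node_label: str, name: str, literals: dict) -> dict:
    """Build a property-graph node with typed, unit-standardized properties."""
    props = {"name": name}
    for prop, (value, unit) in literals.items():
        if unit in UNIT_TO_METERS:
            # Unit standardization: all lengths become metres.
            props[prop] = float(value) * UNIT_TO_METERS[unit]
        else:
            # Plain datatype conversion: numeric strings become floats.
            props[prop] = float(value) if value.replace(".", "", 1).isdigit() else value
    return {"id": qid, "label": node_label, "properties": props}

node = convert_entity(
    "Q615", "Player", "Lionel Messi",
    {"height": ("170", "centimetre")},  # illustrative literal with unit
)
```

The resulting node carries a numeric `height` in metres rather than an opaque literal, so a Cypher filter such as `WHERE p.height_m > 1.8` becomes directly expressible.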

2. Task Generation Pipeline:

The authors developed a systematic pipeline for generating text-to-Cypher tasks. The pipeline first instantiates templates covering a variety of graph patterns to produce initial (question, Cypher) pairs, then uses an LLM to rewrite the templated questions into natural language. This aims to produce realistic and semantically diverse tasks.
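The template stage can be sketched as below; the template and the schema terms filling it are hypothetical examples, and in the actual pipeline an LLM subsequently rewrites the templated question into more natural phrasing.

```python
# Toy version of the template stage (template and names are illustrative):
# instantiate a graph pattern with schema terms to get a (question, Cypher) seed.

TEMPLATE = {
    "cypher": "MATCH (n:{label})-[:{rel}]->(m:{target}) "
              "WHERE m.name = '{name}' RETURN n.name",
    "question": "Which {label_text} {rel_text} {name}?",
}

def instantiate(label, rel, target, name, label_text, rel_text):
    """Fill one template with concrete schema terms and an entity name."""
    cypher = TEMPLATE["cypher"].format(
        label=label, rel=rel, target=target, name=name)
    question = TEMPLATE["question"].format(
        label_text=label_text, rel_text=rel_text, name=name)
    return {"question": question, "cypher": cypher}

task = instantiate("Player", "PLAYS_FOR", "Team", "FC Barcelona",
                   "players", "play for")
```

Starting from executable Cypher and deriving the question from it guarantees that every task has a verifiable gold query, which is what makes execution-based evaluation possible downstream.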

3. Evaluation Metrics:

To assess retrieval accuracy and performance, the authors employ two primary metrics: execution accuracy (EX) and Provenance Subgraph Jaccard Similarity (PSJS). EX measures whether the generated Cypher returns results matching the ground truth, while PSJS measures the Jaccard similarity between the provenance subgraphs touched by the predicted and gold queries, granting partial credit to queries that reach the correct region of the graph even when their outputs differ.
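A minimal sketch of both metrics on toy data follows; the paper's exact definitions may differ in details such as the treatment of ordered results and which graph elements count toward provenance.

```python
# Sketch of the two metrics on toy data (definitions simplified).

def execution_accuracy(pred_rows, gold_rows) -> float:
    """EX: 1.0 iff the predicted query returns exactly the gold result set."""
    return 1.0 if set(map(tuple, pred_rows)) == set(map(tuple, gold_rows)) else 0.0

def psjs(pred_subgraph, gold_subgraph) -> float:
    """PSJS: Jaccard similarity between the sets of graph elements
    touched by the predicted and gold queries."""
    pred, gold = set(pred_subgraph), set(gold_subgraph)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

ex = execution_accuracy([("Lionel Messi",)], [("Lionel Messi",)])
# Predicted query touched one extra entity beyond the gold provenance.
sim = psjs({"Q615", "Q5794"}, {"Q615"})
```

A query with the right results scores EX = 1 regardless of how it got there, whereas PSJS rewards near-misses: here the predicted query overlaps the gold provenance on one of two elements, giving a similarity of 0.5.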

Experimental Findings

The benchmark tests the capabilities of state-of-the-art LLMs, with performance varying widely across models. Even the strongest proprietary model evaluated, Claude 3.5 Sonnet, achieves an execution accuracy of only 61.58%, underscoring the benchmark's difficulty. Smaller LLMs with fewer than 10B parameters fare substantially worse, struggling with accurate graph pattern matching and query generation.

Implications and Future Directions

CypherBench paves the way for advancing research in KG-based retrieval systems by highlighting challenges in precise graph querying and the potential of Cypher as a unified interface for databases. The proposed methodologies provide a blueprint for integrating full-scale modern KGs with LLM architectures, addressing the scalability issues observed in RDF-based frameworks.

Looking forward, potential developments include enhancing entity linking mechanisms, improving Cypher generation accuracy via fine-tuning of LLMs, and further exploring the potential of property graph views. This work contributes substantially to understanding how KGs fit into the LLM landscape, offering a strong foundation for future research aimed at efficient and accurate retrieval from extensive knowledge repositories.
