
Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! (2405.11706v1)

Published 20 May 2024 in cs.AI, cs.DB, cs.IR, and cs.LO

Abstract: There is increasing evidence that question-answering (QA) systems with LLMs, which employ a knowledge graph/semantic representation of an enterprise SQL database (i.e. Text-to-SPARQL), achieve higher accuracy compared to systems that answer questions directly on SQL databases (i.e. Text-to-SQL). Our previous benchmark research showed that by using a knowledge graph, the accuracy improved from 16% to 54%. The question remains: how can we further improve the accuracy and reduce the error rate? Building on the observations of our previous research, where the inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an approach that consists of 1) Ontology-based Query Check (OBQC): detects errors by leveraging the ontology of the knowledge graph to check whether the LLM-generated SPARQL query matches the semantics of the ontology, and 2) LLM Repair: uses the error explanations with an LLM to repair the SPARQL query. Using the chat-with-data benchmark, our primary finding is that our approach increases the overall accuracy to 72%, including an additional 8% of "I don't know" (unknown) results. Thus, the overall error rate is 20%. These results provide further evidence that investing in knowledge graphs, namely the ontology, provides higher accuracy for LLM-powered question-answering systems.

Improving LLMs for Question-Answering on SQL Databases with Knowledge Graphs

Introduction

Alright, fellow data enthusiasts, buckle up! Today, we're diving into a fascinating approach to making LLMs even smarter when it comes to answering questions based on SQL databases. The trick? Using Knowledge Graphs with some clever ontology-based error detection and repair strategies. Let's break it down.

The Problem: LLMs and SQL Accuracy

Imagine you're a business user with access to a vast SQL database, and you'd like to ask natural language questions and get accurate responses. LLMs, like GPT-4, can help with this by converting those natural language questions into SQL queries. However, they often hit accuracy roadblocks.

In prior research, directly querying SQL databases with LLMs (Text-to-SQL) only yielded around 16% accuracy. This improved to 54% when using a knowledge graph to represent the SQL database (Text-to-SPARQL). Clearly, knowledge graphs boost performance, but we’re still left wondering: how do we push this even further?

The New Approach: Error Checking and Repairing

The new method works in two main ways:

  1. Ontology-based Query Check (OBQC): This system leverages the ontology of the knowledge graph to check if the LLM-generated SPARQL queries are semantically correct.
  2. LLM Repair: This uses error explanations from the OBQC to help the LLM repair incorrect queries.

Ontology-based Query Check (OBQC)

Wondering how this works under the hood? Here's how the checks proceed.

  1. Understanding BGPs: LLMs generate Basic Graph Patterns (BGPs) in their SPARQL queries. OBQC extracts these patterns and compares them against the ontology.
  2. Rule-Based Error Detection: OBQC has rules for checking different parts of the query (a minimal sketch of the domain check follows this list). For example:
    • Domain Rule: Ensures that the subject of a property belongs to a specific class.
    • Range Rule: Ensures that the object of a property belongs to a specific class.
    • Double Domain/Range Rules: Check for conflicts when multiple properties target the same subject or object.
    • SELECT Clause Checks: Ensure that the query returns human-readable results rather than raw IRIs.
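To make the domain rule concrete, here is a minimal sketch in Python using rdflib. It is not the authors' implementation: the ontology snippet, the ex: namespace, the worksFor property, and the hand-extracted triple patterns are all hypothetical stand-ins for the BGP that OBQC would pull out of an LLM-generated SPARQL query.

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")

    # Hypothetical ontology fragment: ex:worksFor relates an Employee to a Company.
    ontology_ttl = """
    @prefix ex: <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    ex:worksFor rdfs:domain ex:Employee ;
                rdfs:range  ex:Company .
    """
    onto = Graph()
    onto.parse(data=ontology_ttl, format="turtle")

    # Triple patterns as OBQC might extract them from an LLM-generated query such as:
    #   SELECT ?name WHERE { ?c a ex:Company . ?c ex:worksFor ?e . ?e ex:name ?name }
    # The subject of ex:worksFor is typed ex:Company, which conflicts with the declared domain.
    bgp = [
        ("?c", RDF.type, EX.Company),
        ("?c", EX.worksFor, "?e"),
    ]

    def declared_type(var, patterns):
        """Return the class a variable is explicitly typed with in the BGP, if any."""
        for s, p, o in patterns:
            if s == var and p == RDF.type:
                return o
        return None

    def check_domain(patterns, ontology):
        """Domain rule: the subject of a property must belong to the property's rdfs:domain."""
        errors = []
        for s, p, o in patterns:
            if p == RDF.type:
                continue
            domain = ontology.value(subject=p, predicate=RDFS.domain)
            subject_class = declared_type(s, patterns)
            if domain is not None and subject_class is not None and subject_class != domain:
                errors.append(
                    f"The subject of {p} is typed {subject_class}, "
                    f"but the ontology declares its domain as {domain}."
                )
        return errors

    print(check_domain(bgp, onto))
    # -> one error explaining that ?c should be an ex:Employee, not an ex:Company

The range rule is the mirror image (compare the object's type against rdfs:range), and a fuller check would also need to account for class hierarchies (rdfs:subClassOf), which this sketch ignores.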

LLM Repair: Fixing the Queries

If OBQC finds an error, it provides a textual explanation which is then fed back to the LLM. The LLM uses this feedback to rewrite the query, and this cycle continues until the query passes the checks or a maximum number of attempts is reached. If the query can't be fixed, the result is marked as "unknown."
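As a rough sketch of that loop (not the paper's code), the control flow might look like the following; ask_llm, check_query, and run_query are hypothetical callables standing in for the LLM, the OBQC checks, and the SPARQL endpoint, and the attempt cap is an assumption.

    MAX_ATTEMPTS = 3  # assumed cap; the paper bounds the repair rounds, but this value is illustrative

    def answer(question, ask_llm, check_query, run_query):
        """Generate a SPARQL query, repair it with OBQC feedback, then execute it."""
        sparql = ask_llm(f"Write a SPARQL query that answers: {question}")
        for _ in range(MAX_ATTEMPTS):
            errors = check_query(sparql)      # OBQC: domain, range, and SELECT-clause rules
            if not errors:
                return run_query(sparql)      # the query passed every check
            # Feed the textual explanations back to the LLM and ask for a rewrite.
            sparql = ask_llm(
                "This SPARQL query failed ontology checks:\n"
                + sparql
                + "\n\nErrors:\n"
                + "\n".join(errors)
                + "\n\nPlease return a corrected query."
            )
        return "unknown"                      # give up rather than return a query that fails the checks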

Experimental Results: Boosting Accuracy

The paper reports impressive results using this approach. Here's a quick rundown:

  • Overall Accuracy: Rose to 72.55% with repairs; an additional 8% of answers were "I don't know" (unknown), leaving an overall error rate of roughly 20%.
  • Low Question/Low Schema Complexity: The error rate dropped to 10.46%.
  • High Question/High Schema Complexity: The error rate decreased by a significant margin, although it remained higher than in the simpler setups.

Implications and Future Development

This research strongly suggests that investing in knowledge graphs and ontologies is crucial for enhancing the accuracy of LLM-powered question-answering systems.

  1. Practical Side: More accurate query responses mean businesses can trust their chat-with-data experiences more. Imagine asking complex business questions and getting precise, explainable answers!
  2. Theoretical Advancements: These results highlight the importance of semantics and structured metadata. There's also a fascinating insight that domain errors (mismatches on the subject, i.e. the left-hand side of a triple pattern) are more common, shedding light on how LLMs translate natural language into query language.

Final Thoughts

This approach provides a robust framework for tackling errors in LLM-generated SPARQL queries, pushing the boundaries of current AI capabilities in business contexts. Future work might delve into more complex ontologies and rule sets, but the foundation laid here is promising.

So, next time you're chatting with your data, know that there's a whole world of semantics working behind the scenes to make sure your answers are spot-on!

Authors: Dean Allemang and Juan Sequeda