
Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! (2405.11706v1)

Published 20 May 2024 in cs.AI, cs.DB, cs.IR, and cs.LO

Abstract: There is increasing evidence that question-answering (QA) systems with LLMs, which employ a knowledge graph/semantic representation of an enterprise SQL database (i.e. Text-to-SPARQL), achieve higher accuracy compared to systems that answer questions directly on SQL databases (i.e. Text-to-SQL). Our previous benchmark research showed that by using a knowledge graph, the accuracy improved from 16% to 54%. The question remains: how can we further improve the accuracy and reduce the error rate? Building on the observations of our previous research, where the inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an approach that consists of 1) Ontology-based Query Check (OBQC): detects errors by leveraging the ontology of the knowledge graph to check whether the LLM-generated SPARQL query matches the semantics of the ontology, and 2) LLM Repair: uses the error explanations with an LLM to repair the SPARQL query. Using the chat-with-data benchmark, our primary finding is that our approach increases the overall accuracy to 72%, including an additional 8% of "I don't know" (unknown) results. Thus, the overall error rate is 20%. These results provide further evidence that investing in knowledge graphs, namely the ontology, provides higher accuracy for LLM-powered question-answering systems.

Improving LLMs for Question-Answering on SQL Databases with Knowledge Graphs

Introduction

Alright, fellow data enthusiasts, buckle up! Today, we're diving into a fascinating approach to making LLMs even smarter when it comes to answering questions based on SQL databases. The trick? Using Knowledge Graphs with some clever ontology-based error detection and repair strategies. Let's break it down.

The Problem: LLMs and SQL Accuracy

Imagine you're a business user with access to a vast SQL database, and you'd like to ask natural language questions and get accurate responses. LLMs, like GPT-4, can help with this by converting those natural language questions into SQL queries. However, they often hit accuracy roadblocks.

In prior research, directly querying SQL databases with LLMs (Text-to-SQL) only yielded around 16% accuracy. This improved to 54% when using a knowledge graph to represent the SQL database (Text-to-SPARQL). Clearly, knowledge graphs boost performance, but we’re still left wondering: how do we push this even further?

The New Approach: Error Checking and Repairing

The new method works in two main ways:

  1. Ontology-based Query Check (OBQC): This system leverages the ontology of the knowledge graph to check if the LLM-generated SPARQL queries are semantically correct.
  2. LLM Repair: This uses error explanations from the OBQC to help the LLM repair incorrect queries.

Ontology-based Query Check (OBQC)

Wondering how this works under the hood? Here's how the checks proceed.

  1. Understanding BGPs: LLMs generate Basic Graph Patterns (BGPs) in their SPARQL queries. OBQC extracts these patterns and compares them against the ontology.
  2. Rule-Based Error Detection: OBQC has rules for checking different parts of the query (a minimal sketch of the domain check follows this list). For example:
    • Domain Rule: Ensures that the subject of a property belongs to a specific class.
    • Range Rule: Ensures that the object of a property belongs to a specific class.
    • Double Domain/Range Rules: Check for conflicts when multiple properties target the same subject or object.
    • SELECT Clause Checks: Ensure that the query returns human-readable results rather than raw IRIs.
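To make the domain rule concrete, here is a minimal sketch in Python using rdflib. It is not the authors' implementation: the ontology snippet, the ex: namespace, the worksFor property, and the hand-extracted triple patterns are all hypothetical stand-ins for the BGP that OBQC would pull out of an LLM-generated SPARQL query.

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")

    # Hypothetical ontology fragment: ex:worksFor relates an Employee to a Company.
    ontology_ttl = """
    @prefix ex: <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    ex:worksFor rdfs:domain ex:Employee ;
                rdfs:range  ex:Company .
    """
    onto = Graph()
    onto.parse(data=ontology_ttl, format="turtle")

    # Triple patterns as OBQC might extract them from an LLM-generated query such as:
    #   SELECT ?name WHERE { ?c a ex:Company . ?c ex:worksFor ?e . ?e ex:name ?name }
    # The subject of ex:worksFor is typed ex:Company, which conflicts with the declared domain.
    bgp = [
        ("?c", RDF.type, EX.Company),
        ("?c", EX.worksFor, "?e"),
    ]

    def declared_type(var, patterns):
        """Return the class a variable is explicitly typed with in the BGP, if any."""
        for s, p, o in patterns:
            if s == var and p == RDF.type:
                return o
        return None

    def check_domain(patterns, ontology):
        """Domain rule: the subject of a property must belong to the property's rdfs:domain."""
        errors = []
        for s, p, o in patterns:
            if p == RDF.type:
                continue
            domain = ontology.value(subject=p, predicate=RDFS.domain)
            subject_class = declared_type(s, patterns)
            if domain is not None and subject_class is not None and subject_class != domain:
                errors.append(
                    f"The subject of {p} is typed {subject_class}, "
                    f"but the ontology declares its domain as {domain}."
                )
        return errors

    print(check_domain(bgp, onto))
    # -> one error explaining that ?c should be an ex:Employee, not an ex:Company

The range rule is the mirror image (compare the object's type against rdfs:range), and a fuller check would also need to account for class hierarchies (rdfs:subClassOf), which this sketch ignores.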

LLM Repair: Fixing the Queries

If OBQC finds an error, it provides a textual explanation which is then fed back to the LLM. The LLM uses this feedback to rewrite the query, and this cycle continues until the query passes the checks or a maximum number of attempts is reached. If the query can't be fixed, the result is marked as "unknown."
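As a rough sketch of that loop (not the paper's code), the control flow might look like the following; ask_llm, check_query, and run_query are hypothetical callables standing in for the LLM, the OBQC checks, and the SPARQL endpoint, and the attempt cap is an assumption.

    MAX_ATTEMPTS = 3  # assumed cap; the paper bounds the repair rounds, but this value is illustrative

    def answer(question, ask_llm, check_query, run_query):
        """Generate a SPARQL query, repair it with OBQC feedback, then execute it."""
        sparql = ask_llm(f"Write a SPARQL query that answers: {question}")
        for _ in range(MAX_ATTEMPTS):
            errors = check_query(sparql)      # OBQC: domain, range, and SELECT-clause rules
            if not errors:
                return run_query(sparql)      # the query passed every check
            # Feed the textual explanations back to the LLM and ask for a rewrite.
            sparql = ask_llm(
                "This SPARQL query failed ontology checks:\n"
                + sparql
                + "\n\nErrors:\n"
                + "\n".join(errors)
                + "\n\nPlease return a corrected query."
            )
        return "unknown"                      # give up rather than return a query that fails the checks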

Experimental Results: Boosting Accuracy

The paper reports impressive results using this approach. Here's a quick rundown:

  • Overall Accuracy: Rose to 72.55% with repairs; an additional 8% of answers were "I don't know" (unknown), leaving an overall error rate of roughly 20%.
  • Low Question/Low Schema Complexity: The error rate dropped to 10.46%.
  • High Question/High Schema Complexity: The error rate decreased by a significant margin, although it remained higher than in the simpler setups.

Implications and Future Development

This research strongly suggests that investing in knowledge graphs and ontologies is crucial for enhancing the accuracy of LLM-powered question-answering systems.

  1. Practical Side: More accurate query responses mean businesses can trust their chat-with-data experiences more. Imagine asking complex business questions and getting precise, explainable answers!
  2. Theoretical Advancements: These results highlight the importance of semantics and structured metadata. There's also a fascinating insight that domain errors (mismatches on the subject, i.e. the left-hand side of a triple pattern) are more common, shedding light on how LLMs translate natural language into query language.

Final Thoughts

This approach provides a robust framework for tackling errors in LLM-generated SPARQL queries, pushing the boundaries of current AI capabilities in business contexts. Future work might delve into more complex ontologies and rule sets, but the foundation laid here is promising.

So, next time you're chatting with your data, know that there's a whole world of semantics working behind the scenes to make sure your answers are spot-on!

Authors: Dean Allemang and Juan Sequeda