A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases (2311.07509v1)

Published 13 Nov 2023 in cs.AI, cs.CL, and cs.DB

Abstract: Enterprise applications of LLMs hold promise for question answering on enterprise SQL databases. However, the extent to which LLMs can accurately respond to enterprise questions in such databases remains unclear, given the absence of suitable Text-to-SQL benchmarks tailored to enterprise settings. Additionally, the potential of Knowledge Graphs (KGs) to enhance LLM-based question answering by providing business context is not well understood. This study aims to evaluate the accuracy of LLM-powered question answering systems in the context of enterprise questions and SQL databases, while also exploring the role of knowledge graphs in improving accuracy. To achieve this, we introduce a benchmark comprising an enterprise SQL schema in the insurance domain, a range of enterprise queries encompassing reporting to metrics, and a contextual layer incorporating an ontology and mappings that define a knowledge graph. Our primary finding reveals that question answering using GPT-4, with zero-shot prompts directly on SQL databases, achieves an accuracy of 16%. Notably, this accuracy increases to 54% when questions are posed over a Knowledge Graph representation of the enterprise SQL database. Therefore, investing in Knowledge Graph provides higher accuracy for LLM powered question answering systems.

The Role of Knowledge Graphs in Enhancing LLM Accuracy for SQL-Based Question Answering Systems

The paper presents a critical examination of the accuracy of LLMs for question answering over enterprise SQL databases, emphasizing Knowledge Graphs (KGs) as a mechanism for improving that accuracy. It characterizes how an LLM, GPT-4 in this case, performs on enterprise questions over a complex schema, filling a notable gap in the Text-to-SQL literature, which has lacked benchmarks tailored to enterprise settings.
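
As a rough, hedged illustration of the zero-shot setting examined here, the sketch below prompts a chat model with a raw DDL schema and asks it to produce SQL directly. The schema fragment, question, prompt wording, and client usage are illustrative assumptions, not the authors' actual benchmark harness.

```python
# Hedged sketch of zero-shot Text-to-SQL: prompt a chat model with raw DDL and
# a question, and ask for SQL. The schema, question, and prompt wording are
# illustrative assumptions, not the benchmark's own artifacts.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY in the environment

client = OpenAI()

SCHEMA_DDL = """
CREATE TABLE claim  (claim_id INT, policy_id INT, claim_open_date DATE, claim_status VARCHAR(20));
CREATE TABLE policy (policy_id INT, policyholder_id INT, annual_premium DECIMAL(10, 2));
"""

question = "How many claims are currently open?"

prompt = (
    "Given the following SQL schema, write one SQL query that answers the question.\n"
    f"Schema:\n{SCHEMA_DDL}\n"
    f"Question: {question}\n"
    "Return only the SQL."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```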

The authors highlight the shortcomings of existing benchmarks, which often disregard real-world intricacies such as complex database schemas and omit the business-context layer that is pivotal for accurate, explainable data retrieval in enterprise settings. Their benchmark provides a comprehensive framework consisting of an enterprise SQL schema from the insurance domain, a curated set of enterprise queries that vary in complexity, and a contextual layer, an ontology plus mappings, that defines a KG over the same data.
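
To make the contextual layer more concrete, the hedged sketch below shows the KG route for the same kind of question: the model is prompted with a small OWL/Turtle ontology fragment and asked for SPARQL over business concepts, which the mappings between the ontology and the tables would then ground in the underlying database. The ontology terms and prompt are illustrative assumptions, not the benchmark's published ontology or mappings.

```python
# Hedged sketch of the KG route: prompt with an ontology fragment (business
# concepts and relationships) instead of raw DDL, and ask for SPARQL. The
# ontology below is illustrative; the benchmark ships its own insurance
# ontology and mappings that tie such concepts back to the SQL tables.
from openai import OpenAI

client = OpenAI()

ONTOLOGY_TTL = """
@prefix :     <http://example.com/insurance#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Claim  a owl:Class .
:Policy a owl:Class .
:againstPolicy a owl:ObjectProperty   ; rdfs:domain :Claim ; rdfs:range :Policy .
:claimStatus   a owl:DatatypeProperty ; rdfs:domain :Claim .
"""

question = "How many claims are currently open?"

prompt = (
    "Given the following OWL ontology in Turtle, write a SPARQL query that answers the question.\n"
    f"Ontology:\n{ONTOLOGY_TTL}\n"
    f"Question: {question}\n"
    "Return only the SPARQL."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
# A plausible generation, counting claims by status over business concepts
# rather than joining physical tables:
#   SELECT (COUNT(?c) AS ?openClaims) WHERE { ?c a :Claim ; :claimStatus "Open" . }
```

The intuition the results support is that the ontology names concepts and relationships explicitly, so the model no longer has to infer joins and business meaning from table and column names alone.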

Key Findings and Implications

The primary finding is a marked improvement in accuracy when KGs are employed. With zero-shot prompts posed directly over the SQL database, GPT-4 achieves only 16.7% accuracy; over a KG representation of the same enterprise data, accuracy rises to 54.2%, an absolute improvement of 37.5 percentage points. The gain is particularly pronounced for questions over high-complexity portions of the schema, where the SQL-only approach failed entirely.
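
To be explicit about how the gain is measured, the lines below simply restate the arithmetic: 37.5 is the absolute gap between the two accuracies, in percentage points, which corresponds to roughly a 3.2x relative improvement.

```python
# Restating the reported numbers: the gain is an absolute difference in
# accuracy (percentage points), not a relative percentage increase.
sql_only_acc = 16.7  # % accuracy, zero-shot GPT-4 over the SQL schema
kg_acc = 54.2        # % accuracy, same questions over the KG representation

absolute_gain = kg_acc - sql_only_acc    # 37.5 percentage points
relative_factor = kg_acc / sql_only_acc  # ~3.2x the SQL-only accuracy

print(f"{absolute_gain:.1f} points, {relative_factor:.1f}x relative")
```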

The implications of these findings are manifold. Firstly, they substantiate the hypothesis that KGs provide essential business context and semantics, thereby enhancing the capacity of LLMs to generate more accurate and reliable results. This suggests that organizations should consider investing in KG infrastructure to augment the performance of LLM-based systems, particularly for complex, multi-table queries, where context and relationships are not explicitly captured by SQL alone.

Theoretical and Practical Implications

Theoretically, this paper reinforces the utility of KGs in addressing hallucinations and inaccuracies that arise from insufficient context. The integration of ontology and semantic mappings offers a structured approach to aligning enterprise data with LLM capabilities. Practically, the deployment of KGs ensures more explainable and trustworthy interactions with enterprise data, which could facilitate greater adoption of LLM-based question answering systems across various industry domains.

Future Directions

This work opens several avenues for further research. Enhancements to the benchmark itself, including additional domains and an expanded question set, could pinpoint where LLMs most need improvement. Exploring the explainability of such systems, and how to credit partially correct responses, could further refine their application and user satisfaction. Finally, the cost and resource overhead of deploying these systems in enterprises remains an open practical question.

By fostering a robust dialogue between the fields of data management, knowledge graphs, and AI, the paper underscores the transformative potential of hybrid systems that leverage the strengths of both LLMs and KGs. As enterprises navigate the complexities of AI deployments, the insights from this research provide a valuable framework for enhancing the accuracy and reliability of question answering systems, paving the way for more informed data-driven decision-making processes.

Authors (3)
  1. Juan Sequeda
  2. Dean Allemang
  3. Bryon Jacob
Citations (18)