X-SQL: Expert Schema Linking and Understanding of Text-to-SQL with Multi-LLMs

Published 7 Sep 2025 in cs.LG and cs.DB | (2509.05899v1)

Abstract: With Large Language Models' (LLMs) emergent abilities on code generation tasks, Text-to-SQL has become one of the most popular downstream applications. Despite the strong results of multiple recent LLM-based Text-to-SQL frameworks, the research community often overlooks the importance of database schema information for generating high-quality SQL queries. We find that such schema information plays a significant or even dominant role in the Text-to-SQL task. To tackle this challenge, we propose a novel database schema expert with two components. We first introduce X-Linking, an LLM Supervised Finetuning (SFT)-based method that achieves superior Schema Linking results compared to existing open-source Text-to-SQL methods. In addition, we innovatively propose an X-Admin component that focuses on Schema Understanding by bridging the gap between abstract schema information and the user's natural language question. Aside from better learning with schema information, we experiment with Multi-LLMs for different components within the system to further boost its performance. By incorporating these techniques into our end-to-end framework, X-SQL, we have achieved Execution Accuracies of 84.9% on the Spider-Dev dataset and 82.5% on the Spider-Test dataset. This outstanding performance establishes X-SQL as the leading Text-to-SQL framework based on open-source models.

Authors (1)

Summary

  • The paper introduces an advanced Text-to-SQL framework that leverages expert schema linking and schema understanding for accurate SQL generation.
  • It uses supervised fine-tuning to improve schema linking, achieving about 7% higher accuracy than existing schema linking modules.
  • The Multi-LLMs strategy assigns tailored LLMs to different components, mitigating self-bias and further improving performance on the Spider benchmarks.

The paper "X-SQL: Expert Schema Linking and Understanding of Text-to-SQL with Multi-LLMs" introduces a novel LLM-based approach to the Text-to-SQL task. X-SQL focuses on how database schema information is used, an aspect that other approaches have often overlooked.

Introduction and Background

Text-to-SQL is the task of generating SQL queries directly from natural language questions, and recent systems rely heavily on the emergent code-generation abilities of LLMs. Success, however, depends on accurately understanding and using the database schema. Real databases can contain many tables and columns; passing the full schema to an LLM consumes substantial context and compute and increases the chance of generating erroneous queries.

X-SQL Framework Overview

The X-SQL framework has three main stages: X-Linking for Schema Linking, X-Admin for Schema Understanding, and SQL Generation and Debugging. A Multi-LLMs strategy spans these stages, assigning different LLMs to different components.

  1. X-Linking (Schema Linking): This component applies Supervised Fine-Tuning (SFT) to an LLM to improve its ability to identify the database tables relevant to a natural language question. It addresses an often-overlooked problem: general-purpose LLMs are not naturally good at schema linking, a limitation of their pre-training.
  2. X-Admin (Schema Understanding): This component translates abstract database schema definitions into natural language, bridging the gap between the database's structural information and the user's question. It plays a role similar to a human data administrator, giving the downstream SQL generator a clearer picture of what each part of the schema means.
  3. Multi-LLMs Strategy: X-SQL assigns different LLMs to different tasks within the system, which improves overall performance. It also mitigates self-bias (a model's tendency to favor its own output and overlook its own errors) by using a different model for tasks such as debugging.

    Figure 1: X-SQL's architecture. X-Linking first filters the candidate database schema. X-Admin then adds natural language descriptions to the linked table schema. Finally, SQL queries are generated from this information, and any query that fails to execute is passed to a repair step. The LLM assignment shown is the configuration that gave the best Spider-Test result.
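The Figure 1 flow can be sketched as a chain of four stages. This is a minimal illustration, not the paper's code: each LLM-backed stage is replaced by a plain Python callable (`link`, `describe`, `generate`, and `fix` are hypothetical names), and an in-memory SQLite database stands in for the target database so the debugging stage can detect execution errors.

```python
import sqlite3
from typing import Callable, Dict, List

# Hypothetical stand-ins for the LLM-backed stages; here each is a plain callable.
Linker = Callable[[str, Dict[str, str]], List[str]]   # question, schema -> table names
Describer = Callable[[str, Dict[str, str]], str]      # question, linked schema -> description
Generator = Callable[[str, str], str]                 # question, context -> SQL
Fixer = Callable[[str, str, str, str], str]           # question, context, SQL, error -> SQL


def run_pipeline(question: str, schema: Dict[str, str], conn: sqlite3.Connection,
                 link: Linker, describe: Describer, generate: Generator,
                 fix: Fixer, max_fixes: int = 1) -> str:
    """Sketch of the Figure 1 flow: link -> describe -> generate -> fix on error."""
    # 1. Schema linking: keep only the tables the linker selects.
    keep = set(link(question, schema))
    linked = {t: ddl for t, ddl in schema.items() if t in keep}
    # 2. Schema understanding: prepend a natural-language description of those tables.
    context = describe(question, linked) + "\n" + "\n".join(linked.values())
    # 3. SQL generation from the question plus the enriched schema context.
    sql = generate(question, context)
    # 4. Debugging: on an execution error, send the error message to the fixer.
    for _ in range(max_fixes):
        try:
            conn.execute(sql).fetchall()
            return sql  # runs without error; no repair needed
        except sqlite3.Error as exc:
            sql = fix(question, context, sql, str(exc))
    return sql  # fixes exhausted: return the latest query without re-checking


# Usage with stub "models" and an in-memory database.
conn = sqlite3.connect(":memory:")
schema = {
    "singer": "CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER)",
    "concert": "CREATE TABLE concert (id INTEGER, year INTEGER)",
}
for ddl in schema.values():
    conn.execute(ddl)

sql = run_pipeline(
    "How many singers are there?", schema, conn,
    link=lambda q, s: ["singer"],
    describe=lambda q, s: "Table singer lists each singer's name and age.",
    generate=lambda q, ctx: "SELECT COUNT(*) FROM singers",  # wrong table name
    fix=lambda q, ctx, bad, err: "SELECT COUNT(*) FROM singer",
)
print(sql)  # SELECT COUNT(*) FROM singer
```

In the usage example the stub generator queries a nonexistent table, so execution fails and the stub fixer's query is returned. In X-SQL, each callable would instead wrap a fine-tuned or prompted LLM.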

Empirical Evaluation

X-SQL shows substantial improvements over existing frameworks on Spider, a widely used cross-domain Text-to-SQL benchmark. Performance is measured by execution accuracy: the fraction of questions for which the generated SQL returns the same result as the reference (gold) query. On Spider-Dev, X-SQL achieves an execution accuracy of 84.9%, surpassing previous frameworks built on open-source models; on Spider-Test, it achieves 82.5%.
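Execution accuracy can be illustrated with a simplified scorer. This sketch compares result rows as order-insensitive multisets and counts queries that fail to execute as wrong; the official Spider evaluation script is more involved (for example, it treats queries with ORDER BY differently), and the toy table and queries below are invented for illustration.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple


def results_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets: duplicate rows still matter, order does not.
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) SQL pairs whose execution results match."""
    if not pairs:
        return 0.0
    return sum(results_match(conn, p, g) for p, g in pairs) / len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("A", 30), ("B", 25), ("C", 41)])
pairs = [
    ("SELECT COUNT(*) FROM singer", "SELECT count(*) FROM singer"),   # same result
    ("SELECT name FROM singer WHERE age > 26",
     "SELECT name FROM singer WHERE age >= 30"),                       # same rows: A, C
    ("SELECT name FROM singers", "SELECT name FROM singer"),          # execution error
]
acc = execution_accuracy(conn, pairs)
print(f"{acc:.3f}")  # 0.667
```

Note that the second prediction is logically different from its gold query yet still counts as correct because it returns the same rows on this data; this is a known limitation of execution-based metrics.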

Impact of X-Linking

X-Linking substantially improves schema linking: its accuracy is about 7% higher than that of existing schema linking modules, which shows the value of a dedicated SFT approach. Better linking matters because it narrows the schema passed to later stages to the relevant tables, reducing errors in SQL generation.
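The summary does not say exactly how schema linking accuracy is computed. As an assumption for illustration, the sketch below scores table-level linking against gold table sets with precision, recall, and exact match, then averages over questions. Recall is usually the critical number, since a table dropped at the linking stage cannot be recovered later.

```python
from typing import Dict, List, Set, Tuple


def linking_scores(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Table-level precision, recall, and exact match for one question (illustrative metric)."""
    hit = len(predicted & gold)
    precision = hit / len(predicted) if predicted else 0.0
    recall = hit / len(gold) if gold else 1.0
    return {"precision": precision, "recall": recall,
            "exact_match": float(predicted == gold)}


def mean_scores(examples: List[Tuple[Set[str], Set[str]]]) -> Dict[str, float]:
    """Average each score over (predicted tables, gold tables) pairs."""
    totals = {"precision": 0.0, "recall": 0.0, "exact_match": 0.0}
    for pred, gold in examples:
        for key, value in linking_scores(pred, gold).items():
            totals[key] += value
    return {key: value / len(examples) for key, value in totals.items()}


examples = [
    ({"singer"}, {"singer"}),                          # perfect link
    ({"singer", "concert", "stadium"}, {"singer", "concert"}),  # one extra table
    ({"concert"}, {"singer", "concert"}),              # one missing table
]
print(mean_scores(examples))
```

Here the extra table only lowers precision, while the missing table lowers recall, which is the error that would make the downstream query unanswerable.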

Role of X-Admin

Adding X-Admin yields a further 1.7% improvement, underscoring the importance of schema understanding. By converting technical schema definitions into natural-language descriptions, X-Admin gives the SQL generator richer schema context, which helps it produce accurate queries.
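The X-Admin idea amounts to two prompts: one asking an LLM to describe the linked schema in plain language with the question in view, and one giving the SQL generator both the raw schema and that description. The wording below is a hypothetical template, not the actual prompt shown in Figure 2, and the helper names are invented for this sketch.

```python
from typing import List


def build_admin_prompt(question: str, linked_ddl: List[str]) -> str:
    """Hypothetical X-Admin-style prompt: ask for a plain-language schema description."""
    parts = [
        "You are a database administrator. Explain in plain English what each "
        "table and column below stores, focusing on what matters for the question.",
        "",
        "Schema:", *linked_ddl, "",
        f"Question: {question}", "Description:",
    ]
    return "\n".join(parts)


def build_generation_prompt(question: str, linked_ddl: List[str], description: str) -> str:
    """Give the SQL generator both the raw schema and its natural-language description."""
    parts = [
        "Schema:", *linked_ddl, "",
        "Schema description:", description, "",
        f"Question: {question}", "SQL:",
    ]
    return "\n".join(parts)


ddl = ["CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER)"]
question = "How many singers are older than 30?"
admin_prompt = build_admin_prompt(question, ddl)
# In X-SQL an LLM would answer admin_prompt; a hand-written stand-in is used here.
description = "Table singer has one row per singer; age is the singer's age in years."
gen_prompt = build_generation_prompt(question, ddl, description)
print(gen_prompt)
```

Keeping the raw DDL alongside the description means the generator still sees exact table and column names, while the description supplies their meaning.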

Multi-LLMs Implementation

Using multiple LLMs within X-SQL improves results by drawing on the complementary strengths of different models; for example, the SQL-debugging step can use a different model than the one that generated the query, reducing self-bias. This multi-model design points toward complex AI systems whose components require different kinds of expertise.

Figure 2: X-Admin (Schema Understanding) Prompt.
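The Multi-LLMs strategy can be expressed as a per-stage model assignment. The sketch below is hypothetical: the model names are placeholders rather than the models the paper uses, and the rule that the debugger must differ from the generator is one way to encode the self-bias rationale described above, not necessarily a constraint the paper imposes.

```python
from dataclasses import dataclass

STAGES = ("linker", "admin", "generator", "debugger")


@dataclass(frozen=True)
class StageModels:
    """Hypothetical per-stage model assignment; model names are placeholders."""
    linker: str      # e.g. an SFT-tuned schema-linking model
    admin: str       # writes natural-language schema descriptions
    generator: str   # writes the SQL
    debugger: str    # repairs SQL that fails to execute

    def __post_init__(self) -> None:
        # Self-bias guard: the model that wrote the SQL should not be the one fixing it.
        if self.debugger == self.generator:
            raise ValueError("debugger must differ from generator to limit self-bias")

    def model_for(self, stage: str) -> str:
        """Look up the model assigned to a stage name such as 'debugger'."""
        if stage not in STAGES:
            raise KeyError(stage)
        return getattr(self, stage)


config = StageModels(linker="linker-sft", admin="model-a",
                     generator="model-a", debugger="model-b")
print(config.model_for("debugger"))  # model-b
```

Validating the assignment when the configuration is built means a setup that lets one model review its own SQL is rejected before any query runs.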

Conclusions and Future Work

X-SQL sets a new standard for open-source Text-to-SQL systems by addressing both schema linking and schema understanding. Its combination of SFT-based schema linking (X-Linking), schema understanding (X-Admin), and a Multi-LLMs strategy substantially improves system performance. Future research could extend the framework to other code generation tasks or to more complex database scenarios, further improving the robustness and adaptability of Text-to-SQL systems.
