XiYan-SQL: Multi-Generator NL2SQL Framework

Updated 17 March 2026

The paper introduces XiYan-SQL, an end-to-end Text-to-SQL framework that uses a multi-generator ensemble to enhance SQL generation.
It employs schema filtering, diverse supervised fine-tuned and in-context SQL generators, and a self-refinement module to optimize query construction.
Empirical results demonstrate state-of-the-art performance on benchmarks like BIRD and Spider, showcasing improved execution accuracy and robustness.

XiYan-SQL is an end-to-end framework for Text-to-SQL (NL2SQL) translation that establishes a new paradigm via a multi-generator ensemble methodology. It pioneers the integrated use of schema filtering, multiple supervised-fine-tuned and in-context SQL generators, self-refinement modules, and a learned selection model. XiYan-SQL achieves state-of-the-art execution accuracy across competitive NL2SQL benchmarks, including 75.63% on BIRD and 89.65% on Spider, surpassing prior methods and demonstrating strong generalization (Gao et al., 2024, Liu et al., 7 Jul 2025).

1. System Architecture and Key Components

XiYan-SQL decomposes the NL2SQL process into distinct, specialized components, each responsible for addressing tractable subproblems in robust SQL semantic parsing.

Schema Filter: Filters the input database schema to derive multiple relevant schema subsets for each user query, reducing noise and computational overhead. Filtering uses LLM-based keyword extraction followed by embedding similarity to match columns/tables and iterative selection to balance precision and recall, resulting in a set $\{S_1,\dots,S_{p_s}\}$ of filtered schemas.
Multi-Generator Ensemble: For each filtered schema, an ensemble of $p_m$ SQL generators (Supervised Fine-Tuned and ICL-based) produces a diverse pool of candidate queries. Each generator is fine-tuned or prompted on different auxiliary tasks and SQL formats to encourage diversity as well as precision.
Self-Refiner Module: Executes each candidate SQL. In case of failure, the generator is re-prompted (or the candidate is re-generated) based on the exception feedback, correcting logical or syntactic errors such as missing joins or mismatched types.
Selection Model with Candidate Reorganization: SQL candidates are grouped by identical execution outputs. A candidate reorganization strategy (cluster by majority, then order by generator reliability and brevity) feeds into a lightweight fine-tuned LLM that ranks and selects the optimal candidate for final execution.

The pipeline components are summarized in the table below.

Component	Function	Techniques
Schema Filter	Relevant sub-schema extraction	LLM keyword extraction, embedding similarity
SQL Generators	Diverse candidate SQL generation	Multi-task SFT, ICL prompting, format variation
Refiner	Error correction via execution feedback	Self-refinement using exception traces
Selector	Final candidate ranking	Fine-tuned LLM, candidate reorganization
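The schema filter's embedding-similarity step can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the character-trigram `embed` function stands in for a real embedding model, the keywords are supplied by hand instead of coming from an LLM, and the names `filter_schema` and `top_k` are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (assumed stand-in for a real embedding model)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def filter_schema(keywords, schema, top_k):
    """For each keyword keep the top_k most similar (table, column) pairs; return a sub-schema."""
    columns = [(table, col) for table, cols in schema.items() for col in cols]
    kept = set()
    for kw in keywords:
        kw_vec = embed(kw)
        ranked = sorted(
            columns,
            key=lambda tc: cosine(kw_vec, embed(f"{tc[0]} {tc[1]}")),
            reverse=True,
        )
        kept.update(ranked[:top_k])
    sub_schema = {}
    for table, col in sorted(kept):
        sub_schema.setdefault(table, []).append(col)
    return sub_schema

schema = {
    "customers": ["customer_id", "name", "city"],
    "orders": ["order_id", "customer_id", "total_amount"],
    "products": ["product_id", "title", "price"],
}
keywords = ["customer city", "order total"]  # in the real system these come from an LLM

# A small top_k favors precision, a larger one favors recall; running both
# yields several filtered schemas, analogous to {S_1, ..., S_{p_s}}.
subsets = [filter_schema(keywords, schema, k) for k in (1, 3)]
for sub in subsets:
    print(sub)
```

Producing more than one subset is the point of the design: a generator that fails on the tight schema may still succeed on the looser one.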

2. M-Schema: Semi-Structured Schema Representation

To enhance model awareness of intricate database structures, XiYan-SQL introduces the M-Schema representation.

Definition: For a database $S$ with tables $T = \{T_1,\dots,T_n\}$ , columns $C(T_i) = \{c_{i1},\dots,c_{im}\}$ , and foreign key relations $FK$ , each column $c_{ij}$ is a 5-tuple $(\mathit{name}, \mathit{dtype}, \mathit{desc}, \mathit{pk}, \mathit{examples})$ . M-Schema represents $S$ as a flat sequence enumerating tables, their columns, and FK relations:

$\text{M-Schema} = [\,\langle \text{DB\_ID, D} \rangle,\, \#T_1,\, \{\mathrm{columns}\}, \langle \text{Foreign Keys}, FK_1\rangle,\, \ldots,\, \#T_n,\, \{\mathrm{columns}\}, \langle \text{Foreign Keys}, FK_n\rangle\,]$

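Formalization:

$C(T_i) = \{c_{i1},\dots,c_{im}\}, \quad i = 1,\dots,n$

$c_{ij} = (\mathit{name}, \mathit{dtype}, \mathit{desc}, \mathit{pk}, \mathit{examples})$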

Motivation: M-Schema enables both SFT and ICL-based models to access fine-grained schema context and relationships, empirically improving execution accuracy by up to +2.03% absolute over DDL or MAC-SQL schema representations (Gao et al., 2024).
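As a rough illustration of the definition, the sketch below serializes a small database into a flat, M-Schema-like string. The ordering (database id, then each table with its columns and foreign keys) follows the sequence given above, but the concrete textual layout, the `Column` dataclass, and the `to_m_schema` function are assumptions made for this example and may differ from the authors' format.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One column as the 5-tuple (name, dtype, desc, pk, examples)."""
    name: str
    dtype: str
    desc: str = ""
    pk: bool = False
    examples: list = field(default_factory=list)

def to_m_schema(db_id, tables, foreign_keys):
    """Serialize tables into a flat text block (layout is illustrative only)."""
    lines = [f"[DB_ID] {db_id}"]
    for table, columns in tables.items():
        lines.append(f"# Table: {table}")
        lines.append("[")
        for col in columns:
            parts = [f"{col.name}:{col.dtype.upper()}"]
            if col.desc:
                parts.append(col.desc)
            if col.pk:
                parts.append("Primary Key")
            if col.examples:
                parts.append(f"Examples: {col.examples}")
            lines.append("(" + ", ".join(parts) + "),")
        lines.append("]")
        fks = foreign_keys.get(table, [])
        if fks:
            lines.append("[Foreign Keys]")
            lines.extend(fks)
    return "\n".join(lines)

tables = {
    "orders": [
        Column("order_id", "integer", "Order identifier", pk=True, examples=[1, 2]),
        Column("customer_id", "integer", "Buyer"),
    ],
    "customers": [
        Column("customer_id", "integer", "Customer identifier", pk=True),
    ],
}
foreign_keys = {"orders": ["orders.customer_id=customers.customer_id"]}

text = to_m_schema("shop", tables, foreign_keys)
print(text)
```

Compared with raw DDL, a layout like this keeps descriptions and example values next to each column, which is the fine-grained context the motivation refers to.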

3. Multi-Generator Candidate Construction

XiYan-SQL leverages an ensemble of SQL generators to optimize both quality and diversity in candidate space.

3.1 Supervised Fine-Tuned (SFT) Generators

Each SFT model undergoes a two-stage training pipeline:

Basic-Syntax Stage: Induces SQL syntax fluency using large dialect-agnostic corpora.
Generation-Enhance Stage: Multi-task training on:
- Question $\to$ SQL,
- SQL $\to$ Question reconstruction,
- Evidence selection,
- SQL discrimination/regeneration with execution feedback.

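The objective for each task $k$ ( $k = 1,\dots,K$ ), with training set $D_k$ , is the standard autoregressive negative log-likelihood:

$\mathcal{L}_k = -\sum_{(x, y) \in D_k} \sum_{t=1}^{|y|} \log P_\theta(y_t \mid x, y_{<t})$

Overall model loss is $\mathcal{L} = \sum_{k=1}^{K} \mathcal{L}_k$ . Multi-format enhancement is achieved by training on diversified SQL rewrites (chunked, standardized, mixed) to create models with distinct output styles.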

3.2 In-Context Learning (ICL) Generator

The ICL-based generator (e.g., GPT-4o) selects exemplars using Named-Entity Masked Skeleton Similarity:

Named entities in the query are masked, embeddings are computed, and K-nearest exemplars are selected via cosine similarity.
If schema linking yields more than one table, only exemplars with multi-table joins are considered.
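The exemplar selection just described can be sketched in a few lines. This is a simplified illustration under several assumptions: entity masking uses crude regular expressions instead of a real NER model, the "embedding" is a bag-of-words count vector, the join filter triggers when more than one table is linked, and the names `mask_entities` and `select_exemplars` are hypothetical.

```python
import math
import re
from collections import Counter

def mask_entities(question: str) -> str:
    """Crude stand-in for NER: mask quoted strings, numbers, and mid-sentence capitalized words."""
    q = re.sub(r"'[^']*'|\"[^\"]*\"", "<ent>", question)
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "<num>", q)
    q = re.sub(r"(?<=\s)[A-Z][a-zA-Z]+", "<ent>", q)
    return q.lower()

def embed(skeleton: str) -> Counter:
    """Bag-of-words vector over the masked skeleton (assumed stand-in for a sentence embedding)."""
    return Counter(re.findall(r"<\w+>|\w+", skeleton))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_exemplars(question, pool, k, num_linked_tables):
    """Return the k exemplars whose masked skeletons are closest to the question's skeleton."""
    # When several tables are linked, restrict the pool to exemplars that use joins.
    candidates = [
        ex for ex in pool
        if num_linked_tables <= 1 or " join " in ex["sql"].lower()
    ]
    q_vec = embed(mask_entities(question))
    ranked = sorted(
        candidates,
        key=lambda ex: cosine(q_vec, embed(mask_entities(ex["question"]))),
        reverse=True,
    )
    return ranked[:k]

pool = [
    {"question": "How many invoices did Bob pay in 2021?",
     "sql": "SELECT COUNT(*) FROM invoices WHERE payer = 'Bob' AND year = 2021"},
    {"question": "List the names of customers and their order totals.",
     "sql": "SELECT c.name, o.total FROM customers c JOIN orders o ON c.id = o.customer_id"},
    {"question": "What is the average price of products?",
     "sql": "SELECT AVG(price) FROM products"},
]
question = "How many orders did Alice place in 2023?"

print(mask_entities(question))
print(select_exemplars(question, pool, k=1, num_linked_tables=1)[0]["sql"])
print(select_exemplars(question, pool, k=1, num_linked_tables=2)[0]["sql"])
```

Masking makes "Alice ... 2023" and "Bob ... 2021" look alike, so retrieval keys on question structure, not on the specific entities mentioned.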

3.3 Refiner Module

For each candidate SQL, if initial execution yields an error, the refiner LLM is prompted with the failing query and its execution feedback to repair the SQL. This self-refinement loop typically corrects execution-critical syntactic and logical errors.
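A minimal sketch of such an execute-and-repair loop, using Python's built-in `sqlite3` module, might look like the following. The `refine` function, the `max_rounds` limit, and the `toy_repair` callback are hypothetical; in the real system the repair step would be an LLM call, which is replaced here by a hard-coded fix for one known typo.

```python
import sqlite3

def refine(sql, conn, repair_fn, max_rounds=2):
    """Execute sql; on a database error, ask repair_fn for a fix and retry.

    Returns (final_sql, rows), with rows set to None if every attempt failed.
    """
    for attempt in range(max_rounds + 1):
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_rounds:
                break
            # Feed the exception text back to the repair step.
            sql = repair_fn(sql, str(exc))
    return sql, None

def toy_repair(sql, error):
    """Stand-in for the refiner LLM: fixes a single known column typo."""
    if "no such column: nme" in error:
        return sql.replace("nme", "name")
    return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

fixed_sql, rows = refine("SELECT nme FROM users", conn, toy_repair)
print(fixed_sql, rows)

# A query that cannot be repaired is returned with rows=None,
# so it can be dropped from the candidate pool.
bad_sql, bad_rows = refine("SELECT x FROM missing_table", conn, toy_repair)
print(bad_sql, bad_rows)
```

Bounding the number of rounds keeps the cost predictable when a candidate cannot be repaired.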

4. Candidate Reorganization and Selection

Majority voting is suboptimal when correct solutions are "minority" in the candidate set. XiYan-SQL's selection process includes:

Candidate Reorganization: Candidates are clustered by execution result; clusters are sorted by size (consensus) and generator reliability, with within-cluster ordering favoring brevity or reliability. If no consensus majority, the shortest candidate from each cluster is prioritized.
Learned Selection Model: A lightweight LLM is fine-tuned to select among the reorganized candidate list $C' = [c_1,\dots,c_k]$ , given the question $Q$ and schema $S$ :

$i^* = \arg\max_{i \in \{1,\dots,k\}} P_\theta(i \mid Q, S, C')$

The model predicts the index $i^*$ , yielding the final SQL $c_{i^*}$ . The training objective is the standard cross-entropy on gold target selection.

Ablations confirm that this module adds 3–4% EX over naive majority voting (Liu et al., 7 Jul 2025).
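The candidate reorganization step, which prepares the list the selection model sees, can be sketched as below. This is one plausible reading of the description, not the authors' code: the `reorganize` function, the per-generator `reliability` scores, and the exact tie-breaking order are assumptions, and the learned selector itself is not implemented.

```python
from collections import defaultdict

def reorganize(candidates, reliability):
    """Cluster candidates by execution result and order them for the selector.

    candidates: list of dicts with 'sql', 'generator', and a hashable 'result'.
    reliability: dict mapping generator name to an assumed reliability score.
    """
    if not candidates:
        return []
    clusters = defaultdict(list)
    for cand in candidates:
        clusters[cand["result"]].append(cand)

    # Within a cluster: more reliable generators first, then shorter SQL.
    def rank_within(cand):
        return (-reliability[cand["generator"]], len(cand["sql"]))

    groups = [sorted(group, key=rank_within) for group in clusters.values()]
    # Across clusters: larger consensus first, then the best generator in the cluster.
    groups.sort(key=lambda g: (-len(g), -max(reliability[c["generator"]] for c in g)))

    has_majority = len(groups[0]) > len(candidates) / 2
    if has_majority:
        return [cand for group in groups for cand in group]
    # No consensus: present the shortest candidate of each cluster.
    return [min(group, key=lambda c: len(c["sql"])) for group in groups]

reliability = {"sft_a": 0.9, "sft_b": 0.8, "icl": 0.7}
candidates = [
    {"sql": "SELECT COUNT(*) FROM t", "generator": "sft_a", "result": ((3,),)},
    {"sql": "SELECT COUNT(id) FROM t", "generator": "icl", "result": ((3,),)},
    {"sql": "SELECT COUNT(DISTINCT id) FROM t", "generator": "sft_b", "result": ((2,),)},
]

ordered = reorganize(candidates, reliability)
print([c["sql"] for c in ordered])

# With no majority cluster, one representative per cluster is kept.
split = reorganize([candidates[0], candidates[2]], reliability)
print([c["sql"] for c in split])
```

Unlike plain majority voting, the minority cluster stays in the list, so a learned selector can still pick it when the consensus answer is wrong.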

5. Empirical Performance and Benchmarking

Extensive benchmarking on BIRD, Spider, SQL-Eval, and NL2GQL demonstrates the efficacy and generalizability of XiYan-SQL.

Benchmark	XiYan-SQL EX (%)	Prior SOTA	Improvement
BIRD	75.63	74.79	+0.84 (vs. CHASE-SQL+Gemini)
Spider	89.65	89.60	+0.05 (vs. MCS-SQL+GPT-4)
SQL-Eval	69.86	67–68	+1-2 points
NL2GQL	41.20	<18	>23 points

XiYan-SQL further improves over closed-source LLMs by substantial margins in generalization/robustness tests on PostgreSQL and MySQL datasets (Liu et al., 7 Jul 2025).
Ablations demonstrate each module's necessity: removing multi-generator diversity, schema filter, or selection model drops EX by up to 4.04%, 1.24%, and 3.13%, respectively. The multi-generator oracle is significantly above achieved EX, indicating selection remains a bottleneck.

6. Error Analysis and Observed Strengths

Schema Filter reduces "no-table-found" errors by up to 30%.
Diverse Generators enable handling complex SQL constructs such as window functions and multi-way joins, capturing distributional tails.
Selection Model mitigates errors where high-confidence single model outputs are semantically incorrect (e.g., improper GROUP BY or filter conditions).
Refiner repairs minor but execution-critical SQL errors that would otherwise cause output elimination.

Collectively, these factors yield state-of-the-art and robust performance across a wide spectrum of database schema and query complexities.

7. Open Problems and Prospective Directions

XiYan-SQL demonstrates strong generalization and efficiency, yet limitations remain:

Scaling filter/generation stages to more schema subsets ( $p_s$ ) and generator models ( $p_m$ ) may further increase oracle upper bounds.
Incorporating richer execution feedback, such as query plan costs, could enhance the refiner's correction capabilities.
Integrating the multi-stage pipeline into a unified, multi-task, multi-format curriculum-trained "all-in-one" model is an avenue for future research (Gao et al., 2024, Liu et al., 7 Jul 2025).

XiYan-SQL exemplifies a modular, interpretable, and extensible architecture for complex semantic parsing, establishing a new reference framework for industrial NL2SQL deployments and further academic exploration.

References (2)

A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL (2024)

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL (2025)
