
XiYan-SQL: Multi-Generator NL2SQL Framework

Updated 17 March 2026
  • The paper introduces XiYan-SQL, an end-to-end Text-to-SQL framework that uses a multi-generator ensemble to enhance SQL generation.
  • It employs schema filtering, diverse supervised fine-tuned and in-context SQL generators, and a self-refinement module to optimize query construction.
  • Empirical results demonstrate state-of-the-art performance on benchmarks like BIRD and Spider, showcasing improved execution accuracy and robustness.

XiYan-SQL is an end-to-end framework for Text-to-SQL (NL2SQL) translation built around a multi-generator ensemble. It combines schema filtering, multiple supervised fine-tuned and in-context SQL generators, a self-refinement module, and a learned selection model. XiYan-SQL reports state-of-the-art execution accuracy on competitive NL2SQL benchmarks, including 75.63% on BIRD and 89.65% on Spider, surpassing prior methods and demonstrating strong generalization (Gao et al., 2024; Liu et al., 7 Jul 2025).

1. System Architecture and Key Components

XiYan-SQL decomposes the NL2SQL process into specialized components, each addressing one tractable subproblem of robust SQL generation.

  • Schema Filter: Filters the input database schema to derive multiple relevant schema subsets for each user query, reducing noise and computational overhead. Filtering uses LLM-based keyword extraction followed by embedding similarity to match columns/tables and iterative selection to balance precision and recall, resulting in a set $\{S_1,\dots,S_{p_s}\}$ of filtered schemas.
  • Multi-Generator Ensemble: For each filtered schema, an ensemble of $p_m$ SQL generators (Supervised Fine-Tuned and ICL-based) produces a diverse pool of candidate queries. Each generator is fine-tuned or prompted on different auxiliary tasks and SQL formats to encourage diversity as well as precision.
  • Self-Refiner Module: Executes each candidate SQL. In case of failure, the generator is re-prompted (or the candidate is re-generated) based on the exception feedback, correcting logical or syntactic errors such as missing joins or mismatched types.
  • Selection Model with Candidate Reorganization: SQL candidates are grouped by identical execution outputs. A candidate reorganization strategy (cluster by majority, then order by generator reliability and brevity) feeds into a lightweight fine-tuned LLM that ranks and selects the optimal candidate for final execution.
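
The embedding-similarity step of the Schema Filter can be illustrated with a minimal sketch. It assumes the keywords have already been extracted (XiYan-SQL uses an LLM for this) and substitutes a toy character-trigram embedding for a real embedding model; `embed`, `cosine`, `filter_schema`, and the example schema are all hypothetical names introduced here, not part of the framework. Varying `top_k` yields several schema subsets that trade precision against recall.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts (stand-in for a real embedding model)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def filter_schema(keywords, columns, top_k):
    """Keep the top_k columns ranked by their best similarity to any keyword."""
    scored = []
    for col in columns:
        col_vec = embed(col.replace(".", " ").replace("_", " "))
        score = max(cosine(embed(kw), col_vec) for kw in keywords)
        scored.append((score, col))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [col for _, col in scored[:top_k]]

# Hypothetical schema and keywords, for illustration only.
columns = ["customers.name", "customers.city", "orders.total_amount",
           "orders.order_date", "products.weight"]
keywords = ["customer name", "order amount"]

precise = filter_schema(keywords, columns, top_k=2)  # high-precision subset
broad = filter_schema(keywords, columns, top_k=4)    # higher-recall subset
print(precise)
print(broad)
```

The two subsets play the role of two filtered schemas; each would then be passed to the generators separately.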

The pipeline is summarized in the table below.

| Component | Function | Techniques |
|---|---|---|
| Schema Filter | Relevant sub-schema extraction | LLM keyword extraction, embedding similarity |
| SQL Generators | Diverse candidate SQL generation | Multi-task SFT, ICL prompting, format variation |
| Refiner | Error correction via execution feedback | Self-refinement using exception traces |
| Selector | Final candidate ranking | Fine-tuned LLM, candidate reorganization |

2. M-Schema: Semi-Structured Schema Representation

To enhance model awareness of intricate database structures, XiYan-SQL introduces the M-Schema representation.

  • Definition: For a database $S$ with tables $T = \{T_1,\dots,T_n\}$, columns $C(T_i) = \{c_{i1},\dots,c_{im}\}$, and foreign key relations $FK$, each column $c_{ij}$ is a 5-tuple $(\mathit{name}, \mathit{dtype}, \mathit{desc}, \mathit{pk}, \mathit{examples})$. M-Schema represents $S$ as a flat sequence enumerating tables, their columns, and FK relations:

$$\text{M-Schema} = [\,\langle \text{DB\_ID, D} \rangle,\, \#T_1,\, \{\mathrm{columns}\}, \langle \text{Foreign Keys}, FK_1\rangle,\, \ldots,\, \#T_n,\, \{\}, \langle \text{Foreign Keys}, FK_n\rangle\,]$$

  • Formalization:

$$c_{ij} = (\mathit{name}_{ij}, \mathit{dtype}_{ij}, \mathit{desc}_{ij}, \mathit{pk}_{ij}, \mathit{examples}_{ij})$$

$$FK = \{ (c_{ij},c_{kl}) \mid c_{ij}\ \text{references}\ c_{kl} \}$$

  • Motivation: M-Schema enables both SFT and ICL-based models to access fine-grained schema context and relationships, empirically improving execution accuracy by up to +2.03% absolute over DDL or MAC-SQL schema representations (Gao et al., 2024).
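
To make the structure concrete, here is a minimal sketch that serializes a schema into a flat, M-Schema-like text. The source specifies only the components (database id, tables, column 5-tuples, per-table foreign keys); the exact textual layout below, the `Column` dataclass, and the `to_m_schema` helper are assumptions made for illustration and may differ from the actual format.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One column as the 5-tuple (name, dtype, desc, pk, examples)."""
    name: str
    dtype: str
    desc: str = ""
    pk: bool = False
    examples: list = field(default_factory=list)

def to_m_schema(db_id, tables, foreign_keys):
    """Serialize a schema to flat text.

    tables: dict mapping table name -> list[Column]
    foreign_keys: list of ((table, column), (ref_table, ref_column)) pairs
    The layout is hypothetical; only the listed components come from the source.
    """
    lines = [f"[DB_ID] {db_id}"]
    for table_name, cols in tables.items():
        lines.append(f"# Table: {table_name}")
        for c in cols:
            parts = [f"{c.name}:{c.dtype}"]
            if c.pk:
                parts.append("Primary Key")
            if c.desc:
                parts.append(c.desc)
            if c.examples:
                parts.append("Examples: [" + ", ".join(map(str, c.examples)) + "]")
            lines.append("  (" + ", ".join(parts) + ")")
        # Foreign keys are grouped with the table that owns the referencing column.
        for (src_t, src_c), (dst_t, dst_c) in foreign_keys:
            if src_t == table_name:
                lines.append(f"  [Foreign Keys] {src_t}.{src_c} = {dst_t}.{dst_c}")
    return "\n".join(lines)

# Hypothetical two-table database.
tables = {
    "customers": [Column("id", "INTEGER", "customer id", True, [1, 2]),
                  Column("city", "TEXT", "home city", False, ["Paris"])],
    "orders": [Column("id", "INTEGER", pk=True),
               Column("customer_id", "INTEGER", "buyer")],
}
fks = [(("orders", "customer_id"), ("customers", "id"))]
text = to_m_schema("shop", tables, fks)
print(text)
```

Keeping descriptions and example values next to each column is what distinguishes this kind of representation from plain DDL, which carries names and types only.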

3. Multi-Generator Candidate Construction

XiYan-SQL leverages an ensemble of SQL generators to optimize both quality and diversity in candidate space.

3.1 Supervised Fine-Tuned (SFT) Generators

Each SFT model undergoes a two-stage training pipeline:

  • Basic-Syntax Stage: Induces SQL syntax fluency using large dialect-agnostic corpora.
  • Generation-Enhance Stage: Multi-task training on:
    • Question → SQL,
    • SQL → Question reconstruction,
    • Evidence selection,
    • SQL discrimination/regeneration with execution feedback.

The objective for each task $t$ with training set $D_t = \{(x, y)\}$ is:

$$\mathcal{L}_{t}(\theta) = -\sum_{(x,y)\in D_t} \sum_{k=1}^{|y|} \log p_{\theta}(y_k \mid x, y_{<k})$$

Overall model loss is $\mathcal{L}(\theta) = \sum_t \lambda_t \mathcal{L}_t(\theta)$. Multi-format enhancement is achieved by training on diversified SQL rewrites (chunked, standardized, mixed) to create models with distinct output styles.
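
The weighted multi-task objective can be checked numerically with a small sketch. It assumes the per-token probabilities $p_\theta(y_k \mid x, y_{<k})$ are already available as plain floats; the task names and the weights $\lambda_t$ below are made-up values for illustration, since the source does not state them.

```python
import math

def task_loss(examples):
    """Negative log-likelihood summed over all examples and target tokens.

    examples: list where each item holds the per-token probabilities
    p_theta(y_k | x, y_<k) for one (x, y) pair.
    """
    return -sum(math.log(p) for token_probs in examples for p in token_probs)

def total_loss(task_examples, weights):
    """Weighted sum of per-task losses: sum_t lambda_t * L_t."""
    return sum(weights[t] * task_loss(examples) for t, examples in task_examples.items())

# Hypothetical toy data: two tasks, illustrative weights.
task_examples = {
    "question_to_sql": [[0.5, 0.5]],  # one example with two target tokens
    "sql_to_question": [[0.25]],      # one example with one target token
}
weights = {"question_to_sql": 1.0, "sql_to_question": 0.5}

loss = total_loss(task_examples, weights)
print(round(loss, 4))
```

Here the first task contributes $2\ln 2$ and the second $0.5 \cdot \ln 4 = \ln 2$, so the total is $3\ln 2$.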

3.2 In-Context Learning (ICL) Generator

The ICL-based generator (e.g., GPT-4o) selects exemplars using Named-Entity Masked Skeleton Similarity:

  • Named entities in the query are masked, embeddings are computed, and K-nearest exemplars are selected via cosine similarity.
  • If schema-linking yields $\geq 2$ tables, only exemplars with multi-table joins are considered.
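
The two rules above can be sketched as follows. This is an illustration under stated assumptions: named entities are passed in directly rather than detected by an NER model, a bag-of-words vector stands in for a real embedding, and `mask_entities`, `select_exemplars`, and the exemplar pool are hypothetical names and data.

```python
import math
import re
from collections import Counter

def mask_entities(question, entities):
    """Replace entity mentions with a placeholder to obtain the question skeleton."""
    for ent in sorted(entities, key=len, reverse=True):
        question = re.sub(re.escape(ent), "<mask>", question, flags=re.IGNORECASE)
    return question

def embed(text):
    """Toy embedding: bag of lowercase tokens (stand-in for a real encoder)."""
    return Counter(re.findall(r"<mask>|\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_exemplars(query, query_entities, n_linked_tables, pool, k):
    """Return the k exemplars whose masked skeleton is closest to the query's."""
    if n_linked_tables >= 2:
        # Multi-table questions only see exemplars that contain joins.
        pool = [ex for ex in pool if ex["n_tables"] >= 2]
    q_vec = embed(mask_entities(query, query_entities))
    def score(ex):
        return cosine(q_vec, embed(mask_entities(ex["question"], ex["entities"])))
    return sorted(pool, key=score, reverse=True)[:k]

# Hypothetical exemplar pool.
pool = [
    {"question": "How many students live in Boston?", "entities": ["Boston"], "n_tables": 1},
    {"question": "How many orders were placed by customers in Paris?", "entities": ["Paris"], "n_tables": 2},
    {"question": "List the names of all products.", "entities": [], "n_tables": 1},
    {"question": "What is the average salary of employees in the Sales department?", "entities": ["Sales"], "n_tables": 2},
]

single = select_exemplars("How many teachers live in Chicago?", ["Chicago"], 1, pool, k=1)
multi = select_exemplars("How many teachers live in Chicago?", ["Chicago"], 2, pool, k=1)
print(single[0]["question"])
print(multi[0]["question"])
```

Masking makes "Boston" and "Chicago" interchangeable, so exemplars are matched on question structure rather than on the specific values mentioned.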

3.3 Refiner Module

For each candidate $s$, if initial execution yields an error, the refiner LLM is prompted with $\{\text{schema}, q, s, E\}$, where $q$ is the question and $E$ the execution error, to repair the SQL. This self-refinement loop typically corrects execution-critical syntactic and logical errors.
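
A minimal sketch of such an execute-and-repair loop, using SQLite from the Python standard library. The `refine` function, its `max_rounds` parameter, and the rule-based `toy_repair` callback are hypothetical; in XiYan-SQL the repair step is performed by an LLM, which is replaced here by a stub so the example runs on its own.

```python
import sqlite3

def refine(conn, schema, question, sql, repair_fn, max_rounds=2):
    """Execute sql; on failure, ask repair_fn for a corrected query and retry.

    Returns (final_sql, rows), with rows set to None if every attempt failed.
    """
    for attempt in range(max_rounds + 1):
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_rounds:
                break
            # The repair prompt carries {schema, question, sql, error}.
            sql = repair_fn(schema, question, sql, str(exc))
    return sql, None

def toy_repair(schema, question, sql, error):
    """Hypothetical stand-in for the refiner LLM: fixes one known typo."""
    if "no such column: nme" in error:
        return sql.replace("nme", "name")
    return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Linus",)])

fixed_sql, rows = refine(conn, "users(id, name)", "List all user names.",
                         "SELECT nme FROM users ORDER BY id", toy_repair)
print(fixed_sql)
print(rows)
```

Bounding the number of rounds keeps a candidate that cannot be repaired from stalling the pipeline; such a candidate is simply returned without a result.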

4. Candidate Reorganization and Selection

Majority voting is suboptimal when the correct query is in the minority of the candidate set. XiYan-SQL's selection process includes:

  • Candidate Reorganization: Candidates are clustered by execution result; clusters are sorted by size (consensus) and generator reliability, with within-cluster ordering favoring brevity or reliability. If there is no consensus majority, the shortest candidate from each cluster is prioritized.
  • Learned Selection Model: A lightweight LLM is fine-tuned to select among the reorganized candidate list $L'$:

$$[\mathtt{Question}\,\Vert\,\mathtt{Schema}_{\mathrm{union}}\,\Vert\,\mathtt{Evidence}\,\Vert\,\mathtt{Candidates}\,L']$$

The model predicts the index $i^*$, yielding the final SQL $l^*$. The training objective is the standard cross-entropy on gold target selection.
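
The reorganization step and the selector input can be sketched as below. This is one possible reading of the description, not the reference implementation: the tie-breaking order, the "strict majority" test, the reliability scores, and the prompt layout are all assumptions, and `reorganize` and `build_selector_input` are hypothetical names.

```python
from collections import defaultdict

def reorganize(candidates, reliability):
    """Order candidates by cluster consensus, then reliability and brevity.

    candidates: list of dicts with 'sql', 'generator', and 'result'
                (a hashable execution output, e.g. a tuple of rows).
    reliability: dict mapping generator name -> score (assumed to be given).
    """
    if not candidates:
        return []
    clusters = defaultdict(list)
    for cand in candidates:
        clusters[cand["result"]].append(cand)

    def cluster_key(members):
        best = max(reliability[m["generator"]] for m in members)
        return (-len(members), -best)  # larger, then more reliable clusters first

    ordered_clusters = sorted(clusters.values(), key=cluster_key)
    for members in ordered_clusters:
        members.sort(key=lambda m: (len(m["sql"]), -reliability[m["generator"]]))

    if len(ordered_clusters[0]) > len(candidates) / 2:
        return [m for members in ordered_clusters for m in members]
    # No majority: surface the shortest candidate of every cluster first.
    heads = [members[0] for members in ordered_clusters]
    tails = [m for members in ordered_clusters for m in members[1:]]
    return heads + tails

def build_selector_input(question, schema_union, evidence, ordered):
    """Concatenate question, schema, evidence, and the indexed candidate list."""
    listing = "\n".join(f"[{i}] {c['sql']}" for i, c in enumerate(ordered))
    return (f"Question: {question}\nSchema: {schema_union}\n"
            f"Evidence: {evidence}\nCandidates:\n{listing}")

# Hypothetical candidates and reliability scores.
reliability = {"sft_1": 0.70, "sft_2": 0.65, "icl": 0.60}
candidates = [
    {"sql": "SELECT COUNT(id) FROM t WHERE x > 1", "generator": "sft_2", "result": ((3,),)},
    {"sql": "SELECT COUNT(*) FROM t", "generator": "icl", "result": ((5,),)},
    {"sql": "SELECT COUNT(*) FROM t WHERE x > 1", "generator": "sft_1", "result": ((3,),)},
]
ordered = reorganize(candidates, reliability)
prompt = build_selector_input("How many rows have x > 1?", "t(id, x)", "", ordered)
print(prompt)
```

The fine-tuned selector would receive `prompt` and return an index into `ordered`; placing likely-correct candidates early is what the reorganization contributes.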

Ablations confirm that this module contributes a 3–4% EX gain over naive majority voting (Liu et al., 7 Jul 2025).

5. Empirical Performance and Benchmarking

Extensive benchmarking on BIRD, Spider, SQL-Eval, and NL2GQL demonstrates the efficacy and generalizability of XiYan-SQL.

| Benchmark | XiYan-SQL EX (%) | Prior SOTA | Improvement |
|---|---|---|---|
| BIRD | 75.63 | 74.79 | +0.84 (vs. CHASE-SQL+Gemini) |
| Spider | 89.65 | 89.60 | +0.05 (vs. MCS-SQL+GPT-4) |
| SQL-Eval | 69.86 | 67–68 | +1–2 points |
| NL2GQL | 41.20 | <18 | >23 points |

  • XiYan-SQL further improves over closed-source LLMs by substantial margins in generalization/robustness tests on PostgreSQL and MySQL datasets (Liu et al., 7 Jul 2025).
  • Ablations demonstrate each module's necessity: removing multi-generator diversity, schema filter, or selection model drops EX by up to 4.04%, 1.24%, and 3.13%, respectively. The multi-generator oracle accuracy (the EX obtained if the best available candidate were always chosen) is significantly above the achieved EX, indicating that selection remains a bottleneck.
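
The gap between oracle and achieved EX can be made precise with a small sketch. The data below is a toy example, not taken from the paper, and `oracle_and_selected` is a hypothetical helper: a question counts toward the oracle if any candidate is correct, and toward the achieved EX only if the selected candidate is correct.

```python
def execution_accuracy(flags):
    """Percentage of questions marked correct."""
    return 100.0 * sum(flags) / len(flags)

def oracle_and_selected(per_question):
    """Compute (oracle EX, selected EX).

    per_question: list of (candidate_correct_flags, selected_index) pairs.
    """
    oracle = [any(flags) for flags, _ in per_question]
    selected = [flags[idx] for flags, idx in per_question]
    return execution_accuracy(oracle), execution_accuracy(selected)

# Toy data: four questions, two candidates each.
per_question = [
    ([True, False], 0),   # a correct candidate exists and was selected
    ([False, True], 0),   # a correct candidate exists but was missed
    ([False, False], 1),  # no candidate is correct
    ([True, True], 1),    # both correct
]
oracle_ex, selected_ex = oracle_and_selected(per_question)
print(oracle_ex, selected_ex)
```

The difference between the two numbers is the share of questions lost purely to selection, which is the headroom the sentence above refers to.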

6. Error Analysis and Observed Strengths

  • Schema Filter reduces "no-table-found" errors by up to 30%.
  • Diverse Generators enable handling complex SQL constructs such as window functions and multi-way joins, capturing distributional tails.
  • Selection Model mitigates errors where high-confidence single model outputs are semantically incorrect (e.g., improper GROUP BY or filter conditions).
  • Refiner repairs minor but execution-critical SQL errors that would otherwise cause output elimination.

Collectively, these factors yield state-of-the-art and robust performance across a wide spectrum of database schema and query complexities.

7. Open Problems and Prospective Directions

XiYan-SQL demonstrates strong generalization and efficiency, yet limitations remain:

  • Scaling filter/generation stages to more schema subsets ($p_s>2$) and generator models ($p_m>5$) may further increase oracle upper bounds.
  • Incorporating richer execution feedback, such as query plan costs, could enhance the refiner's correction capabilities.
  • Integrating the multi-stage pipeline into a unified, multi-task, multi-format curriculum-trained "all-in-one" model is an avenue for future research (Gao et al., 2024, Liu et al., 7 Jul 2025).

XiYan-SQL exemplifies a modular, interpretable, and extensible architecture for complex semantic parsing, establishing a new reference framework for industrial NL2SQL deployments and further academic exploration.
