CRED-SQL: Robust Text-to-SQL Framework
- CRED-SQL is a robust framework that integrates cluster-based schema retrieval (CLSR) and a novel Execution Description Language (EDL) to enhance Text-to-SQL parsing.
- It employs hybrid column clustering and LLM-driven sub-schema selection, achieving up to 73.4% execution accuracy on large-scale, schema-rich datasets.
- The two-stage parsing pipeline breaks down compositional queries into stepwise planning and deterministic SQL translation, addressing schema mismatches and semantic deviations.
CRED-SQL is a framework for robust Text-to-SQL parsing over large-scale, real-world relational databases. It unifies Cluster-based Large-scale Schema Retrieval (CLSR) with an explicit intermediate representation, the Execution Description Language (EDL), to address two fundamental challenges in neural SQL generation: schema mismatch and semantic deviation. By combining hybrid column clustering, LLM-driven sub-schema selection, and a two-step parsing pipeline, CRED-SQL reports higher execution accuracy than the compared baselines on large cross-domain datasets. It improves semantic alignment between natural language and SQL logic by decomposing compositional query synthesis into explicit, stepwise planning followed by deterministic translation (Duan et al., 18 Aug 2025).
1. Underlying Challenges in Large-Scale Text-to-SQL
CRED-SQL targets the semantic mismatches that degrade neural Text-to-SQL performance on databases with hundreds of relations and thousands of attributes. In this setting, prior retrieval and parsing approaches (e.g., CRUSH4SQL, DIN-SQL, MAC-SQL) fail to discriminate between contextually relevant and semantically proximate schema elements.
Two core challenges are defined:
- Schema mismatch: Lexically or semantically similar table and column names (e.g., “city”, “city_record”, “county.city”) often confound both dense-retrieval and prompt-based selection, causing relevant entities to be excluded from top-k candidate sets.
- Semantic deviation: Direct generation of SQL from natural language questions (NLQs) leads to unreliable mapping of intent, especially in the presence of complex aggregation, negation, and join patterns.
CRED-SQL addresses these by introducing:
- Cluster-based approaches for schema narrowing, and
- an intermediate, natural-language execution plan—EDL—that decouples semantic understanding from SQL surface form.
2. Cluster-based Schema Retrieval (CLSR)
CRED-SQL’s CLSR module performs hierarchical schema narrowing through column clustering, relevance-weighted scoring, and LLM-driven selection.
- Column Clustering: Every column is embedded and grouped using a hybrid BM25+clustering algorithm, so that each cluster collects the columns assigned to the same centroid.
Clusters with high cardinality signal low discriminability, guiding down-weighting of ubiquitous columns.
- Table and Column Relevance Scoring: Each candidate table is scored by combining three terms: its initial retrieval score, the column-query embedding similarity, and a penalty on large clusters.
- LLM-driven Sub-schema Selection: The top-N tables and columns are presented to an LLM prompt engineered to return a minimal sufficient sub-schema for the query.
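The scoring step can be illustrated with a small sketch. The exact formula is not reproduced in this article, so the combination below is an assumption made purely for illustration: a weighted sum of the table's initial retrieval score and its best column-query similarity, with a logarithmic penalty on cluster size. The names `Column`, `score_table`, `alpha`, and `beta` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    similarity: float   # column-query embedding similarity (assumed precomputed)
    cluster_size: int   # number of columns sharing this column's cluster (>= 1)


def score_table(retrieval_score: float, columns: list[Column],
                alpha: float = 0.5, beta: float = 0.1) -> float:
    """Illustrative table score; NOT the paper's formula.

    Mixes the initial retrieval score with the best column similarity,
    after down-weighting columns that sit in large clusters.
    """
    if not columns:
        return alpha * retrieval_score
    best = max(c.similarity - beta * math.log(c.cluster_size) for c in columns)
    return alpha * retrieval_score + (1 - alpha) * best


# Toy candidates: (initial retrieval score, columns).
tables = {
    "city": (0.80, [Column("city.name", 0.90, 40),
                    Column("city.population", 0.70, 2)]),
    "city_record": (0.78, [Column("city_record.id", 0.60, 120)]),
}
ranked = sorted(tables, key=lambda t: score_table(*tables[t]), reverse=True)
print(ranked)
```

In this toy setup the highly similar `city.name` column contributes less than `city.population`, because it belongs to a much larger cluster. That is the intended effect of penalizing ubiquitous columns.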
This clustered retrieval both minimizes semantic interference and enhances Recall@1. On SpiderUnion, CLSR achieves Recall@3 of 77.1% vs. 30.6% for CRUSH (Duan et al., 18 Aug 2025).
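For reference, a minimal sketch of how a table-level Recall@k could be computed. Definitions vary between papers; this version assumes the strict reading in which a question counts as a hit only if every gold table appears among the top-k retrieved tables. The function name and the toy data are hypothetical.

```python
def recall_at_k(retrieved: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Fraction of questions whose gold tables all appear in the top-k list."""
    hits = sum(1 for ranked, needed in zip(retrieved, gold)
               if needed <= set(ranked[:k]))
    return hits / len(gold)


# Two toy questions with ranked retrieval output and gold table sets.
retrieved = [["city", "county", "state"], ["singer", "concert", "stadium"]]
gold = [{"city"}, {"stadium", "concert"}]

print(recall_at_k(retrieved, gold, k=1))
print(recall_at_k(retrieved, gold, k=3))
```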
3. Execution Description Language (EDL)
Execution Description Language (EDL) is a tailor-made, tree-structured intermediate representation for explicit, stepwise query plan expression in natural language, constrained by a fixed operator set.
Formalism
EDL is specified by a BNF grammar over its fixed operator set.
Each operator is expanded as a short, canonical NL instruction; for example, ScanTable: “Retrieve all rows from the [TableName] table aliased as [Alias].”
Properties and Rationale
- Near-deterministic mapping: Each EDL operator directly maps to a SQL fragment (FROM, JOIN, WHERE, GROUP BY, etc.).
- Stepwise semantic decomposition: Deconstructs complex NLQ→SQL generation into compositional planning, reducing syntactic drift.
- Abstracts surface syntax: Removes the burden of SQL-specific tokens from LLMs and allows fine control over translation.
EDL-to-SQL conversion can be performed by a deterministic parser.
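A minimal sketch of such a deterministic translator is shown below. Only the ScanTable wording is taken from the text above. The filter and projection templates, the `#n.` step prefix, and the function name `edl_to_sql` are assumptions introduced for illustration, and the sketch handles only linear, single-table plans.

```python
import re

# Only the ScanTable wording comes from the article; the Filter and Project
# templates are hypothetical stand-ins for the rest of the operator set.
SCAN = re.compile(r"Retrieve all rows from the (\w+) table aliased as (\w+)\.")
FILTER = re.compile(r"Keep only rows where (.+)\.")
PROJECT = re.compile(r"Return the columns (.+)\.")


def edl_to_sql(edl: str) -> str:
    """Translate a tiny, linear EDL plan into one SELECT statement."""
    source, conditions, columns = None, [], "*"
    for line in edl.strip().splitlines():
        step = re.sub(r"^#\d+\.\s*", "", line.strip())  # drop the "#n." prefix
        if m := SCAN.fullmatch(step):
            source = f"{m.group(1)} AS {m.group(2)}"
        elif m := FILTER.fullmatch(step):
            conditions.append(m.group(1))
        elif m := PROJECT.fullmatch(step):
            columns = m.group(1)
        else:
            raise ValueError(f"Unrecognized EDL step: {step!r}")
    if source is None:
        raise ValueError("EDL plan has no ScanTable step")
    sql = f"SELECT {columns} FROM {source}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


plan = """
#1. Retrieve all rows from the city table aliased as T1.
#2. Keep only rows where T1.population > 100000.
#3. Return the columns T1.name.
"""
print(edl_to_sql(plan))
# SELECT T1.name FROM city AS T1 WHERE T1.population > 100000
```

Because every step matches exactly one template, an unrecognized step fails loudly instead of silently producing wrong SQL. A deterministic second stage offers that kind of controllability.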
4. Two-Stage Parsing Pipeline
CRED-SQL instantiates a two-stage query synthesis approach driven end to end by LLMs:
Stage 1: NLQ → EDL
- Input: The selected sub-schema, the NLQ, and in-context few-shot examples.
- Model: Qwen2.5-Coder-32B fine-tuned with LoRA (5 epochs, LR6).
- Objective: Minimize the cross-entropy of the reference EDL given the inputs above.
- Decoding: Enforces stepwise output (“#n.” for each EDL operator).
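One standard way to write such a cross-entropy objective, assuming token-level teacher forcing, is given below. The symbols are introduced here for illustration and are not taken from the source: $q$ is the NLQ, $S'$ the selected sub-schema, $e = (e_1, \dots, e_T)$ the reference EDL token sequence, and $\theta$ the LoRA-adapted parameters.

```latex
\mathcal{L}_{\mathrm{EDL}}(\theta)
  = -\sum_{t=1}^{T} \log p_{\theta}\!\left(e_t \mid e_{<t},\, q,\, S'\right)
```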
Stage 2: EDL → SQL
- Input: The induced EDL sequence/tree.
- Model: The same or smaller LLM, or optional deterministic parser.
- Objective: Generate the canonical SQL for the given EDL.
- Alternatives: Rule-based parser for deterministic translation.
This architecture enables high execution fidelity and controllability, with >98% EDL → SQL execution accuracy reported on SpiderUnion (Duan et al., 18 Aug 2025).
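The control flow of the two stages can be sketched as below. Each stage is modeled as a plain callable from text to text, so the same skeleton works whether Stage 2 is an LLM or a rule-based parser. The prompt layout, the function name `parse_question`, and the stub stages are all hypothetical and do not reflect the authors' implementation.

```python
from typing import Callable

# Hypothetical interface: a stage is any callable from text to text,
# e.g. a wrapper around an LLM client or a rule-based translator.
Stage = Callable[[str], str]


def parse_question(question: str, sub_schema: str,
                   planner: Stage, translator: Stage) -> tuple[str, str]:
    """Run NLQ -> EDL, then EDL -> SQL, returning both for inspection."""
    prompt = f"Schema:\n{sub_schema}\n\nQuestion: {question}\n\nEDL:"
    edl = planner(prompt)      # Stage 1: stepwise plan in natural language
    sql = translator(edl)      # Stage 2: translate the plan into SQL
    return edl, sql


# Stub stages standing in for the fine-tuned model and the translator.
def fake_planner(prompt: str) -> str:
    return "#1. Retrieve all rows from the city table aliased as T1."


def fake_translator(edl: str) -> str:
    return "SELECT * FROM city AS T1"


edl, sql = parse_question("List all cities.", "city(name, population)",
                          fake_planner, fake_translator)
print(edl)
print(sql)
```

Returning the intermediate EDL alongside the SQL keeps the plan available for inspection and debugging.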
5. Empirical Evaluation and Comparative Analysis
CRED-SQL’s evaluation spans large, schema-rich datasets:
| Dataset | #Tables | #Columns | #Dev Questions | Primary Metrics |
|---|---|---|---|---|
| SpiderUnion | 876 | 4,502 | 1,034 | EX, Recall@k |
| BirdUnion | 75 | 798 | 1,534 | EX |
- Baselines: CRUSH, DIN-SQL, MAC-SQL, DAIL-SQL (all with GPT-4o/Qwen2.5), NLQ→QPL→SQL.
- Execution Accuracy (EX) on SpiderUnion (dev):
  - CRUSH+DIN-SQL (GPT-4o): 47.5%
  - CRUSH+MAC-SQL (GPT-4o): 53.9%
  - CRUSH+DAIL-SQL (GPT-4o): 50.2%
  - CRED-SQL (GPT-4o): 69.1%
  - CRED-SQL (Qwen2.5-Coder-32B): 73.4%
- On BirdUnion, CRED-SQL + MAC-SQL with Qwen2.5 gives 62.9% EX vs. 49.6% for CRUSH+MAC-SQL.
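As a reminder of what the EX metric measures, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a common definition: a prediction is correct if it executes and returns the same multiset of rows as the gold query. Official evaluation scripts may differ, for example in how they treat row order. The helper names and the toy database are hypothetical.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries run and return the same multiset of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Share of (predicted, gold) pairs whose execution results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT, population INTEGER);
    INSERT INTO city VALUES ('Aston', 50000), ('Brook', 250000);
""")
pairs = [
    # Different surface form, same result: counted as correct.
    ("SELECT name FROM city WHERE population > 100000",
     "SELECT T1.name FROM city AS T1 WHERE T1.population > 100000"),
    # Predicted query references a missing table: counted as wrong.
    ("SELECT name FROM town", "SELECT name FROM city"),
]
print(execution_accuracy(conn, pairs))
```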
Ablation studies indicate:
- Removing CLSR drops EX by 23.2 percentage points (from 73.4% to 50.2% on SpiderUnion).
- Removing EDL costs 0.9 percentage points.
6. Limitations and Prospects
Several limitations are acknowledged:
- Latency: Two-stage parsing introduces ~3× per-NLQ increase in wall-clock time compared to direct parsing.
- Manual EDL Dataset Construction: EDL datasets (Spider-EDL, Bird-EDL) are human-verified; lighter annotation is needed for scalability.
- Schema Selection Tuning: Selection is based on in-context reasoning; further gains may be achieved by direct LLM fine-tuning.
- Complex Query Patterns: Nested aggregates and window functions present residual failure cases, suggesting a need for extended EDL operators or enhanced type/constraint checking.
This suggests that while CRED-SQL achieves significant advances, optimizing EDL authoring and sub-schema selection for both automation and broader query class coverage remains an open avenue.
7. Relationship to Causal Inference Frameworks and SQL-based Analytical Pipelines
The CRED-SQL design is compatible with recent work on expressing causal inference and data analytic pipelines in SQL, such as ZaliQL (Salimi et al., 2016). Both frameworks advocate performing analytical reasoning (whether for causal effect estimation or NLQ→SQL synthesis) within the database engine itself. ZaliQL’s “views and design patterns” for scalable, robust analytical translation can be adopted within CRED-SQL workflows, enabling compositional query logic—whether for causal estimation or general SQL synthesis—to benefit from deterministic, transparent, and fully traceable intermediate representations such as EDL.