
Execution Description Language (EDL)

Updated 29 December 2025
  • EDL is a natural-language–style, stepwise intermediate representation designed to bridge NL queries and SQL by using explicit, numbered execution operators.
  • It decomposes query planning into two stages—NLQ-to-EDL and EDL-to-SQL—thereby reducing semantic drift and improving compositional accuracy.
  • EDL's explicit operator mapping and tree structure support deterministic translation to SQL and yield measurable execution-accuracy gains on large-scale, cross-domain databases.

Execution Description Language (EDL) is a natural-language–style, stepwise intermediate representation for database query generation, specifically designed to mediate between a user's natural-language question (NLQ) and a corresponding SQL query. EDL is structured as an explicit, numbered list of execution operators, each mapping directly to a classical relational database operation. This approach systematizes query planning in neural text-to-SQL systems by decomposing semantic parsing into two discrete stages: NLQ-to-EDL and EDL-to-SQL. EDL has been introduced and formalized in the CRED-SQL framework to reduce semantic drift and improve compositional accuracy, especially in large-scale, cross-domain databases (Duan et al., 18 Aug 2025).

1. Formal Definition and Syntax

EDL adopts a formal, process-oriented syntax organized as a sequence of operator invocations, each associated with a particular execution step. The top-level grammar is provided in Backus–Naur Form (BNF) as follows:

$$
\begin{aligned}
\langle \text{EDLDoc}\rangle &::= \langle \text{StepList}\rangle \\
\langle \text{StepList}\rangle &::= \langle \text{Step}\rangle\,(\backslash\mathtt{n}\;\langle \text{Step}\rangle)^* \\
\langle \text{Step}\rangle &::= \texttt{\#}\,\langle \text{StepNum}\rangle\,\texttt{.}\,\langle \text{OpInvocation}\rangle \\
\langle \text{StepNum}\rangle &::= [\texttt{1}\!-\!\texttt{9}][\texttt{0}\!-\!\texttt{9}]^* \\
\langle \text{OpInvocation}\rangle &::= \langle \text{Operator}\rangle\;\langle \text{ArgList}\rangle \\
\langle \text{Operator}\rangle &::= \texttt{ScanTable}\mid \texttt{Join}\mid \texttt{ReserveRows}\mid \texttt{GroupBy}\mid \texttt{HavingClause}\mid \texttt{Sort}\mid \texttt{Limit}\mid \texttt{SelectColumn} \\
&\quad\;\mid \texttt{Subquery}\mid \texttt{SetOp}\mid \texttt{ArithmeticCalc}\mid \dots \\
\langle \text{ArgList}\rangle &::= \text{free-form English describing table names, aliases, columns, conditions, etc.}
\end{aligned}
$$

EDL plans are explicit trees: leaf nodes typically instantiate ScanTable or Subquery, while internal nodes perform operations such as Join, ReserveRows (filtering), GroupBy, or arithmetic calculation. Because a step can cite any earlier step by its number (for example, "From #1" or "Group #4"), the output of a prior computation is chained into later operators explicitly.
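A small parser makes the step format and the `#n` back-references concrete. This is a hypothetical sketch, not CRED-SQL code: it assumes one step per line in the BNF form `#<n>. <Operator> <args>` (also accepting the `<n>. Scan Table: <args>` display form used in the examples of Section 2), and the names `EDLStep` and `parse_edl` are illustrative. It recovers only explicit `#n` edges; a dependency expressed through an alias such as `T2` is not detected.

```python
import re
from dataclasses import dataclass, field

# Hypothetical parser for EDL steps written as "#<n>. <Operator> <args>".
# It also accepts the display form "<n>. Scan Table: <args>".
STEP_RE = re.compile(r"^#?(\d+)\.\s*(.+)$")
COLON_FORM = re.compile(r"^([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*):\s*(.*)$")
REF_RE = re.compile(r"#(\d+)")  # explicit references to earlier steps


@dataclass
class EDLStep:
    num: int
    op: str                                        # e.g. "ReserveRows"
    args: str                                      # free-form English arguments
    refs: list[int] = field(default_factory=list)  # step numbers cited as #n


def parse_edl(text: str) -> list[EDLStep]:
    steps: list[EDLStep] = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        m = STEP_RE.match(line)
        if m is None:
            raise ValueError(f"not an EDL step: {line!r}")
        num, body = int(m.group(1)), m.group(2)
        colon = COLON_FORM.match(body)
        if colon:  # "Reserve Rows: From #1, ..." -> op "ReserveRows"
            op, args = colon.group(1).replace(" ", ""), colon.group(2)
        else:      # "ReserveRows From #1, ..." -> op "ReserveRows"
            op, _, args = body.partition(" ")
        refs = [int(r) for r in REF_RE.findall(args)]
        # A step may only consume the output of earlier steps.
        if any(r >= num for r in refs):
            raise ValueError(f"step {num} refers to a later step: {refs}")
        steps.append(EDLStep(num, op, args, refs))
    return steps


plan = parse_edl("""
#1. ScanTable Retrieve all rows from [city] as T1.
#2. Subquery Retrieve all rows from [city] as T2.
#3. ArithmeticCalc Compute avg_population as the average of T2.[Population].
#4. ReserveRows From #1, keep rows where T1.[Population] > #3.avg_population.
#5. GroupBy Group #4 by the [District] column.
#6. SelectColumn Select count(*) as city_count from #5.
""")
for step in plan:
    print(step.num, step.op, step.refs)
```

On Example 1 written in BNF form, step 4 yields references `[1, 3]`, step 5 yields `[4]`, and step 6 yields `[5]`, which are exactly the internal edges of the plan tree.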

A summary of core operators appears below.

| Operator | Function | Example usage |
|---|---|---|
| ScanTable | Retrieve table rows | Retrieve all rows from [city] as T1 |
| Join | Join two tables | Join [Pets] as P on HP.[PetID] = P.[PetID] |
| ReserveRows | Filter rows | From #1, keep rows where ... |
| GroupBy | Aggregation grouping | Group #4 by [District] |
| HavingClause | Aggregate-level filtering | HAVING count(*) > 5 |
| Sort | Sort rows | Sort by [Population], descending |
| Limit | Row count truncation | Limit to 10 rows |
| SelectColumn | Project columns | Select [major], [age] from #7 |
| Subquery | Nested result | Retrieve all cities with ... as T2 |
| SetOp | Set operation | Intersect #3 and #4 |
| ArithmeticCalc | Computed columns | Compute avg_population as ... |
| DateCalculation | Temporal computation | Extract year from [BirthDate] |
| Cast | Type conversion | Cast [Amount] as float |
| Ranking | Row ranking | Rank students by GPA |
| SubstringExtraction | Substring from column | Extract prefix from [Name] |
| CaseStatement | Conditional value selection | CASE WHEN ... END |

2. Exemplification of EDL for Query Planning

EDL plans are constructed for both simple and complex NLQs. Two illustrative transformations are given:

Example 1: Single-table Aggregation

  • NLQ: "Find the number of cities in each district whose population is greater than the average population of cities?"
  • EDL:
  1. Scan Table: Retrieve all rows from the [city] table as T1.
  2. Subquery: Retrieve all rows from the [city] table as T2.
  3. Arithmetic Calculation: Compute avg_population as the average of T2.[Population].
  4. Reserve Rows: From #1, keep rows where T1.[Population] > #3.avg_population.
  5. Group By: Group #4 by the [District] column.
  6. Select Column: Select count(*) as city_count from #5.
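To see how the six steps line up with SQL clauses, here is one hand-written SQL query consistent with the plan (the source does not give the SQL), executed with Python's built-in `sqlite3` module against a made-up `city` table. The comments mark which EDL step each clause realizes.

```python
import sqlite3

# Hand-written SQL consistent with the Example 1 plan; comments give the EDL step.
EXAMPLE1_SQL = """
SELECT count(*) AS city_count                  -- step 6
FROM city AS T1                                -- step 1
WHERE T1.Population > (                        -- step 4
    SELECT avg(T2.Population) FROM city AS T2  -- steps 2 and 3
)
GROUP BY T1.District                           -- step 5
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (Name TEXT, District TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("A", "North", 100), ("B", "North", 900), ("C", "South", 800),
     ("D", "South", 700), ("E", "East", 50)],  # made-up toy rows
)
rows = conn.execute(EXAMPLE1_SQL).fetchall()
print(sorted(rows))  # [(1,), (2,)]
```

With these toy rows the average population is 510, so only cities B, C, and D survive step 4, leaving one qualifying city in North and two in South.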

Example 2: Multi-table Join with Negation

  • NLQ: "Find the major and age of students who do not have a cat pet."
  • EDL:
  1. Scan Table: Retrieve all rows from the [Student] table as S.
  2. Scan Table: Retrieve all rows from the [Has_Pet] table as HP.
  3. Join: Join [Pets] as P on HP.[PetID] = P.[PetID].
  4. Reserve Rows: From #3, keep rows where P.[PetType] = 'cat'.
  5. Select Column: Select HP.[StuID] from #4.
  6. Reserve Rows: From #1, keep rows whose S.[StuID] is not in #5.
  7. Select Column: From #6 select [major], [age].
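The negation in Example 2 maps naturally onto a `NOT IN` subquery. The SQL below is a hand-written illustration (not taken from the source), run with `sqlite3` on made-up student and pet rows; comments again mark the corresponding EDL steps.

```python
import sqlite3

# Hand-written SQL consistent with the Example 2 plan; comments give the EDL step.
EXAMPLE2_SQL = """
SELECT S.Major, S.Age                          -- step 7
FROM Student AS S                              -- step 1
WHERE S.StuID NOT IN (                         -- step 6
    SELECT HP.StuID                            -- step 5
    FROM Has_Pet AS HP                         -- step 2
    JOIN Pets AS P ON HP.PetID = P.PetID       -- step 3
    WHERE P.PetType = 'cat'                    -- step 4
)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StuID INTEGER, Major TEXT, Age INTEGER);
CREATE TABLE Pets (PetID INTEGER, PetType TEXT);
CREATE TABLE Has_Pet (StuID INTEGER, PetID INTEGER);
INSERT INTO Student VALUES (1, 'CS', 20), (2, 'Math', 22), (3, 'Bio', 19);
INSERT INTO Pets VALUES (10, 'cat'), (11, 'dog');
INSERT INTO Has_Pet VALUES (1, 10), (2, 11);
""")  # made-up toy rows: student 1 owns a cat, student 2 a dog
rows = conn.execute(EXAMPLE2_SQL).fetchall()
print(sorted(rows))  # [('Bio', 19), ('Math', 22)]
```

`NOT IN` returns no rows when the subquery yields a NULL, so a compiler targeting this pattern may need to exclude NULL keys or use `NOT EXISTS` instead.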

These examples highlight EDL's explicit operator chaining and transparent tracking of dataflow and selection logic (Duan et al., 18 Aug 2025).

3. Mapping from NLQ to EDL: Model and Training

The NLQ-to-EDL process is cast as a supervised sequence generation task. Spider and Bird SQL annotations are automatically converted to EDL via GPT-4o, with database execution ensuring semantic alignment. Datasets of ⟨NLQ, gold-EDL⟩ pairs are constructed (Spider-EDL and Bird-EDL).

  • Base LLM: Qwen2.5-Coder-32B (open-source, code-specialized LLM)
  • Fine-tuning: LoRA with rank 8, learning rate $5 \times 10^{-5}$, two epochs.
  • Prompt Template (Inference):
  1. Task cue: "Translate the following question into an EDL plan."
  2. Schema context (tables and columns).
  3. Three to five few-shot examples (NLQ→EDL).
  4. The query to parse.
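The four-part template can be assembled mechanically. This is a hypothetical sketch: the function name `build_nl2edl_prompt`, the `table(col, ...)` schema serialization, and the `Schema:`, `NLQ:`, and `EDL:` labels are assumptions, since the source specifies only the components and their order.

```python
def build_nl2edl_prompt(schema: dict[str, list[str]],
                        examples: list[tuple[str, str]],
                        question: str) -> str:
    """Assemble task cue, schema context, few-shot pairs, and the query, in order."""
    if not 3 <= len(examples) <= 5:
        raise ValueError("the template uses three to five few-shot examples")
    # 1. Task cue
    parts = ["Translate the following question into an EDL plan."]
    # 2. Schema context, one "table(col1, col2, ...)" line per table (assumed format)
    schema_lines = [f"{table}({', '.join(cols)})" for table, cols in schema.items()]
    parts.append("Schema:\n" + "\n".join(schema_lines))
    # 3. Few-shot NLQ -> EDL demonstrations
    for nlq, edl in examples:
        parts.append(f"NLQ: {nlq}\nEDL:\n{edl}")
    # 4. The query to parse; the model continues after "EDL:"
    parts.append(f"NLQ: {question}\nEDL:")
    return "\n\n".join(parts)


prompt = build_nl2edl_prompt(
    {"city": ["Name", "District", "Population"]},
    [("q1", "#1. ScanTable ..."), ("q2", "#1. ScanTable ..."), ("q3", "#1. ScanTable ...")],
    "How many cities are there?",
)
print(prompt)
```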

The learning objective is the next-token cross-entropy loss

$$\mathcal{L}_{\text{T2E}}(\theta) = -\sum_{i=1}^{N} \log p_\theta(e_i \mid q_i, d_i),$$

where $q_i$ is the NLQ, $d_i$ the schema context, and $e_i$ the gold EDL. Inference uses autoregressive decoding with top-1 selection; beam search is optional.
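Because $p_\theta(e_i \mid q_i, d_i)$ factorizes over the tokens of $e_i$, each term of the loss is a sum of per-token negative log-probabilities under teacher forcing. The sketch below computes it from explicit probability tables; the toy tokens and probabilities are invented, and a real training loop would apply log-softmax to model logits instead.

```python
import math


def sequence_nll(step_probs: list[dict[str, float]], gold_tokens: list[str]) -> float:
    """-log p(e | q, d) = -sum_t log p(e_t | e_<t, q, d) under teacher forcing.

    step_probs[t] maps candidate tokens to the model's probability at position t,
    already conditioned on the gold prefix e_<t, the question q, and the schema d.
    """
    return -sum(math.log(step_probs[t][tok]) for t, tok in enumerate(gold_tokens))


def corpus_loss(batch: list[tuple[list[dict[str, float]], list[str]]]) -> float:
    """L_T2E: sum of sequence NLLs over all (step_probs, gold_tokens) pairs."""
    return sum(sequence_nll(probs, gold) for probs, gold in batch)


toy = [{"#1.": 0.5, "#2.": 0.5}, {"ScanTable": 0.25, "Join": 0.75}]
print(sequence_nll(toy, ["#1.", "ScanTable"]))  # equals log(8), about 2.079
```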

An optional final consistency check (EDL→SQL→DB), which verifies that the generated query returns a non-empty result, can be applied, but it is rarely necessary in practice (Duan et al., 18 Aug 2025).
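The optional check can be sketched as follows: compile the predicted EDL to SQL, execute it, and accept the plan only if it runs and returns at least one row. Here `compile_edl` is a hypothetical stand-in for either EDL-to-SQL strategy described in Section 4, and SQLite is used only as an example backend.

```python
import sqlite3
from typing import Callable


def passes_consistency_check(edl: str,
                             compile_edl: Callable[[str], str],
                             conn: sqlite3.Connection) -> bool:
    """EDL -> SQL -> DB: True if the compiled SQL executes and returns a row."""
    try:
        sql = compile_edl(edl)
        return conn.execute(sql).fetchone() is not None
    except (sqlite3.Error, ValueError):
        # Unparseable plans or invalid SQL fail the check instead of crashing.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
print(passes_consistency_check("#1. ScanTable ...", lambda e: "SELECT x FROM t", conn))
```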

4. EDL-to-SQL Mapping: Deterministic and Model-Based Approaches

EDL-to-SQL conversion is implemented via two main strategies:

  • (a) Structure-to-sequence LLMs: The EDL (numbered steps) and schema are serialized as input; the output is SQL. The model is trained with a standard cross-entropy objective on ⟨EDL, SQL⟩ pairs.
  • (b) Deterministic template-based code: Each operator is mapped to its SQL clause via explicit pseudocode logic:

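```python
from typing import Any

# Deterministic EDL-to-SQL mapping. Assumption: each EDL step has already been
# parsed into an (operator, args) pair, because ArgList is free-form English.
Step = tuple[str, dict[str, Any]]


def edl_to_sql(steps: list[Step]) -> str:
    select_list: list[str] = []
    from_clause = ""
    join_clauses: list[str] = []
    where_conds: list[str] = []
    group_by: list[str] = []
    having_conds: list[str] = []
    order_by = ""
    limit: int | None = None

    for op, args in steps:  # steps arrive in ascending step-number order
        match op:
            case "ScanTable":
                from_clause = f"FROM {args['table']} AS {args['alias']}"
            case "Join":
                join_clauses.append(
                    f"JOIN {args['table']} AS {args['alias']} ON {args['on']}"
                )
            case "ReserveRows":
                # A filter before any GroupBy becomes WHERE; after it, HAVING.
                target = having_conds if group_by else where_conds
                target.append(args["condition"])
            case "HavingClause":
                having_conds.append(args["condition"])
            case "GroupBy":
                group_by = list(args["columns"])
            case "SelectColumn":
                select_list.extend(args["columns"])
            case "ArithmeticCalc":
                select_list.append(f"{args['expr']} AS {args['name']}")
            case "Sort":
                order_by = f"ORDER BY {args['column']} {args.get('order', 'ASC')}"
            case "Limit":
                limit = int(args["n"])
            case "SetOp":
                # Combines two complete queries; assumed to be the final step.
                return f"({args['left']}) {args['type'].upper()} ({args['right']})"
            case _:
                # Operators not covered by this sketch (e.g. Subquery).
                raise ValueError(f"unsupported EDL operator: {op}")

    # Emit clauses in SQL's canonical order; multiple filters are AND-ed.
    parts = [f"SELECT {', '.join(select_list) or '*'}", from_clause, *join_clauses]
    if where_conds:
        parts.append("WHERE " + " AND ".join(where_conds))
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    if having_conds:
        parts.append("HAVING " + " AND ".join(having_conds))
    if order_by:
        parts.append(order_by)
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return " ".join(part for part in parts if part)
```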

Faithfulness follows from the explicit, one-to-one mapping between operators and SQL clauses. Empirically, translating gold EDL into SQL yields execution accuracy above 98% across LLMs (Duan et al., 18 Aug 2025).

5. Training Objectives and Evaluation Metrics

Supervised training is performed for both NLQ-to-EDL and EDL-to-SQL mappings, each using token-wise cross-entropy losses:

$$\min_\theta \sum_{i=1}^{N} \mathcal{L}_{\mathrm{CE}}\big(p_\theta(\widetilde e_i \mid q_i, d_i),\, e_i\big), \qquad \min_\phi \sum_{i=1}^{N} \mathcal{L}_{\mathrm{CE}}\big(p_\phi(\widetilde s_i \mid \widetilde e_i, d_i),\, s_i\big)$$

where $\widetilde e_i$ and $\widetilde s_i$ are the predicted EDL and SQL, and $e_i$ and $s_i$ are the gold EDL and SQL.

Evaluation uses two principal metrics:

  • Execution Accuracy (EX): Fraction of test queries whose predicted SQL, when executed, returns the same result as the gold SQL.

$$\mathrm{EX} = \frac{1}{M}\sum_{i=1}^{M} \mathbf{1}\big(\mathrm{exec}(\widetilde s_i) = \mathrm{exec}(s_i)\big)$$

  • Schema-retrieval Recall@k: Proportion of gold tables covered by the top-k retrieved tables.

$$\mathrm{Recall}@k = \frac{\lvert \{\text{gold tables}\} \cap \{\text{top-}k\text{ retrieved}\} \rvert}{\lvert \{\text{gold tables}\} \rvert}$$

Together, these metrics quantify upstream schema-selection effectiveness (Recall@k) and end-to-end semantic fidelity (EX) (Duan et al., 18 Aug 2025).
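Both metrics are simple to compute. The sketch below is an illustration under stated assumptions, none of which the source specifies: predicted and gold SQL run on a single SQLite connection (benchmarks such as Spider and Bird use one database per query), results are compared as order-insensitive multisets of rows via `collections.Counter`, a predicted query that fails to execute counts as incorrect, and Recall@k is averaged per query.

```python
import sqlite3
from collections import Counter


def same_result(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Indicator 1(exec(pred) = exec(gold)), comparing rows as multisets."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to run counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()  # gold SQL assumed valid
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """EX = (1/M) * number of (predicted, gold) pairs whose results match."""
    if not pairs:
        raise ValueError("need at least one query pair")
    return sum(same_result(conn, pred, gold) for pred, gold in pairs) / len(pairs)


def recall_at_k(gold_tables: list[str], ranked_tables: list[str], k: int) -> float:
    """|gold & top-k retrieved| / |gold| for one query."""
    gold = set(gold_tables)
    if not gold:
        raise ValueError("gold table set must be non-empty")
    return len(gold & set(ranked_tables[:k])) / len(gold)


def mean_recall_at_k(queries: list[tuple[list[str], list[str]]], k: int) -> float:
    """Per-query Recall@k averaged over all queries (aggregation is an assumption)."""
    return sum(recall_at_k(gold, ranked, k) for gold, ranked in queries) / len(queries)


# Gold tables of Example 2 against a hypothetical retriever ranking.
print(recall_at_k(["Student", "Has_Pet", "Pets"], ["Student", "Pets", "Dorm", "Has_Pet"], 3))
```

The final call returns 2/3: two of the three gold tables appear among the top three retrieved. Comparing multisets ignores row order, so an order-sensitive variant may be preferable for gold queries that contain ORDER BY.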

6. Empirical Performance and Impact

The integration of EDL into CRED-SQL yields measurable gains over prior art, especially in large, complex schemas.

  • On SpiderUnion (GPT-4o), CRED-SQL's CLSR schema retrieval achieves Recall@1 of 40.23% (vs. 8.51% for CRUSH) and Recall@3 of 77.07% (vs. 30.56%).
  • End-to-end (Qwen2.5-Coder-32B): CRUSH+NLQ→SQL achieves 51.5% EX; CRED-SQL with EDL achieves 73.4% EX (+21.9 points).
  • Intermediate representation comparison (Spider, GPT-4o, DIN-SQL): NLQ→SQL 78.1% EX; NLQ→EDL→SQL 83.3% EX (+5.2 points).
  • On BirdUnion (MAC-SQL backbone, GPT-4o): CRED-SQL+EDL yields 58.28% EX (vs. 51.17% baseline, +7.11 points).

A full breakdown across upstream methods and open- and closed-source LLMs shows consistent improvements in both schema selection and execution accuracy over direct SQL generation and over previous intermediate representations such as QPL. Adopting EDL, especially alongside strong schema retrieval as in CRED-SQL, substantially reduces semantic drift and yields more faithful mappings in neural semantic parsing pipelines (Duan et al., 18 Aug 2025).
