
OraPlan-SQL: Bilingual NL2SQL Planning System

Updated 30 January 2026
  • OraPlan-SQL is a planning-centric NL2SQL system that decomposes natural language queries into explicit stepwise plans and SQL queries using a dual-agent architecture.
  • The framework integrates feedback-guided meta-prompting, bilingual entity-linking, and diversified planning with majority voting to enhance reliability and reduce execution errors.
  • OraPlan-SQL achieves notable performance with 54.96% execution accuracy in English, 56.67% in Chinese, and over 99% syntactic SQL validity.

OraPlan-SQL is a planning-centric NL2SQL system designed for the Archer NL2SQL Evaluation Challenge 2025, a benchmark focusing on bilingual (English/Chinese) natural language-to-SQL reasoning tasks that require arithmetic, commonsense, and hypothetical inference. The framework adopts a dual-agent architecture, integrating explicit stepwise planning, feedback-driven prompt refinement, entity-variant management for robust bilingual reasoning, diversified planning for reliability, and majority-vote-based answer selection. OraPlan-SQL surpasses prior state-of-the-art results, achieving 54.96% execution accuracy (EX) in English and 56.67% in Chinese, while ensuring over 99% syntactic SQL validity (VA) (Liu et al., 27 Oct 2025).

1. System Architecture and Pipeline

OraPlan-SQL decomposes the NL2SQL process into two distinct agents:

  • Planner agent ($f_\text{plan}(Q, D) \rightarrow P$): receives a natural-language query $Q$ and database schema $D$, and outputs an explicit, stepwise plan $P$ in natural language detailing reasoning steps, including arithmetic operations, filtering, and hypothetical conditions.
  • SQL agent ($f_\text{sql}(P, D) \rightarrow S$): converts the plan $P$ and schema $D$ into an executable SQL query $S$, preserving the logical integrity of the plan.

The pipeline operates as follows:

  1. Accepts user queries $Q$ in English or Chinese.
  2. Retrieves the top-$k$ schema elements ($k=5$) via an embedding-based retriever.
  3. The planner agent produces $N$ diverse plans $\{P_1, \dots, P_N\}$ by sampling at temperature 0.7.
  4. Each $P_i$ is independently processed by the SQL agent, using the retrieved schema plus semantically similar in-context examples (ICL), yielding $N$ SQL outputs.
  5. Executes each SQL query and collects the results $\{R_i\}$.
  6. Applies majority voting across $\{R_i\}$ to select the final answer, resolving ties using model log-probability.

This agentic framework isolates the complex reasoning from SQL syntax concerns and allows targeted improvements to the planning process (Liu et al., 27 Oct 2025).
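The pipeline steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: `planner`, `sql_agent`, and `executor` are hypothetical injected callables standing in for the LLM and database calls.

```python
from collections import Counter

def oraplan_pipeline(query, schema, planner, sql_agent, executor, n_plans=5):
    """Sketch of the dual-agent pipeline with majority voting.

    planner:   (query, schema) -> plan text (sampled at temperature 0.7)
    sql_agent: (plan, schema)  -> SQL string (decoded at temperature 0.0)
    executor:  (sql)           -> execution result
    """
    plans = [planner(query, schema) for _ in range(n_plans)]   # N diverse plans
    sqls = [sql_agent(p, schema) for p in plans]               # one SQL per plan
    results = [executor(s) for s in sqls]                      # execute all queries
    # Majority vote over the results; the full system additionally breaks
    # ties by model log-probability (see Section 4).
    top, _ = Counter(repr(r) for r in results).most_common(1)[0]
    return next(r for r in results if repr(r) == top)
```

Injecting the three callables keeps the reasoning/SQL/execution concerns separated, mirroring the paper's isolation of planning from SQL syntax.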

2. Feedback-Guided Meta-Prompting

Rather than orchestrating multiple sub-agents, OraPlan-SQL applies a meta-prompting strategy to iteratively refine a single planner agent:

  • Failure-case collection: The base planner is run on a held-out development set; cases with incorrect execution or flawed plans are collated.
  • Clustering: Each failure (query and plan) is embedded via an LLM into a dense vector. $K$-means clustering identifies distinct error types ($K$ typically 5–7).
  • Human annotation and LLM distillation: Clusters are manually labeled, and a "meta-writer" LLM is prompted with representative examples to produce concise corrective guidelines.
  • Prompt integration: The distilled guidelines are appended to the planner’s system prompt under "Planner Correction Guidelines".

At inference, the planner executes according to these guidelines. This feedback loop demonstrably enhances generalization and corrects systematic planning errors. Ablation reveals a 27.9-point drop in EX when meta-guidelines are removed (from 72.12% to 44.23%), underscoring their impact (Liu et al., 27 Oct 2025).

3. Bilingual Entity-Linking and Variants

Complex bilingual scenarios in NL2SQL surface transliteration and entity mismatch challenges, particularly when Chinese queries reference entities whose forms do not match English schema entries. OraPlan-SQL addresses this by:

  • Entity variant generation: The planner system prompt instructs explicit identification and enumeration of plausible English variants (translations, abbreviations, aliases) for each entity detected in the query. For example, '格里公园' yields ['Glebe Park', 'Glebe Gardens'].
  • Encoding in plan: A section titled "Entity Variants" is prepended to each plan, mapping $E \rightarrow [E_1, E_2, \dots]$.
  • Downstream usage: The SQL agent consults these variants when constructing WHERE and JOIN clauses.

This methodology improves cross-lingual entity resolution and reduces execution errors related to schema mismatches (Liu et al., 27 Oct 2025).
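As a concrete illustration of how a SQL agent might consume an "Entity Variants" mapping, the sketch below (a hypothetical helper, not from the paper) expands one entity's variant list into a tolerant WHERE condition:

```python
def where_clause_for_entity(column, variants):
    """Expand an entity's variant list into an OR'd WHERE condition.

    `variants` would come from the plan's "Entity Variants" section,
    e.g. '格里公园' -> ['Glebe Park', 'Glebe Gardens'].
    """
    escaped = [v.replace("'", "''") for v in variants]  # escape single quotes for SQL
    return "(" + " OR ".join(f"{column} = '{v}'" for v in escaped) + ")"
```

An equivalent formulation uses an IN list; the OR form is shown only because it extends naturally to LIKE-based fuzzy matching on individual variants.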

4. Plan Diversification and Majority Voting

To enhance reliability and reduce brittleness, OraPlan-SQL introduces plan diversification and consensus selection:

  • Diverse plan generation: The planner agent samples $N$ plans ($N=5$ or $7$) at temperature 0.7 to cover distinct reasoning pathways.
  • SQL query synthesis: Each plan is processed by the SQL agent at temperature 0.0 for deterministic conversion.
  • Majority voting: After executing the resulting SQL queries ($R_1, \dots, R_N$), the final answer $R^*$ is chosen as the mode:

R^* = \operatorname{arg\,max}_v \sum_{i=1}^N \mathbf{1}[R_i = v]

If no majority exists, the output with the highest model log-likelihood is selected, defaulting to $P_1$'s result in unresolved cases.

Ablation indicates that disabling majority voting leads to a modest decrease in EX (–1.0 points), showing that consensus selection contributes a small but consistent gain on top of the other components (Liu et al., 27 Oct 2025).

5. SQL Agent Engineering and In-Context Learning

SQL agent effectiveness depends on precise prompt construction:

  • Prompt components:
  1. Agent role: "You are a SQL-generation agent. You will convert the user’s plan into valid SQL."
  2. Top-5 retrieved schema elements.
  3. Three in-context examples (NL plan to gold SQL).
  4. Instruction to adhere strictly to plan step order.
  • Model selection: Both planner and SQL agent utilize GPT-5. The SQL agent decodes at temperature 0 to minimize hallucinations.

Schema retrieval and ICL demonstrably reduce syntactic and semantic SQL errors. Ablation shows that omitting ICL marginally decreases EX (–1.9 points), highlighting its refinement role (Liu et al., 27 Oct 2025).
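The four prompt components above can be assembled mechanically. The sketch below is a plausible rendering of that layout; the exact wording and ordering used by OraPlan-SQL may differ.

```python
def build_sql_agent_prompt(schema_elems, icl_examples, plan):
    """Assemble the SQL agent prompt from the four described components."""
    parts = [
        "You are a SQL-generation agent. "
        "You will convert the user's plan into valid SQL.",     # 1. agent role
        "Schema:\n" + "\n".join(schema_elems[:5]),              # 2. top-5 schema elements
    ]
    for nl_plan, gold_sql in icl_examples[:3]:                  # 3. three ICL examples
        parts.append(f"Example plan:\n{nl_plan}\nSQL:\n{gold_sql}")
    parts.append("Adhere strictly to the plan's step order.")   # 4. ordering instruction
    parts.append("Plan:\n" + plan)
    return "\n\n".join(parts)
```

Placing the plan last keeps the instruction and examples as a fixed prefix, which also makes the prompt cache-friendly across queries.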

6. Evaluation Metrics and Comparative Results

Performance on the Archer 2025 bilingual benchmark is evaluated as follows:

  • Execution Accuracy ($\text{EX}$):

\text{EX} = \frac{\#\,\text{correct executions}}{\#\,\text{total queries}} \times 100\%

  • SQL Validity ($\text{VA}$):

\text{VA} = \frac{\#\,\text{syntactically valid SQLs}}{\#\,\text{total queries}} \times 100\%
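Both metrics reduce to straightforward counting; a minimal sketch:

```python
def execution_accuracy(pred_results, gold_results):
    """EX: percentage of queries whose execution result matches the gold answer."""
    correct = sum(p == g for p, g in zip(pred_results, gold_results))
    return 100.0 * correct / len(gold_results)

def sql_validity(valid_flags):
    """VA: percentage of generated SQL queries that are syntactically valid."""
    return 100.0 * sum(valid_flags) / len(valid_flags)
```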

Leaderboard results:

System                 EN EX    ZH EX    VA
OraPlan-SQL (GPT-5)    54.96%   56.67%   >99%
HIT–SCIR (GPT-4o)      48.66%   44.08%   —

OraPlan-SQL achieves gains of +6.3 points in English and +12.6 points in Chinese over the runner-up (Liu et al., 27 Oct 2025).

Ablation studies (English dev set, GPT-5):

Component removed             EX (%)   ΔEX
Full system                   72.12    —
w/o meta-prompt guidelines    44.23    –27.9
w/o planner agent             64.42    –7.7
w/o ICL in SQL agent          70.19    –1.9
w/o majority voting           71.15    –1.0

7. Supplementary Algorithms and Formalization

Algorithm 1: Feedback-Guided Meta-Prompting

Given held-out queries $\{Q_i\}$, schema $D$, and base planner $f_\text{plan0}$:

  1. For each $Q_i$:
     a. $P_i \leftarrow f_\text{plan0}(Q_i, D)$
     b. $S_i \leftarrow f_\text{sql}(P_i, D)$
     c. If $\operatorname{Execute}(S_i) \neq$ gold answer, collect $(Q_i, P_i)$.
  2. Embed each failure into a vector $x_i$ via LLM embedding.
  3. Cluster $\{x_i\}$ into $K$ clusters via $k$-means (cosine distance).
  4. For each cluster $c = 1..K$:
     a. Sample 5 examples and label the error type.
     b. Prompt the meta-writer LLM for guidelines.
     c. Receive a guideline bullet-list $G_c$.
  5. $G^* \leftarrow$ concatenation of all $G_c$ under "Planner Correction Guidelines".
  6. Return $G^*$.
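Algorithm 1 can be sketched as below. The LLM, embedding, and clustering steps are injected as callables (`embed`, `cluster`, and `label_and_distill` are hypothetical stand-ins, since the paper does not publish this code):

```python
def feedback_meta_prompting(queries, golds, schema, planner, sql_agent, executor,
                            embed, cluster, label_and_distill, k=6):
    """Sketch of Algorithm 1: distill corrective guidelines from planner failures.

    embed:             text -> vector
    cluster:           (vectors, k) -> cluster index per vector (k-means, cosine)
    label_and_distill: list of (query, plan) failures -> guideline string
    """
    # Step 1: collect failure cases on the held-out set.
    failures = []
    for q, gold in zip(queries, golds):
        plan = planner(q, schema)
        if executor(sql_agent(plan, schema)) != gold:
            failures.append((q, plan))
    # Steps 2-3: embed each failure and cluster into K error types.
    vectors = [embed(q + "\n" + p) for q, p in failures]
    assignments = cluster(vectors, k)
    # Steps 4-5: distill one guideline block per non-empty cluster, concatenate.
    guidelines = []
    for c in range(k):
        members = [f for f, a in zip(failures, assignments) if a == c]
        if members:
            guidelines.append(label_and_distill(members[:5]))  # sample up to 5
    return "Planner Correction Guidelines:\n" + "\n".join(guidelines)
```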

Majority Voting Formalization:

Given results $R_1, \dots, R_N$, define:

\text{votes}(v) = \sum_{i=1}^N \mathbf{1}[R_i = v]

The final answer is:

R^* = \operatorname{arg\,max}_v \text{votes}(v)
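The selection rule above, together with the tie-break described in Section 4 (majority first, then model log-probability, then the first plan's result), can be sketched as:

```python
from collections import Counter

def majority_vote(results, logprobs):
    """Pick R* by vote count, breaking ties by generation log-probability.

    results:  execution results R_1..R_N (hashable), in plan order
    logprobs: one log-probability per result, same order
    """
    counts = Counter(results)
    best = max(counts.values())
    tied = [r for r, c in counts.items() if c == best]
    if len(tied) == 1:
        return tied[0]
    # Ties: highest log-probability wins; Counter preserves first-occurrence
    # order, so on a further tie max() returns the earliest result (P_1's).
    return max(tied, key=lambda r: max(lp for r2, lp in zip(results, logprobs)
                                       if r2 == r))
```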

This architecture, refinement methodology, bilingual entity management, plan diversification, and consensus selection together achieve state-of-the-art results on reasoning-intensive, bilingual NL2SQL benchmarks (Liu et al., 27 Oct 2025).
