
AC-SQL: Actor-Critic Text-to-SQL

Updated 28 December 2025
  • AC-SQL is a text-to-SQL method that leverages actor-critic procedural boosting and chain-of-thought in-context learning to improve SQL generation.
  • It employs a dual-critic system that validates candidate queries through SQL execution checks and LLM-based semantic verification.
  • Empirical studies demonstrate significant accuracy gains across diverse LLMs and datasets, supported by theoretical performance guarantees.

AC-SQL, or Actor-Critic SQL, encompasses two families of methods for enhancing text-to-SQL tasks with LLMs: actor-critic procedural boosting and chain-of-thought-driven in-context learning with schema linking. These methods address the challenge of translating natural-language questions and database schemas into correct SQL, either by iteratively validating and refining LLM outputs (actor-critic), or by embedding structured reasoning traces in prompt exemplars (auto-CoT) to improve model reliability and SQL accuracy (Zheng et al., 2024, Zhang et al., 2023).

1. Problem Definition and Mathematical Framework

Text-to-SQL aims to map a question $x$ and schema $S$ (or $D$) to a SQL query $y$ (or $Y$) such that its execution returns the correct result. The base LLM, parameterized by $\theta$, models the conditional probability:

$$P_{\pi_\theta}(Y \mid x, S) = \prod_{i=1}^{|Y|} \pi_\theta(y_i \mid x, S, y_{<i})$$

where generation is treated as a stochastic policy. The objective is to maximize the expected reward, with $R(y) \in \{0, 1\}$ denoting the semantic correctness of the candidate query:

$$J(\theta) = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x, S)}\left[R(y)\right]$$

In actor-critic approaches, the "Critic" $V_\phi(x, S, y)$ estimates the expected reward of each candidate $y$ (Zheng et al., 2024). Reward is computed as execution match: $R(y) = 1$ if the execution result matches that of the ground-truth SQL, and $0$ otherwise.
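As a concrete illustration, the execution-match reward can be sketched against a SQLite database. The function name and the SQLite backend are assumptions for illustration, not the paper's implementation:

```python
import sqlite3

def execution_reward(candidate_sql, gold_sql, db_path):
    """Binary reward R(y): 1 if the candidate's result set matches the
    gold query's result set on the target database, else 0.
    Rows are compared as order-insensitive multisets."""
    conn = sqlite3.connect(db_path)
    try:
        cand = conn.execute(candidate_sql).fetchall()
        gold = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return 0  # syntax or engine errors yield zero reward
    finally:
        conn.close()
    return int(sorted(cand) == sorted(gold))
```

Comparing sorted row lists treats `SELECT a FROM t` and `SELECT a FROM t ORDER BY a DESC` as equivalent, which matches the execution-accuracy convention of ignoring row order.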

2. Actor–Critic Iterative Algorithm

The actor-critic variant known as AC-SQL wraps any LLM-based actor with a lightweight, theoretically principled loop:

  1. The Actor samples a candidate SQL $y_t$ for the input $(x, S)$.
  2. The Critic, combining a live SQL execution engine (catching syntax/engine errors) and an LLM-based semantic verifier (binary True/False), accepts $y_t$ only if both checks pass:

$$C(x, S, y_t) = \mathrm{ExecutionCritic}(y_t) \wedge \mathrm{LLMCritic}(y_t)$$

  3. If $C(x, S, y_t) = 1$, return $y_t$. Otherwise, repeat for up to $z$ rounds.
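The loop above can be sketched generically. The callable names (`actor`, `execution_critic`, `llm_critic`) are placeholders for whatever LLM and verifiers the practitioner supplies:

```python
def ac_sql_loop(actor, execution_critic, llm_critic, question, schema, z=10):
    """Iterative actor-critic refinement: sample a candidate SQL, accept it
    only if both critics pass, otherwise resample, for at most z rounds.
    If the budget is exhausted, the last candidate is returned."""
    candidate = None
    for _ in range(z):
        candidate = actor(question, schema)  # sample y_t ~ pi_theta
        if execution_critic(candidate) and llm_critic(question, schema, candidate):
            return candidate                 # C(x, S, y_t) = 1: accept
    return candidate
```

Because the Actor is treated as a black box, this wrapper applies to any LLM without fine-tuning, which is the sense in which the method is "lightweight."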

Crucially, this structure admits theoretical performance bounds. Let $p$ be the probability that the Actor generates a correct SQL, $q$ the Critic's false-negative rate, and $s$ the Critic's false-positive rate. The probability that the final output is correct after $z$ rounds is:

$$\mathrm{prob} = p(1-s)\,\frac{1-[ps + (1-p)(1-q)]^{z-1}}{1 - [ps + (1-p)(1-q)]} + p\,[ps + (1-p)(1-q)]^{z-1}$$

As $z \to \infty$, provided $ps + (1-p)(1-q) < 1$,

$$\mathrm{prob}(\infty) = \frac{p(1-s)}{p + q - pq - ps}$$

This formal guarantee holds under mild assumptions and confirms that iterative refinement strictly increases the chance of correct execution whenever $s + q < 1$ (Zheng et al., 2024).
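The closed form and its limit are straightforward to check numerically; a minimal sketch, with symbols as defined above:

```python
def prob_correct(p, q, s, z):
    """Probability the loop's final output is correct after z rounds.
    p: Actor's per-round success rate; q: Critic false-negative rate;
    s: Critic false-positive rate. Valid for z >= 1."""
    r = p * s + (1 - p) * (1 - q)  # the bracketed recurrence term of the bound
    return p * (1 - s) * (1 - r ** (z - 1)) / (1 - r) + p * r ** (z - 1)

def prob_limit(p, q, s):
    """Limit as z -> infinity, valid when ps + (1-p)(1-q) < 1."""
    return p * (1 - s) / (p + q - p * q - p * s)
```

For example, with $p = 0.5$ and $q = s = 0.1$, `prob_correct` equals $p = 0.5$ at $z = 1$ (a single unchecked sample) and rises monotonically toward the limit $0.9$, illustrating how the loop amplifies a mediocre Actor when the Critic is reliable.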

3. In-Context Learning via Auto-CoT Schema Linking

A complementary approach, termed ACT-SQL or AC-SQL (auto-CoT), enhances in-context learning (ICL) by generating chain-of-thought exemplars that link question fragments to schema elements (Zhang et al., 2023). For each training triple $(D, Q, S)$, the algorithm automatically infers and annotates:

  1. Links from schema tables/columns to question fragments, found via embedding-based similarity.
  2. The literal values used in the query.
  3. The final SQL answer.

Prompt construction combines static exemplars (drawn at random from the pool) and dynamic exemplars (selected as most similar to the test question under embedding similarity). The single LLM prompt consists of a database schema description, several annotated exemplars, and the current test question to be answered in the same style.

The span-alignment rule

$$(i^*, j^*) = \arg\max_{1 \leq i < j \leq |q|,\; j - i \leq L} \mathrm{Sim}(\text{tab.col},\, q_{i:j})$$

selects the optimal alignment between schema items and question spans.
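This argmax over bounded-length question spans can be sketched with a caller-supplied similarity function (embedding cosine in the paper); the helper name and token-level spans here are illustrative:

```python
def align_column(column_name, question_tokens, sim, max_len=4):
    """Return the span (i, j) of question_tokens whose text q_{i:j} is most
    similar to the schema item, subject to j - i <= max_len, plus its score.
    `sim(a, b)` is a caller-supplied similarity, e.g. cosine of embeddings."""
    best, best_score = (0, 1), float("-inf")
    n = len(question_tokens)
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):  # enforce j - i <= max_len
            span_text = " ".join(question_tokens[i:j])
            score = sim(column_name, span_text)
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score
```

The double loop is $O(|q| \cdot L)$ similarity calls per schema item, which is cheap relative to the single LLM call the method budgets per query.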

This method issues exactly one API call per query, contrasting with chain-decomposed approaches that require multiple model invocations per sample, reducing both cost and latency.
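Assembling the single prompt from static and dynamic exemplars might look like the following sketch; the pool format, counts, and formatting strings are assumptions, not the paper's exact prompt:

```python
import random

def build_prompt(schema_desc, pool, test_question, sim,
                 n_static=2, n_dynamic=2, seed=0):
    """Assemble one ICL prompt: schema description, static exemplars drawn
    at random from the pool, dynamic exemplars most similar to the test
    question, then the test question itself. Each pool entry is a
    (question, annotated_cot) pair whose CoT ends in the final SQL.
    `sim(a, b)` is a caller-supplied question-similarity function."""
    rng = random.Random(seed)
    static = rng.sample(pool, n_static)
    remaining = [e for e in pool if e not in static]
    dynamic = sorted(remaining, key=lambda e: sim(e[0], test_question),
                     reverse=True)[:n_dynamic]
    parts = [schema_desc]
    for q, cot in static + dynamic:
        parts.append(f"Q: {q}\n{cot}")
    parts.append(f"Q: {test_question}\nA:")  # model completes in the same style
    return "\n\n".join(parts)
```

Since everything is concatenated into one string, the whole pipeline costs exactly one API call per test question, as noted above.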

4. Empirical Results and Benchmarks

Comprehensive experiments were conducted on the Spider dev, Spider-DK (domain-knowledge-intensive), and Spider-SYN (schema-perturbed) datasets. Zero-shot and few-shot settings were evaluated across commercial and open-source LLMs, including LLaMA2/3, GPT-3.5, GPT-4o, Vicuna, Guanaco, and Gemma models.

Key findings (Zheng et al., 2024, Zhang et al., 2023):

  • Actor–Critic (AC-SQL):
    • On Spider-dev: GPT-4o baseline execution accuracy (EX): 72.1%; with AC-SQL: 77.7% (+5.6pp).
    • LLaMA3-8B: 32.6% → 67.7% (+35.1pp).
    • Gains extended to all 11 tested models (5–35pp absolute).
    • Convergence of iterative boosts saturates by $z \approx 10$ rounds in practice.
  • Auto-CoT (ACT-SQL):
    • Best few-shot (GPT-3.5-turbo): EX = 80.4%.
    • On robustness splits (Spider-Syn, Spider-DK): EX improved from baseline values to 67.9–68.2%.
    • In multi-turn tasks (SParC, CoSQL), competitive with prior zero-shot dialogue parsers.
  • Critic ablation: both the execution and LLM critics contribute to accuracy. Combined critics consistently outperform either alone, except in particularly weak models, where the LLM critic may degrade performance; the $s + q < 1$ condition still guarantees the loop does not reduce accuracy below the baseline.

Theoretical predictions closely matched empirical outcomes (e.g., for Vicuna-33B: theoretical prob ≈ 0.6334, observed EX ≈ 0.610) (Zheng et al., 2024).

5. Reward, Evaluation, and Critic Design

Reward is computed as execution accuracy: $R(y) = 1$ iff the generated SQL, when executed, produces a result set equivalent to that of the annotated solution. Critic evaluation involves two stages:

  • Execution Critic: Binary validation based on SQL engine parsing and execution.
  • LLM Critic: The same or a different LLM, prompted to verify semantic intent alignment ("Answer True if the SQL candidate $y$ correctly expresses the intent of $x$ under schema $S$; otherwise answer False").

Only candidates passing both checks are accepted. Actor updates may optionally use policy-gradient losses with or without a learned baseline. Critic training employs regression loss to match Vϕ(x,S,y)V_\phi(x, S, y) to observed rewards.
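A minimal sketch of the LLM-critic interface: the prompt wording follows the quoted instruction above, while the function names and the verdict-parsing helper are assumptions for illustration:

```python
def llm_critic_prompt(question, schema, candidate_sql):
    """Render the binary semantic-verification prompt described above.
    The exact wording is illustrative; the paper's prompt may differ."""
    return (
        f"Database schema:\n{schema}\n\n"
        f"Question: {question}\n"
        f"Candidate SQL: {candidate_sql}\n\n"
        "Answer True if the SQL candidate correctly expresses the intent "
        "of the question under this schema; otherwise answer False."
    )

def parse_verdict(llm_output):
    """Map the verifier's free-text reply to a boolean.
    Anything not starting with 'true' is treated as a rejection."""
    return llm_output.strip().lower().startswith("true")
```

Defaulting ambiguous replies to False is a conservative choice: it trades a higher effective false-negative rate $q$ for a lower false-positive rate $s$, which only delays acceptance rather than admitting wrong SQL.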

6. Limitations and Extensions

Identified limitations include:

  • Critic feedback is restricted to binary True/False; richer LLM feedback (e.g., error localization, multi-hop critique) is left unexploited, since incorporating it would sacrifice the loop's simplicity and analytical tractability.
  • Inference speed is reduced by repeated LLM calls in the actor-critic loop; optimizations such as caching or lightweight critics are proposed for future work.
  • In auto-CoT, multi-turn prompt rewriting is susceptible to errors, especially in resolving co-references, and schema-linking can underperform in cases of synonym mismatches (Zhang et al., 2023).

Notable extensions under investigation:

  • Specialization of critic models independently of the actor to enhance verification.
  • Generalization of the error-bound theory to hierarchical or multi-step critics providing structured feedback.
  • Combination with few-shot schema-construction and hyperparameter meta-learning for improved coverage and robustness.

7. Significance and Future Directions

AC-SQL methods demonstrate that both iterative actor-critic correction and schema-linked chain-of-thought prompting can consistently improve zero-shot and in-context text-to-SQL performance for a wide range of LLMs, without requiring model fine-tuning (Zheng et al., 2024, Zhang et al., 2023). These approaches establish performance guarantees, empirically and theoretically, and substantially reduce failure rates without loss of generality.

Potential future work includes integrating richer feedback mechanisms, efficient critic instantiation, hyperparameter optimization for in-context learning, and single-pass joint rewriting of multi-turn queries. The methodologies are not limited to SQL generation and could plausibly extend to other code synthesis, formal reasoning, or semantic parsing domains where precise, verifiable output is required.
