HEROSQL: Hierarchical SQL Validation

Updated 4 January 2026

HEROSQL is a hierarchical methodology that integrates Logical Plan and AST to validate semantic alignment in Text-to-SQL queries, capturing both global intent and local details.
It employs an AST-driven augmentation strategy to generate syntactically valid yet semantically erroneous SQL samples, enhancing error detection.
Using a Nested Message Passing Neural Network, HEROSQL achieves notable performance improvements on benchmarks, boosting both AUPRC and AUROC metrics.

HEROSQL is a hierarchical representation methodology for semantic validation in Text-to-SQL systems, developed to address the limitations of prior approaches that focused predominantly on syntactic correctness without robust detection of semantic misalignments between natural language questions and their generated SQL statements. By bridging both global intent and local SQL details, HEROSQL enables reliable validation, improved interpretability, and more granular feedback mechanisms for data querying platforms (Qiu et al., 28 Dec 2025).

1. Hierarchical SQL Representation: Logical Plan and AST Integration

HEROSQL introduces a dual-level structure for SQL representation:

The Logical Plan (LP) $\mathsfit{LP} = (V_{\mathrm{LP}}, E_{\mathrm{LP}})$ is derived from a gold SQL query $s^+$ using an optimizer (e.g., Apache Calcite). Each node $v_i^{\mathrm{LP}} \in V_{\mathrm{LP}}$ contains an operator $o_i$ (such as Filter, Join, Aggregate) and a text attribute $a_i$ representing a sub-SQL fragment $s_i$ .
Abstract Syntax Trees (ASTs) $\mathcal{A}_i = (V_{A_i}, E_{A_i})$ are constructed for each sub-SQL fragment $s_i$ , where AST nodes encode atomic tokens or syntactic types $c_j^{A_i}$ , and edges capture parent-child nesting (e.g., binary comparisons, column references).

This hierarchical structure unifies coarse-grained intent and fine-grained syntactic details, allowing the semantic validator to detect subtle discrepancies.

2. AST-Driven Sub-SQL Augmentation Strategy

To optimize semantic validation, HEROSQL employs an AST-driven augmentation pipeline for negative sample generation:

Formalization: Each sub-SQL $a_i$ is parsed into an AST, e.g., $s_i =$ "age > 30 AND salary < 100000" forms an AST rooted with comparison operators ">", "<", and leaves corresponding to tokens ("age", "30", "salary", "100000").
Sample Generation Pseudocode:

Input: D_gold = { (q, s⁺) }, T = {T₁, ..., T_K}
Output: D_AST = { (q, s⁻) }
D_AST ← ∅
for each (q, s⁺) in D_gold:
    LP ← Optimize(s⁺)
    for each node v_i^{LP} in LP.V:
        A_i ← ParseAST(v_i^{LP}.attribute)
        for each transformation rule T_k in T:
            A_i' ← T_k(A_i)
            s_candidate ← ReconstructSQL(LP with A_i → A_i')
            if syntacticallyValid(s_candidate) and Exec(s_candidate) ≠ Exec(s⁺):
                D_AST ← D_AST ∪ { (q, s_candidate) }
return D_AST

Only syntactically valid but semantically incorrect SQLs are retained as robust negative samples, each optionally annotated for “wrong sub-SQL” indices.

3. Mathematical Formulation of Augmentation Sampling

The sub-SQL augmentation process can be formalized as follows:

Node sampling in the LP graph: $P(i\,|\,s^+) = \frac{1}{|V_{\mathrm{LP}}|}$
Sub-tree root selection in AST: $P(j\,|\,A_i) = \frac{1}{|V_{A_i}|}$
Rule selection from transformation set $T$ : $P(T_k) = \frac{1}{K}$
The probability of producing perturbed AST $A_i'$ :

$P(A_i' | s^+) = \sum_{i}\sum_{j}\sum_{k} P(i|s^+) P(j|A_i) P(T_k) \mathbb{I}\left[ A_i' = T_k(A_i \!\downarrow\! j) \right]$

where $A_i\!\downarrow\!j$ denotes the AST sub-tree rooted at $j$ , and $\mathbb{I}$ is the indicator function.

4. Nested Message Passing Neural Network Architecture

HEROSQL leverages a Nested Message Passing Neural Network (NMPNN) to propagate semantic information across hierarchical SQL structures:

Training Dataset Construction: $D_\text{train} = D_\text{gold} \cup D_\text{LLM} \cup D_\text{AST}$ , with binary labels $y$ distinguishing correct ( $y=0$ ) and semantically erroneous ( $y=1$ ) SQLs. Sub-SQL labels $y_i$ are optionally set to $1$ if the LP node’s AST is perturbed.
NMPNN Forward Pass:

AST-level message passing for each $A_i$ yields sub-SQL embedding $h_i$ .
LP-level message passing aggregates $\{h_i\}$ to produce a holistic SQL embedding $h_\text{SQL}$ .
Fusion with query embedding $h_q$ (Hadamard product and concatenation) results in final representation $x$ .
Multilayer perceptron computes $\hat{y} \in [0,1]$ , estimating the probability of semantic error.

Loss Functions:
- Query-level binary cross-entropy:
$L_{\text{query}} = -\bigl[ y\log\hat{y} + (1-y)\log(1-\hat{y}) \bigr]$ - Optional fine-grained sub-SQL supervision:

$L_{\text{sub}} = \sum_{i \in V_{\mathrm{LP}}} \left[ -y_i\log\hat{y}_i - (1-y_i)\log(1-\hat{y}_i) \right]$ - Total loss combines both:

$L = L_{\text{query}} + \lambda\,L_{\text{sub}}$

with $\lambda$ determining local feedback weight.

5. Experimental Results and Effectiveness

HEROSQL’s empirical evaluation on Text-to-SQL semantic validation benchmarks (notably BIRD and Spider) reveals substantial improvements over state-of-the-art methods:

With Qwen3-0.6B backbone and AST-driven augmentation (NDA):
- BIRD: AUPRC, $62.70 \rightarrow 67.39$ (+4.69 ppt); AUROC, $59.25 \rightarrow 61.51$
- Spider: AUPRC, $48.91 \rightarrow 51.92$ (+3.01 ppt); AUROC, $59.97 \rightarrow 67.32$
Overall, a mean relative boost of $9.40\%$ for AUPRC and $12.35\%$ for AUROC is achieved.
Ablation (“w/o NDA”) demonstrates marked degradation when AST-based negative sample generation is omitted, confirming its role in fine-grained semantic validation (Qiu et al., 28 Dec 2025).

6. Practical Implications and Significance

HEROSQL’s hierarchical representation, combined with AST-driven augmentation and NMPNN training, systematically generates challenging, syntactically valid but semantically incorrect SQL statements. This enables the detection of subtle semantic mismatches at both query and sub-query granularity, enhances feedback for LLMs, and promotes increased reliability and interpretability of Text-to-SQL systems. A plausible implication is the advancement of validation strategies that go beyond mere syntactic analysis, supporting robust and scalable semantic assurance in automated data querying.

Markdown Upgrade to Chat

References (1)

Bridging Global Intent with Local Details: A Hierarchical Representation Approach for Semantic Validation in Text-to-SQL (2025)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to HEROSQL.