Papers
Topics
Authors
Recent
Search
2000 character limit reached

HEROSQL: Hierarchical SQL Validation

Updated 4 January 2026
  • HEROSQL is a hierarchical methodology that integrates Logical Plan and AST to validate semantic alignment in Text-to-SQL queries, capturing both global intent and local details.
  • It employs an AST-driven augmentation strategy to generate syntactically valid yet semantically erroneous SQL samples, enhancing error detection.
  • Using a Nested Message Passing Neural Network, HEROSQL achieves notable performance improvements on benchmarks, boosting both AUPRC and AUROC metrics.

HEROSQL is a hierarchical representation methodology for semantic validation in Text-to-SQL systems, developed to address the limitations of prior approaches that focused predominantly on syntactic correctness without robust detection of semantic misalignments between natural language questions and their generated SQL statements. By bridging both global intent and local SQL details, HEROSQL enables reliable validation, improved interpretability, and more granular feedback mechanisms for data querying platforms (Qiu et al., 28 Dec 2025).

1. Hierarchical SQL Representation: Logical Plan and AST Integration

HEROSQL introduces a dual-level structure for SQL representation:

  • The Logical Plan (LP) $\mathsfit{LP} = (V_{\mathrm{LP}}, E_{\mathrm{LP}})$ is derived from a gold SQL query s+s^+ using an optimizer (e.g., Apache Calcite). Each node viLPVLPv_i^{\mathrm{LP}} \in V_{\mathrm{LP}} contains an operator oio_i (such as Filter, Join, Aggregate) and a text attribute aia_i representing a sub-SQL fragment sis_i.
  • Abstract Syntax Trees (ASTs) Ai=(VAi,EAi)\mathcal{A}_i = (V_{A_i}, E_{A_i}) are constructed for each sub-SQL fragment sis_i, where AST nodes encode atomic tokens or syntactic types cjAic_j^{A_i}, and edges capture parent-child nesting (e.g., binary comparisons, column references).

This hierarchical structure unifies coarse-grained intent and fine-grained syntactic details, allowing the semantic validator to detect subtle discrepancies.

2. AST-Driven Sub-SQL Augmentation Strategy

To optimize semantic validation, HEROSQL employs an AST-driven augmentation pipeline for negative sample generation:

  • Formalization: Each sub-SQL aia_i is parsed into an AST, e.g., si=s_i = "age > 30 AND salary < 100000" forms an AST rooted with comparison operators ">", "<", and leaves corresponding to tokens ("age", "30", "salary", "100000").
  • Sample Generation Pseudocode:

1
2
3
4
5
6
7
8
9
10
11
12
13
Input: D_gold = { (q, s) }, T = {T, ..., T_K}
Output: D_AST = { (q, s) }
D_AST  
for each (q, s) in D_gold:
    LP  Optimize(s)
    for each node v_i^{LP} in LP.V:
        A_i  ParseAST(v_i^{LP}.attribute)
        for each transformation rule T_k in T:
            A_i' ← T_k(A_i)
            s_candidate  ReconstructSQL(LP with A_i  A_i')
            if syntacticallyValid(s_candidate) and Exec(s_candidate)  Exec(s):
                D_AST  D_AST  { (q, s_candidate) }
return D_AST

  • Only syntactically valid but semantically incorrect SQLs are retained as robust negative samples, each optionally annotated for “wrong sub-SQL” indices.

3. Mathematical Formulation of Augmentation Sampling

The sub-SQL augmentation process can be formalized as follows:

  • Node sampling in the LP graph: P(is+)=1VLPP(i\,|\,s^+) = \frac{1}{|V_{\mathrm{LP}}|}
  • Sub-tree root selection in AST: P(jAi)=1VAiP(j\,|\,A_i) = \frac{1}{|V_{A_i}|}
  • Rule selection from transformation set TT: P(Tk)=1KP(T_k) = \frac{1}{K}
  • The probability of producing perturbed AST AiA_i':

P(Ais+)=ijkP(is+)P(jAi)P(Tk)I[Ai=Tk(Ai ⁣ ⁣j)]P(A_i' | s^+) = \sum_{i}\sum_{j}\sum_{k} P(i|s^+) P(j|A_i) P(T_k) \mathbb{I}\left[ A_i' = T_k(A_i \!\downarrow\! j) \right]

where Ai ⁣ ⁣jA_i\!\downarrow\!j denotes the AST sub-tree rooted at jj, and I\mathbb{I} is the indicator function.

4. Nested Message Passing Neural Network Architecture

HEROSQL leverages a Nested Message Passing Neural Network (NMPNN) to propagate semantic information across hierarchical SQL structures:

  • Training Dataset Construction: Dtrain=DgoldDLLMDASTD_\text{train} = D_\text{gold} \cup D_\text{LLM} \cup D_\text{AST}, with binary labels yy distinguishing correct (y=0y=0) and semantically erroneous (y=1y=1) SQLs. Sub-SQL labels yiy_i are optionally set to $1$ if the LP node’s AST is perturbed.
  • NMPNN Forward Pass:
  1. AST-level message passing for each AiA_i yields sub-SQL embedding hih_i.
  2. LP-level message passing aggregates {hi}\{h_i\} to produce a holistic SQL embedding hSQLh_\text{SQL}.
  3. Fusion with query embedding hqh_q (Hadamard product and concatenation) results in final representation xx.
  4. Multilayer perceptron computes y^[0,1]\hat{y} \in [0,1], estimating the probability of semantic error.
  • Loss Functions:
    • Query-level binary cross-entropy:

    Lquery=[ylogy^+(1y)log(1y^)]L_{\text{query}} = -\bigl[ y\log\hat{y} + (1-y)\log(1-\hat{y}) \bigr] - Optional fine-grained sub-SQL supervision:

    Lsub=iVLP[yilogy^i(1yi)log(1y^i)]L_{\text{sub}} = \sum_{i \in V_{\mathrm{LP}}} \left[ -y_i\log\hat{y}_i - (1-y_i)\log(1-\hat{y}_i) \right] - Total loss combines both:

    L=Lquery+λLsubL = L_{\text{query}} + \lambda\,L_{\text{sub}}

with λ\lambda determining local feedback weight.

5. Experimental Results and Effectiveness

HEROSQL’s empirical evaluation on Text-to-SQL semantic validation benchmarks (notably BIRD and Spider) reveals substantial improvements over state-of-the-art methods:

  • With Qwen3-0.6B backbone and AST-driven augmentation (NDA):

    • BIRD: AUPRC, 62.7067.3962.70 \rightarrow 67.39 (+4.69 ppt); AUROC, 59.2561.5159.25 \rightarrow 61.51
    • Spider: AUPRC, 48.9151.9248.91 \rightarrow 51.92 (+3.01 ppt); AUROC, 59.9767.3259.97 \rightarrow 67.32
  • Overall, a mean relative boost of 9.40%9.40\% for AUPRC and 12.35%12.35\% for AUROC is achieved.
  • Ablation (“w/o NDA”) demonstrates marked degradation when AST-based negative sample generation is omitted, confirming its role in fine-grained semantic validation (Qiu et al., 28 Dec 2025).

6. Practical Implications and Significance

HEROSQL’s hierarchical representation, combined with AST-driven augmentation and NMPNN training, systematically generates challenging, syntactically valid but semantically incorrect SQL statements. This enables the detection of subtle semantic mismatches at both query and sub-query granularity, enhances feedback for LLMs, and promotes increased reliability and interpretability of Text-to-SQL systems. A plausible implication is the advancement of validation strategies that go beyond mere syntactic analysis, supporting robust and scalable semantic assurance in automated data querying.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Topic to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to HEROSQL.