HEROSQL: Hierarchical SQL Validation
- HEROSQL is a hierarchical methodology that integrates Logical Plan and AST to validate semantic alignment in Text-to-SQL queries, capturing both global intent and local details.
- It employs an AST-driven augmentation strategy to generate syntactically valid yet semantically erroneous SQL samples, enhancing error detection.
- Using a Nested Message Passing Neural Network, HEROSQL achieves notable performance improvements on benchmarks, boosting both AUPRC and AUROC metrics.
HEROSQL is a hierarchical representation methodology for semantic validation in Text-to-SQL systems, developed to address the limitations of prior approaches that focused predominantly on syntactic correctness without robust detection of semantic misalignments between natural language questions and their generated SQL statements. By bridging both global intent and local SQL details, HEROSQL enables reliable validation, improved interpretability, and more granular feedback mechanisms for data querying platforms (Qiu et al., 28 Dec 2025).
1. Hierarchical SQL Representation: Logical Plan and AST Integration
HEROSQL introduces a dual-level structure for SQL representation:
- The Logical Plan (LP), $\mathrm{LP} = (V_{\mathrm{LP}}, E_{\mathrm{LP}})$, is derived from a gold SQL query using an optimizer (e.g., Apache Calcite). Each node contains an operator (such as Filter, Join, Aggregate) and a text attribute representing a sub-SQL fragment.
- Abstract Syntax Trees (ASTs) are constructed for each sub-SQL fragment. AST nodes encode atomic tokens or syntactic types, and edges capture parent-child nesting (e.g., binary comparisons, column references).
This hierarchical structure unifies coarse-grained intent and fine-grained syntactic details, allowing the semantic validator to detect subtle discrepancies.
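The two-level structure can be pictured as a plan graph whose nodes each own a small tree. The following is a minimal sketch, not the authors' implementation: the class names `ASTNode`, `LPNode`, and `HierarchicalSQL` are hypothetical, and the plan is built by hand rather than produced by an optimizer such as Apache Calcite.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ASTNode:
    """One node of a sub-SQL AST: an atomic token or a syntactic type."""
    label: str
    children: List["ASTNode"] = field(default_factory=list)

    def size(self) -> int:
        # Count this node plus all of its descendants.
        return 1 + sum(child.size() for child in self.children)


@dataclass
class LPNode:
    """One logical-plan node: an operator, its sub-SQL text, and that text's AST."""
    operator: str   # e.g. "Filter", "Join", "Aggregate"
    sub_sql: str    # text attribute of the node
    ast: ASTNode    # fine-grained structure of sub_sql


@dataclass
class HierarchicalSQL:
    """Logical plan as a graph whose nodes each carry their own AST."""
    nodes: List[LPNode]
    edges: List[Tuple[int, int]]  # (child_index, parent_index) pairs in the plan


# Hand-built example for: SELECT name FROM emp WHERE age > 30
plan = HierarchicalSQL(
    nodes=[
        LPNode("TableScan", "emp", ASTNode("table", [ASTNode("emp")])),
        LPNode("Filter", "age > 30", ASTNode(">", [ASTNode("age"), ASTNode("30")])),
        LPNode("Project", "name", ASTNode("column", [ASTNode("name")])),
    ],
    edges=[(0, 1), (1, 2)],  # scan feeds filter, filter feeds project
)
```

The plan edges carry the coarse-grained data flow, while each `ast` field carries the token-level detail of one fragment.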
2. AST-Driven Sub-SQL Augmentation Strategy
To optimize semantic validation, HEROSQL employs an AST-driven augmentation pipeline for negative sample generation:
- Formalization: Each sub-SQL is parsed into an AST. For example, "age > 30 AND salary < 100000" yields an AST rooted at the logical operator "AND", whose children are the comparison operators ">" and "<", with leaves corresponding to the tokens "age", "30", "salary", and "100000".
- Sample Generation Pseudocode:
    Input: D_gold = { (q, s⁺) }, T = {T₁, ..., T_K}
    Output: D_AST = { (q, s⁻) }
    D_AST ← ∅
    for each (q, s⁺) in D_gold:
        LP ← Optimize(s⁺)
        for each node v_i^{LP} in LP.V:
            A_i ← ParseAST(v_i^{LP}.attribute)
            for each transformation rule T_k in T:
                A_i' ← T_k(A_i)
                s_candidate ← ReconstructSQL(LP with A_i → A_i')
                if syntacticallyValid(s_candidate) and Exec(s_candidate) ≠ Exec(s⁺):
                    D_AST ← D_AST ∪ { (q, s_candidate) }
    return D_AST
- Only syntactically valid but semantically incorrect SQL statements are retained as negative samples, each optionally annotated with the index of the perturbed ("wrong") sub-SQL.
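The loop above can be sketched in runnable form on a toy setup. This is an illustration under several assumptions, not the authors' pipeline: predicates are represented as flat `(operator, column, literal)` tuples instead of full ASTs, the rules `flip_operator` and `swap_column` are hypothetical, and SQLite's willingness to execute a statement stands in for the syntactic validity check.

```python
import sqlite3


def render(ast):
    """Turn a toy predicate AST (operator, column, literal) back into SQL text."""
    op, column, literal = ast
    return f"{column} {op} {literal}"


# Hypothetical transformation rules; each returns a perturbed copy of the AST.
def flip_operator(ast):
    op, column, literal = ast
    flipped = {">": "<", "<": ">", ">=": "<=", "<=": ">=", "=": "!="}
    return (flipped.get(op, op), column, literal)


def swap_column(ast):
    op, column, literal = ast
    other = {"age": "salary", "salary": "age"}
    return (op, other.get(column, column), literal)


RULES = [flip_operator, swap_column]


def run(conn, sql):
    """Return the sorted result rows, or None if SQLite rejects the statement."""
    try:
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def build_sql(select_clause, predicate_asts):
    return f"{select_clause} WHERE " + " AND ".join(render(a) for a in predicate_asts)


def augment(conn, question, select_clause, predicate_asts):
    """Generate (question, negative_sql, wrong_sub_sql_index) triples."""
    gold_rows = run(conn, build_sql(select_clause, predicate_asts))
    negatives = []
    for i, ast in enumerate(predicate_asts):
        for rule in RULES:
            perturbed = list(predicate_asts)
            perturbed[i] = rule(ast)
            candidate = build_sql(select_clause, perturbed)
            rows = run(conn, candidate)
            # Keep only candidates that execute and whose result differs from gold.
            if rows is not None and rows != gold_rows:
                negatives.append((question, candidate, i))
    return negatives


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("ann", 25, 50000), ("bob", 40, 90000), ("cy", 35, 120000)],
)

negatives = augment(
    conn,
    "Who is over 30 and earns under 100000?",
    "SELECT name FROM emp",
    [(">", "age", "30"), ("<", "salary", "100000")],
)
```

Comparing execution results on a single database instance is only a proxy for semantic difference: two queries that happen to agree on this data would be discarded even if they differ in general.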
3. Mathematical Formulation of Augmentation Sampling
The sub-SQL augmentation process can be formalized as follows:
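- Node sampling in the LP graph: a node $v_i^{\mathrm{LP}} \in V_{\mathrm{LP}}$ is drawn, selecting the sub-SQL fragment to perturb.
- Sub-tree root selection in the AST: a root node is chosen within the AST $A_i$ of the sampled LP node.
- Rule selection from the transformation set $T = \{T_1, \ldots, T_K\}$: a rule $T_k$ is drawn and applied at the selected root.
- The probability of producing the perturbed AST $A_i'$ follows from these three choices. Its definition refers to the AST sub-tree rooted at the selected node and uses the indicator function.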
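To make the role of the indicator function concrete, the sketch below enumerates every (sub-tree root, rule) draw for one LP node and sums the probability of the draws that yield a given target AST. It assumes uniform sampling at each of the three steps, which is an assumption made here for illustration and not a distribution taken from the source. The nested-tuple AST encoding and the rules `flip_comparison` and `and_to_or` are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product


# ASTs are nested tuples: (label, child, child, ...); a leaf is (label,).
def paths(ast, prefix=()):
    """Yield the path (tuple of child indices) of every node in the AST."""
    yield prefix
    for i, child in enumerate(ast[1:]):
        yield from paths(child, prefix + (i,))


def subtree(ast, path):
    """Return the sub-tree of ast rooted at the node addressed by path."""
    for i in path:
        ast = ast[1 + i]
    return ast


def replace(ast, path, new):
    """Return a copy of ast with the sub-tree at path swapped for new."""
    if not path:
        return new
    i = path[0] + 1
    return ast[:i] + (replace(ast[i], path[1:], new),) + ast[i + 1:]


# Hypothetical rules acting on a sub-tree; they return it unchanged if inapplicable.
def flip_comparison(t):
    flipped = {">": "<", "<": ">"}
    return (flipped.get(t[0], t[0]),) + t[1:]


def and_to_or(t):
    return ("OR",) + t[1:] if t[0] == "AND" else t


RULES = [flip_comparison, and_to_or]


def perturbation_probability(asts, node_index, target):
    """Joint probability of picking LP node node_index and producing target,
    assuming uniform choices of node, sub-tree root, and rule."""
    ast = asts[node_index]
    roots = list(paths(ast))
    p_draw = Fraction(1, len(asts)) * Fraction(1, len(roots)) * Fraction(1, len(RULES))
    total = Fraction(0)
    for root, rule in product(roots, RULES):
        produced = replace(ast, root, rule(subtree(ast, root)))
        # Indicator: count this draw only if it yields the target AST.
        if produced == target:
            total += p_draw
    return total


asts = [
    (">", ("age",), ("30",)),
    ("AND", (">", ("age",), ("30",)), ("<", ("salary",), ("100000",))),
]
flipped_target = ("<", ("age",), ("30",))
p_flip = perturbation_probability(asts, 0, flipped_target)
```

With two LP nodes, three candidate roots in the first AST, and two rules, exactly one of the six draws produces `flipped_target`, so `p_flip` is 1/12 under these assumptions.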
4. Nested Message Passing Neural Network Architecture
HEROSQL leverages a Nested Message Passing Neural Network (NMPNN) to propagate semantic information across hierarchical SQL structures:
- Training Dataset Construction: The training set consists of question-SQL pairs with binary labels distinguishing correct ($y = 0$) and semantically erroneous ($y = 1$) SQLs. Sub-SQL labels are optionally set to $1$ if the LP node’s AST is perturbed.
- NMPNN Forward Pass:
  - AST-level message passing over each LP node's AST yields a sub-SQL embedding.
  - LP-level message passing aggregates the sub-SQL embeddings to produce a holistic SQL embedding.
  - Fusion with the query embedding (Hadamard product and concatenation) yields the final representation.
  - A multilayer perceptron maps this representation to $\hat{y}$, the estimated probability of semantic error.
- Loss Functions:
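  - Query-level binary cross-entropy between the predicted error probability and the query label.
  - Optional fine-grained sub-SQL supervision on the per-node sub-SQL labels.
  - The total loss combines both terms, with a weighting coefficient determining the local feedback weight.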
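The nested forward pass and combined loss can be sketched in plain Python. This is a simplified stand-in rather than the NMPNN itself: message passing here is an untrained neighbour average, pooling is a mean, a single linear layer with a sigmoid replaces the MLP, the per-node prediction head is an assumption, and the total loss is assumed to be the query-level term plus a weight `lam` times the mean sub-SQL term. All function names and weights are hypothetical.

```python
import math


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def message_pass(features, edges, rounds=2):
    """Untrained message passing: each node averages itself with its neighbours' mean."""
    neighbours = {v: [] for v in range(len(features))}
    for a, b in edges:  # treat edges as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)
    h = [list(f) for f in features]
    for _ in range(rounds):
        new_h = []
        for v, hv in enumerate(h):
            if neighbours[v]:
                msg = mean([h[u] for u in neighbours[v]])
                new_h.append([0.5 * x + 0.5 * m for x, m in zip(hv, msg)])
            else:
                new_h.append(hv)
        h = new_h
    return h


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def forward(asts, lp_edges, query_emb, w_query, w_node):
    """asts: one (node_features, edges) pair per LP node. Returns (y_hat, node_probs)."""
    # 1) AST level: message passing inside each sub-SQL, then mean-pool to one vector.
    sub_embs = [mean(message_pass(feats, edges)) for feats, edges in asts]
    # 2) LP level: message passing over sub-SQL embeddings, then mean-pool.
    lp_states = message_pass(sub_embs, lp_edges)
    sql_emb = mean(lp_states)
    # 3) Fusion: concatenate SQL embedding, query embedding, and their Hadamard product.
    fused = sql_emb + query_emb + [s * q for s, q in zip(sql_emb, query_emb)]
    # 4) A linear head with a sigmoid stands in for the MLP.
    y_hat = sigmoid(sum(w * x for w, x in zip(w_query, fused)))
    node_probs = [sigmoid(sum(w * x for w, x in zip(w_node, h))) for h in lp_states]
    return y_hat, node_probs


def total_loss(y_hat, y, node_probs, node_labels, lam=0.5):
    """Query-level BCE plus lam times the mean sub-SQL BCE."""
    query_loss = bce(y_hat, y)
    sub_loss = sum(bce(p, t) for p, t in zip(node_probs, node_labels)) / len(node_labels)
    return query_loss + lam * sub_loss


asts = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (0, 2)]),  # e.g. "age > 30"
    ([[0.0, 1.0], [1.0, 0.0]], [(0, 1)]),                      # e.g. a column reference
]
y_hat, node_probs = forward(asts, [(0, 1)], [0.5, -0.5], [0.1] * 6, [0.2, -0.2])
loss = total_loss(y_hat, 1, node_probs, [1, 0])
```

Setting `lam` to zero recovers pure query-level training, which matches the description of the sub-SQL supervision as optional.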
5. Experimental Results and Effectiveness
HEROSQL’s empirical evaluation on Text-to-SQL semantic validation benchmarks (notably BIRD and Spider) reveals substantial improvements over state-of-the-art methods:
With a Qwen3-0.6B backbone and AST-driven augmentation (NDA):
- BIRD: AUPRC gain of +4.69 ppt, with AUROC reported alongside.
- Spider: AUPRC gain of +3.01 ppt, with AUROC reported alongside.
- Overall, HEROSQL achieves a mean relative boost in both AUPRC and AUROC.
- Ablation (“w/o NDA”) demonstrates marked degradation when AST-based negative sample generation is omitted, confirming its role in fine-grained semantic validation (Qiu et al., 28 Dec 2025).
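For readers unfamiliar with the two metrics, the sketch below computes them on a handful of made-up scores. It assumes that label 1 marks a semantically erroneous SQL (the positive class) and uses average precision as the AUPRC estimate, which is one common choice. The data are illustrative only and unrelated to the reported results.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes distinct scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits


# Illustrative validator outputs: higher score means "more likely erroneous".
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

AUPRC is sensitive to the share of positives in the data, whereas AUROC is not, which is one reason to report the two together.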
6. Practical Implications and Significance
HEROSQL’s hierarchical representation, combined with AST-driven augmentation and NMPNN training, systematically generates challenging, syntactically valid but semantically incorrect SQL statements. This enables the detection of subtle semantic mismatches at both query and sub-query granularity, enhances feedback for LLMs, and promotes increased reliability and interpretability of Text-to-SQL systems. A plausible implication is the advancement of validation strategies that go beyond mere syntactic analysis, supporting robust and scalable semantic assurance in automated data querying.