
Confusion-Aware Noisy Schema Sampling

Updated 3 January 2026
  • The paper demonstrates that injecting model-selected confusing schema elements into training significantly improves robustness in schema linking for text-to-SQL, achieving state-of-the-art performance.
  • The method utilizes dynamically computed confusion scores and stochastic sampling to select negative candidates, thereby bridging the gap between ideal training data and noisy inference environments.
  • The approach generalizes to recommender systems by effectively distinguishing hard negatives from truly irrelevant ones, which enhances overall recommendation quality.

Confusion-Aware Noisy Schema Sampling (NSS) is a discriminative training technique that injects dynamically selected noisy schema elements or negative candidates into supervised learning in order to close the robustness gap that arises from false-positive link predictions or noisy labels. Developed initially in the context of text-to-SQL with LLMs and subsequently adapted for recommender systems, NSS leverages model-driven "confusion scores" to sample distractor elements that the model itself currently finds hard to disambiguate. By explicitly teaching the model to ignore such confounding signals, NSS substantially improves resilience to schema linking and negative sampling errors that dominate train–inference discrepancies (Song et al., 20 May 2025, Song et al., 10 Nov 2025).

1. Motivation and Problem Context

In tasks such as text-to-SQL, fine-tuned LLM pipelines consist of two primary modules: (i) schema linking, which selects tables/columns from the database likely relevant to the user's natural language question, and (ii) SQL generation, which emits a query conditioned on the question plus the selected schema subset. Achieving high recall in schema linking typically involves using permissive thresholds, leading to numerous false-positive (irrelevant) schema items during inference. When the SQL generator is trained only with perfectly curated, ground-truth schema, this mismatch results in a pronounced susceptibility to spurious elements at test time.

Confusion-Aware NSS addresses this robustness gap by stochastically injecting precisely those negative schema items which the current model is most prone to misclassify as relevant, thereby teaching the SQL generator to remain robust under the types of noise encountered at inference (Song et al., 20 May 2025). In recommender systems, a similar issue occurs in negative sampling: false negatives (truly relevant but unobserved user-item pairs) and true noisy labels become conflated, reducing recommendation quality unless separated using auxiliary signals (Song et al., 10 Nov 2025).

2. Formalization and Sampling Mechanism

Given a set of schema elements $S$ for a database, let $S_1$ denote the ground-truth relevant subset and $S_0 = S \setminus S_1$ the complement pool of potential noise. For each item $i$, the model computes a confusion score $\hat{y}_i = \sigma(W h_i)$, where $h_i$ is the hidden state at a designated marker token and $W$ is a learned linear head. The confusion score represents the model's current belief in the relevance of schema item $i$ (Song et al., 20 May 2025).

NSS then samples a random budget $k \sim \mathrm{Uniform}(0, \lfloor \beta |S| \rfloor)$ and selects up to $k$ noisy elements from $S_0$ without replacement, with sampling probability:

$$P(i) = \frac{\hat{y}_i}{\sum_{j \in S_0} \hat{y}_j}, \quad i \in S_0.$$
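The sampling step is straightforward to implement. Below is a minimal NumPy sketch, assuming confusion scores are supplied as a dictionary of sigmoid outputs; the function name, column names, and score values are illustrative rather than taken from the paper.

```python
import numpy as np

def sample_noisy_schema(confusion_scores, gold_ids, beta=0.5, rng=None):
    """Sample a noisy schema subset from the non-ground-truth pool S0,
    with probability proportional to the model's confusion scores."""
    rng = rng or np.random.default_rng()
    all_ids = list(confusion_scores)
    pool = [i for i in all_ids if i not in gold_ids]        # S0 = S \ S1
    if not pool:
        return []

    # Random budget k ~ Uniform(0, floor(beta * |S|)), capped by |S0|
    k_max = min(int(beta * len(all_ids)), len(pool))
    k = rng.integers(0, k_max + 1)
    if k == 0:
        return []

    # P(i) = y_i / sum_{j in S0} y_j over the noise pool
    scores = np.array([confusion_scores[i] for i in pool], dtype=float)
    probs = scores / scores.sum()

    # Draw k noisy elements without replacement
    return list(rng.choice(pool, size=k, replace=False, p=probs))

# Hypothetical example: "line_3" is a high-confusion distractor and is sampled often
scores = {"address_id": 0.95, "line_1": 0.90, "line_2": 0.85, "line_3": 0.60, "city": 0.05}
print(sample_noisy_schema(scores, gold_ids={"address_id", "line_1", "line_2"}))
```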

For recommender systems, a comparable approach leverages LLM-based semantic relevance. User and item profiles are encoded into high-dimensional embeddings, projected via a learned MLP, and the cosine similarity is used to define a softmax sampling distribution over negative candidates. This auxiliary signal separates hard negatives from noisy ones, with the hard negative for user $u$ selected as the candidate $j^*$ of minimal semantic similarity among the top high-scoring distractors (Song et al., 10 Nov 2025).
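The following PyTorch sketch illustrates one way such a selector could be realized, assuming user and item profile embeddings and the recommender's own candidate scores are already available; the function name and parameters (`top_m`, `tau`) are assumptions for illustration, not the paper's API.

```python
import torch
import torch.nn.functional as F

def select_hard_negative(user_emb, cand_embs, proj, model_scores, top_m=20, tau=0.1):
    """Among the top-m candidates by the recommender's own scores, pick as the hard
    negative j* the one with the lowest LLM semantic similarity to the user,
    treating highly similar candidates as likely false negatives."""
    u = F.normalize(proj(user_emb), dim=-1)          # projected user profile embedding
    v = F.normalize(proj(cand_embs), dim=-1)         # projected item profile embeddings
    sim = v @ u                                      # cosine similarity per candidate
    sampling_dist = F.softmax(sim / tau, dim=0)      # softmax sampling distribution

    top_idx = torch.topk(model_scores, k=min(top_m, len(model_scores))).indices
    j_star = top_idx[torch.argmin(sim[top_idx])]     # least semantically similar distractor
    return j_star.item(), sampling_dist

# Toy usage with random embeddings and a linear projection standing in for the MLP
d, n = 16, 100
proj = torch.nn.Linear(d, d)
j_star, dist = select_hard_negative(torch.randn(d), torch.randn(n, d), proj, torch.randn(n))
print("hard negative index:", j_star)
```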

3. Integration with Selective Attention and Joint Loss

In the JOLT-SQL system, during SQL generation training, the standard autoregressive attention mask is modified so that each query token attends only to: (i) the prompt prefix, (ii) all ground-truth schema tokens, (iii) all tokens belonging to the sampled noisy schema subset, and (iv) earlier tokens in the query (causal masking). Marker tokens are excluded from this path. The loss is jointly optimized as:

$$L = L_{\mathrm{SL}} + L_{\mathrm{NTP}},$$

where $L_{\mathrm{SL}}$ is the schema-linking loss and $L_{\mathrm{NTP}}$ is the next-token prediction loss, with the context of $L_{\mathrm{NTP}}$ restricted by the attention mask defined above (Song et al., 20 May 2025). The entire NSS effect in JOLT-SQL is implemented via this attention control during SQL generation.
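A minimal PyTorch sketch of such a query-side mask is given below, assuming a simple [prompt prefix | schema block | query] token layout; the helper name and layout details are illustrative and need not match JOLT-SQL's actual implementation.

```python
import torch

def build_selective_mask(seq_len, prefix_len, query_start, schema_spans,
                         gold_ids, noisy_ids, marker_positions):
    """Boolean mask (True = may attend) for query tokens: prefix + gold schema +
    sampled noisy schema + causal attention over earlier query tokens; marker
    tokens and unsampled schema elements stay hidden."""
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    visible = torch.zeros(seq_len, dtype=torch.bool)
    visible[:prefix_len] = True                                # (i) prompt prefix
    for sid in set(gold_ids) | set(noisy_ids):                 # (ii) + (iii) gold and noisy schema
        start, end = schema_spans[sid]
        visible[start:end] = True
    visible[list(marker_positions)] = False                    # markers excluded from this path

    for q in range(query_start, seq_len):
        mask[q] = visible
        mask[q, query_start:q + 1] = True                      # (iv) causal over the query
    return mask

# Toy layout: 4 prefix tokens, two 3-token schema elements (markers at 6 and 9), 5 query tokens
spans = {"address_id": (4, 7), "line_3": (7, 10)}
m = build_selective_mask(15, prefix_len=4, query_start=10, schema_spans=spans,
                         gold_ids={"address_id"}, noisy_ids={"line_3"}, marker_positions=[6, 9])
print(m[12].int())  # what the third query token may attend to
# The joint objective is then loss_schema_linking + loss_next_token, with this mask
# restricting the next-token prediction context.
```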

In LLM-based recommender systems, the sampled negative $j^*$ is used directly in the BPR pairwise ranking loss, while both semantic and logical relevance signals from the LLM contribute to pruning and augmenting edges in the user–item interaction graph. Cross-graph contrastive learning further enforces feature stability across hallucinated or pruned interactions, improving robustness to LLM-induced noise and hallucination errors (Song et al., 10 Nov 2025).
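As a concrete reference, the pairwise term could look like the standard BPR formulation sketched below; how the user and item embeddings are produced (e.g., by a graph encoder) is outside the scope of this snippet.

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_vecs, pos_item_vecs, neg_item_vecs):
    """BPR pairwise ranking loss: push each positive item's score above the score
    of the confusion-aware sampled hard negative j* for the same user."""
    pos_scores = (user_vecs * pos_item_vecs).sum(-1)
    neg_scores = (user_vecs * neg_item_vecs).sum(-1)
    return -F.logsigmoid(pos_scores - neg_scores).mean()

# Toy batch of 8 users with 32-dimensional embeddings
B, d = 8, 32
print(float(bpr_loss(torch.randn(B, d), torch.randn(B, d), torch.randn(B, d))))
```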

4. Architectural and Algorithmic Refinements

NSS in JOLT-SQL is enabled by an architectural tweak: within the schema block, local bidirectional attention (LBA) is applied, allowing non-marker schema tokens to attend to each other and marker tokens to see all schema definitions, overcoming the limitations of pure causal masking. This change enables more accurate computation of linking scores for each schema element, increasing both recall and discriminative precision while keeping sequence lengths manageable (Song et al., 20 May 2025).
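The sketch below shows one way an LBA-style overlay on a causal mask could be constructed in PyTorch; the token positions and helper name are hypothetical and serve only to make the attention pattern concrete.

```python
import torch

def local_bidirectional_schema_mask(causal_mask, schema_start, schema_end, marker_positions):
    """Overlay local bidirectional attention on a causal mask inside the schema block:
    non-marker schema tokens attend to each other in both directions, and each marker
    token sees every token of the schema definitions."""
    mask = causal_mask.clone()
    idx = torch.arange(schema_start, schema_end)
    markers = torch.tensor(sorted(marker_positions))
    non_markers = idx[~torch.isin(idx, markers)]

    mask[non_markers.unsqueeze(1), non_markers.unsqueeze(0)] = True  # bidirectional among non-markers
    mask[markers.unsqueeze(1), idx.unsqueeze(0)] = True              # markers see the full schema block
    return mask

# Toy example: 20-token sequence, schema block at positions 4..13 with markers at 7 and 13
causal = torch.tril(torch.ones(20, 20, dtype=torch.bool))
lba = local_bidirectional_schema_mask(causal, schema_start=4, schema_end=14, marker_positions=[7, 13])
print(lba[5].int())  # schema token 5 now also attends to later schema tokens
```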

The sampling and training workflow begins with an initial epoch that caches model confusion scores; each subsequent training epoch then proceeds through: (i) schema linking and loss computation, (ii) confusion-aware noisy schema sampling, (iii) SQL generation with selective attention, and (iv) joint parameter update (Song et al., 20 May 2025).
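A heavily simplified orchestration skeleton of this loop is shown below; the four callables are assumed hooks standing in for the real model components and are not actual APIs from the paper.

```python
def train_with_nss(num_epochs, examples, schema_linking, sample_noisy, sql_generation, joint_update):
    """Orchestration skeleton of the NSS workflow; the callables are assumed hooks,
    not actual APIs: each returns the quantities named in the comments."""
    # Initial epoch: cache confusion scores for every training example
    confusion_cache = {ex["id"]: schema_linking(ex)[0] for ex in examples}

    for _ in range(num_epochs):
        for ex in examples:
            confusion, loss_sl = schema_linking(ex)               # (i) linking scores and L_SL
            confusion_cache[ex["id"]] = confusion
            noisy_ids = sample_noisy(confusion, ex["gold_ids"])   # (ii) confusion-aware NSS
            loss_ntp = sql_generation(ex, noisy_ids)              # (iii) selective-attention SQL gen
            joint_update(loss_sl + loss_ntp)                      # (iv) joint update, L = L_SL + L_NTP
```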

In recommender systems, multiple contrastive learning objectives are introduced: (i) objective alignment (projecting LLM embeddings for optimal semantic match), (ii) cross-graph denoising (aligning user–item representations across original and LLM-pruned/augmented graphs), and (iii) hallucination-robust contrastive loss (random edge dropout) to prevent overfitting to unreliable LLM-inferred interactions (Song et al., 10 Nov 2025).
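For concreteness, the cross-graph denoising term could be instantiated as an InfoNCE-style alignment such as the sketch below; the paper's exact losses, including the hallucination-robust variant with random edge dropout, may differ in detail.

```python
import torch
import torch.nn.functional as F

def cross_graph_contrastive(z_orig, z_denoised, tau=0.2):
    """InfoNCE-style alignment between node representations from the original interaction
    graph and the LLM-pruned/augmented graph: each node is pulled toward its counterpart
    in the other view and pushed away from all other nodes."""
    a = F.normalize(z_orig, dim=-1)
    b = F.normalize(z_denoised, dim=-1)
    logits = a @ b.t() / tau                 # pairwise cross-view similarities
    labels = torch.arange(a.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy check with two random "views" of 64 nodes
print(float(cross_graph_contrastive(torch.randn(64, 16), torch.randn(64, 16))))
```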

5. Empirical Performance and Ablation

JOLT-SQL with Confusion-Aware NSS, evaluated using Qwen-2.5-Coder-14B, achieves Spider Dev/Test EX = 88.4%/88.9% and BIRD Dev EX = 64.9%, constituting state-of-the-art execution accuracy among comparably sized open-source models. Ablation experiments on Qwen-7B demonstrate that removing NSS drops Spider EX by 0.9 points and BIRD EX by 1.8 points, indicating that NSS is a critical driver of robustness. Removing schema-selective attention or LBA further reduces performance, confirming the necessity of all three mechanisms (Song et al., 20 May 2025).

In large-scale recommender systems, confusion-aware NSS, realized via semantic and logical relevance-based filtering, yields significant improvements in denoising and downstream recommendation quality. A plausible implication is that NSS generalizes effectively to domains beyond relational database question answering, provided suitable confusion or auxiliary relevance signals are available (Song et al., 10 Nov 2025).

6. Illustrative Example and Practical Significance

A pedagogical example from JOLT-SQL considers a schema with columns address_id, line_1, line_2, line_3 for the question "Which address holds the most number of students currently? List the address_id and all lines." line_3 is a frequent false positive yet not ground-truth relevant. During NSS training, if the model’s confusion score for line_3 is high, it is likely to be sampled into the noisy schema subset, and the SQL generator learns to ignore it when generating the correct SQL. At inference, even if low-threshold linkers include line_3, the trained model robustly omits it, eliminating spurious joins or projections (Song et al., 20 May 2025).

In recommender systems, the analogous process distinguishes hard samples (vital for capturing diverse user preferences) from truly noisy negatives using LLM semantic and logical assessment; negatives with high semantic similarity among the top-scoring candidates are filtered out, minimizing false-negative risk while increasing the informativeness of negative sampling (Song et al., 10 Nov 2025).

7. Cross-Domain Generalization and Research Trajectory

The core insight behind Confusion-Aware NSS, namely dynamic, model-driven exposure to high-confusion distractors, generalizes across tasks involving structured schemas, negative sampling, and noisy label selection, provided a confusion signal exists, either learned in situ or supplied by auxiliary models (e.g., LLMs). While its most detailed development is in text-to-SQL (JOLT-SQL), adaptation to recommendation has been demonstrated, and further extension to graph learning and information retrieval is direct. Ongoing research may explore alternative confusion metrics, active learning-driven negative sampling, and tighter integration with contrastive alignment objectives for further robustness gains (Song et al., 20 May 2025, Song et al., 10 Nov 2025).
