LASER: A Data-Centric Method for Low-Cost and Efficient SQL Rewriting based on SQL-GRPO

Published 8 Apr 2026 in cs.DB | (2604.06804v1)

Abstract: Query rewriting, the process of transforming queries into semantically equivalent yet more efficient variants, is crucial for database optimization. Existing solutions predominantly rely on either rule-based heuristics or LLMs. However, traditional rule-based methods lack adaptability, while LLM-based approaches incur prohibitive inference costs and privacy risks. In contrast, Small LLMs (SLMs) present a compelling middle ground, potentially offering both flexibility and efficiency. However, the development of such compact models is severely bottlenecked by the scarcity of high-quality, domain-specific training data. To bridge this gap, we introduce LASER, a data-centric framework designed to empower small models for robust SQL optimization. First, to address the scarcity of existing benchmarks and the limited optimization headroom of generic synthetic queries, we construct SQL-MCTS, a large-scale corpus of complex slow queries. We employ an MCTS-based hybrid expansion strategy that combines rule-guided anti-patterns with LLM mutations to evolve structurally expressive seeds into execution-verified slow variants. Second, to enable the model to autonomously discover latency-aware rewriting patterns, we propose SQL-GRPO, a specialized alignment strategy adapted from Group Relative Policy Optimization. By integrating Anchored Group Advantage to refine advantage estimation and Complexity-Adaptive Dynamic Rollout to efficiently allocate exploration budgets, this approach effectively empowers compact models to master execution-based optimization logic. Implemented on Qwen3 models, LASER significantly outperforms rule-based systems and LLMs in execution efficiency, while exhibiting robust zero-shot transferability with minimal overhead.


Authors (6)

Summary

The paper introduces LASER, which integrates SQL-MCTS for generating challenging slow queries and SQL-GRPO for RL-based optimization.
It reports a reduction in mean execution time from 69.73s to 13.03s, at lower cost, using compact models that can be deployed locally with 32-40GB VRAM.
The methodology enhances query performance via complexity-adaptive rollouts, Anchored Group Advantage, and verification-driven self-correction.

LASER: A Data-Centric Framework for Efficient SQL Query Rewriting Based on SQL-GRPO

Motivation and Background

The paper presents LASER, a data-centric approach to SQL rewriting for performance optimization in DBMSs, built on compact Small LLMs (SLMs). Traditional query rewriting relies either on rule-based heuristics, which are brittle and lack adaptivity, or on LLM-driven workflows, which introduce prohibitive inference costs, latency, and severe privacy concerns, especially when proprietary APIs handle sensitive SQL and schema data. The Chinchilla scaling law is cited in support of the view that high-quality, diverse data matters more than sheer model size for downstream performance in domain-specific settings. However, training compact models for SQL optimization is hampered by the lack of high-quality, structurally complex slow-query benchmarks and by ineffective RL-based optimization pipelines.

System Architecture

LASER combines two modules: slow-query generation via SQL-MCTS and optimization-policy training via SQL-GRPO.

Figure 1: LASER framework architecture integrating slow-query generation and GRPO-enhanced optimization policy training.

Slow Query Generator: SQL-MCTS

SQL-MCTS systematically constructs a large corpus of semantically equivalent but computationally degraded SQL queries, which serve as challenging benchmarks for training optimizers. Seed queries are initialized by injecting structural constraints to ensure multi-table joins, deep nesting, and dense predicates, explicitly avoiding bias towards trivial, shallow queries. Subsequent expansion utilizes a hybrid MCTS-based workflow: anti-patterns derived from reversed optimizer rules and free-form LLM-based mutations expand the search space. Each step is grounded by execution feedback for reward assignment, including strict semantic equivalence via sorted result-set hashes and structural novelty via AST-based tree edit distance metrics.
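
The equivalence check described above, comparing sorted result-set hashes, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the use of SQLite, the `orders` table, the sample queries, and the helper names `result_set_hash` and `build_demo_db` are all assumptions made for the example. Sorting makes the comparison order-insensitive, so it would not catch a rewrite that breaks a required `ORDER BY`.

```python
import hashlib
import sqlite3


def result_set_hash(conn: sqlite3.Connection, sql: str) -> str:
    """Hash a query's result set independently of row order."""
    rows = conn.execute(sql).fetchall()
    # Sort the textual form of each row so row order does not matter,
    # while duplicate rows still contribute to the hash.
    canonical = sorted(repr(row) for row in rows)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
        INSERT INTO orders (customer, amount) VALUES
            ('a', 10.0), ('b', 25.0), ('a', 40.0), ('c', 5.0);
        """
    )
    return conn


conn = build_demo_db()
fast = "SELECT customer, amount FROM orders WHERE amount > 8"
# A deliberately degraded but equivalent variant (redundant subquery).
slow = (
    "SELECT o.customer, o.amount FROM orders o "
    "WHERE o.id IN (SELECT id FROM orders WHERE amount > 8) "
    "ORDER BY o.amount DESC"
)
# A variant that changes the predicate and therefore the result set.
different = "SELECT customer, amount FROM orders WHERE amount > 20"

same_as_slow = result_set_hash(conn, fast) == result_set_hash(conn, slow)
same_as_different = result_set_hash(conn, fast) == result_set_hash(conn, different)
print(same_as_slow, same_as_different)
```

In a generation loop, a candidate whose hash differs from the seed query's hash would be rejected before any latency reward is computed.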

Figure 2: Workflow of MCTS-Driven Slow Query Generation with rule-guided and LLM-based expansion, and execution-based reward feedback.

SQL-MCTS delivers a training corpus with 11,675 slow queries, each exhibiting execution latency at least twice the reference, filtered for non-trivial structural complexity and semantic coherence, validated via DeepSeek-V3.
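
The latency criterion above (at least twice the reference) amounts to a simple ratio filter. The sketch below assumes latencies have already been measured; the `Candidate` record, the `keep_slow_variants` helper, and the sample numbers are hypothetical and only illustrate the threshold.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sql: str
    latency_s: float    # measured latency of the degraded variant
    reference_s: float  # measured latency of the reference query


def keep_slow_variants(candidates, min_ratio=2.0):
    """Keep variants whose latency is at least min_ratio times the reference."""
    kept = []
    for c in candidates:
        if c.reference_s <= 0:
            continue  # skip unusable measurements
        if c.latency_s / c.reference_s >= min_ratio:
            kept.append(c)
    return kept


# Made-up measurements for illustration only.
pool = [
    Candidate("q1_variant", latency_s=4.2, reference_s=1.0),
    Candidate("q2_variant", latency_s=1.5, reference_s=1.0),
    Candidate("q3_variant", latency_s=2.0, reference_s=1.0),
]
kept_sql = [c.sql for c in keep_slow_variants(pool)]
print(kept_sql)
```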

Reinforcement Learning Pipeline: SQL-GRPO

LASER initiates optimization with Supervised Fine-Tuning (SFT) on reasoning traces generated by the DeepSeek-R1 model, distilling domain-specific optimization knowledge into the policy. Subsequently, Group Relative Policy Optimization (GRPO) aligns the policy with latency reduction objectives via environment feedback, but with bespoke modifications addressing SQL workload heterogeneity:

Complexity-Adaptive Dynamic Rollout

Rollout budgets adapt on the fly based on pilot entropy and reward variance (proxies for query complexity and optimization headroom), so that scarce compute is spent on challenging queries rather than on trivial ones whose rewards have already saturated. Empirical results (Figure 3) show that naive uniform rollout wastes compute on simple queries and under-explores complex ones.
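
One way to picture the budgeting idea is a function that maps pilot statistics to a rollout count. The rule below is an assumption made for illustration, not the formula used in the paper: the squashing functions, the equal weighting, the `min_n`/`max_n` bounds, and the scale constants are all hypothetical.

```python
import statistics


def rollout_budget(pilot_rewards, pilot_entropy, min_n=4, max_n=32,
                   entropy_scale=2.0, variance_scale=1.0):
    """Map pilot statistics to a rollout count (hypothetical rule).

    pilot_rewards: rewards from a few cheap pilot rollouts.
    pilot_entropy: non-negative mean token entropy of those rollouts.
    """
    variance = statistics.pvariance(pilot_rewards)
    # Squash each signal into [0, 1); higher means harder / more headroom.
    entropy_score = pilot_entropy / (pilot_entropy + entropy_scale)
    variance_score = variance / (variance + variance_scale)
    difficulty = 0.5 * (entropy_score + variance_score)
    return min_n + round(difficulty * (max_n - min_n))


# A query whose pilot rollouts all agree gets a small budget ...
easy = rollout_budget([1.0, 1.0, 1.0, 1.0], pilot_entropy=0.1)
# ... while a query with uncertain, widely varying pilots gets a larger one.
hard = rollout_budget([0.0, 2.5, -1.0, 3.0], pilot_entropy=2.4)
print(easy, hard)
```

Any monotone mapping would serve the same purpose; the point is only that the per-query budget grows with the two pilot signals instead of being fixed.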

Figure 3: Saturation analysis of rollout budgeting—complex queries demand high exploration budgets for valid optimization.

Anchored Group Advantage

Standard GRPO's groupwise Z-score normalization can inflate the reward signal when a valid but suboptimal rewrite appears in a group of otherwise invalid candidates: the lone survivor receives a large positive advantage regardless of how much latency it actually saves. Anchored Group Advantage fuses relative ranking with absolute performance baselines, suppressing such false positives and amplifying reward signals for genuine latency breakthroughs. This prevents reinforcement of degradation patterns and improves stability, as illustrated in Figure 4.
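
The parity problem can be reproduced with a few numbers. In the sketch below, `zscore_advantages` is the usual group normalization; `anchored_advantages` is a hypothetical blend of relative rank and absolute gain over an anchor, written only to illustrate the idea and not taken from the paper. The reward values (with -1.0 standing for an invalid rewrite), the `anchor`, `scale`, and `mix` parameters are assumptions.

```python
import statistics


def zscore_advantages(rewards, eps=1e-6):
    """Standard group normalisation: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def anchored_advantages(rewards, anchor=0.0, scale=1.0, mix=0.5):
    """Hypothetical blend of relative rank and absolute gain over an anchor."""
    relative = zscore_advantages(rewards)
    absolute = [(r - anchor) / scale for r in rewards]
    return [mix * z + (1.0 - mix) * a for z, a in zip(relative, absolute)]


# Three invalid rewrites plus one survivor in each group.
mediocre_group = [-1.0, -1.0, -1.0, 0.1]      # survivor barely helps
breakthrough_group = [-1.0, -1.0, -1.0, 3.0]  # survivor is a large win

z_mediocre = zscore_advantages(mediocre_group)[-1]
z_breakthrough = zscore_advantages(breakthrough_group)[-1]
a_mediocre = anchored_advantages(mediocre_group)[-1]
a_breakthrough = anchored_advantages(breakthrough_group)[-1]

print(f"z-score:  {z_mediocre:.3f} vs {z_breakthrough:.3f}")
print(f"anchored: {a_mediocre:.3f} vs {a_breakthrough:.3f}")
```

With one distinct reward among three identical ones, the Z-score of the survivor is nearly the same in both groups, whereas the anchored variant separates them because the absolute term scales with the actual gain.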

Figure 4: Advantage estimation—Anchored Group Advantage resolves misleading parity between mediocre survivors and genuine breakthroughs.

Verification-Driven Self-Correction

Inference incorporates a closed-loop verification workflow: candidates are first checked via EXPLAIN for syntax integrity before execution. Failures trigger diagnostic feedback loops, enabling self-correction and increasing robustness against parse errors and semantic ambiguity.
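
A closed loop of this kind can be sketched with SQLite, where prefixing a statement with `EXPLAIN` compiles it without running it. Everything else here is an assumption for illustration: the `propose` callback stands in for the rewriting model, `make_stub_model` fakes a model that first emits a typo and then fixes it, and the `users` table and the attempt limit are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def explain_check(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if the engine can plan the query, else the error text."""
    try:
        conn.execute("EXPLAIN " + sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


def rewrite_with_correction(conn: sqlite3.Connection, original_sql: str,
                            propose: Callable[[str, Optional[str]], str],
                            max_attempts: int = 3) -> str:
    """Ask for rewrites until one passes the EXPLAIN check."""
    feedback = None
    for _ in range(max_attempts):
        candidate = propose(original_sql, feedback)
        feedback = explain_check(conn, candidate)
        if feedback is None:
            return candidate
    return original_sql  # fall back to the untouched query


def make_stub_model():
    """Stand-in for the rewriting model: wrong first, corrected on feedback."""
    seen_feedback = []

    def propose(original_sql, feedback):
        seen_feedback.append(feedback)
        if feedback is None:
            return "SELECT nam FROM users WHERE id = 1"  # misspelled column
        return "SELECT name FROM users WHERE id = 1"

    return propose, seen_feedback


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
propose, seen_feedback = make_stub_model()
final_sql = rewrite_with_correction(
    conn, "SELECT name FROM users WHERE id IN (1)", propose
)
print(final_sql)
```

The fallback to the original query is a design choice of this sketch: a rewriter that cannot produce a valid candidate should never make the workload worse.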

Experimental Evaluation

Data Quality and Human Evaluation

SQL-MCTS is benchmarked against TPC-DS, TPC-H, DSB, Calcite, and SQL-Factory. Human evaluation confirms SQL-MCTS queries significantly surpass synthetic benchmarks in optimization non-triviality and structural depth, without sacrificing semantic coherence.

Performance Metrics

LASER-14B achieves a mean execution time reduction from 69.73s (original) to 13.03s on SQL-MCTS, with 90% equivalence, outperforming DeepSeek-R1, GPT-4o, Qwen3, and all rule-based systems, both in average latency and equivalence across multiple benchmarks.
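
For scale, the two reported mean execution times translate into a speedup factor and a relative reduction as follows. Only the two input figures come from the summary above; the derived values are plain arithmetic on those means and say nothing about per-query behaviour.

```python
original_s = 69.73   # mean execution time of the original queries (reported)
rewritten_s = 13.03  # mean execution time after rewriting (reported)

speedup = original_s / rewritten_s
reduction_pct = 100.0 * (1.0 - rewritten_s / original_s)

print(f"speedup: {speedup:.2f}x, reduction: {reduction_pct:.1f}%")
```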

Cost and Localizability

The LASER models (8B/14B) can be deployed locally with only 32–40GB VRAM, dramatically reducing cost and latency compared to LLM-based and rule-based API workflows, which demand >1TB VRAM and incur substantial API call costs (Table: DSB benchmark).

Generalization and Cross-DB Transfer

LASER-trained models generalize robustly to unseen schemas (TPC-H, Calcite) and other database engines (MySQL), achieving state-of-the-art latency reduction and maintaining equivalence without retraining, validating SQL pattern transferability.

Ablation Results

Ablation confirms each component's necessity. Removing SFT, Dynamic Rollout, or Anchored Group Advantage noticeably degrades latency and equivalence. Naive GRPO (E$^3$-Rewrite) fails to handle sparse, complex queries, especially at the 95th percentile. Verification-driven self-correction also contributes to robustness.

Case Study

LASER successfully eliminates computational redundancy, such as duplicate aggregations in deeply nested queries, by restructuring via Common Subexpression Elimination with CTEs, reducing expensive table scans and overall execution cost.
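
The kind of rewrite described above can be illustrated on a toy example. The schema, data, and both queries below are made up for the sketch and are not the paper's case study; the check only confirms that the CTE form returns the same rows. Whether the CTE form is actually faster depends on the engine and its planner, since some engines already evaluate an uncorrelated subquery once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER);
    INSERT INTO sales (region, amount) VALUES
        ('east', 10), ('west', 20), ('east', 30), ('north', 40), ('west', 50);
    """
)

# The same aggregate is written twice, so a naive plan may compute it twice.
redundant = """
    SELECT s.region, s.amount
    FROM sales s
    WHERE s.amount > (SELECT AVG(amount) FROM sales)
      AND s.amount < 2 * (SELECT AVG(amount) FROM sales)
"""

# Common subexpression factored into a CTE and referenced by name.
with_cte = """
    WITH stats AS (SELECT AVG(amount) AS avg_amount FROM sales)
    SELECT s.region, s.amount
    FROM sales s CROSS JOIN stats
    WHERE s.amount > stats.avg_amount
      AND s.amount < 2 * stats.avg_amount
"""


def sorted_rows(sql):
    """Run a query and return its rows in a canonical order."""
    return sorted(conn.execute(sql).fetchall())


rows_redundant = sorted_rows(redundant)
rows_cte = sorted_rows(with_cte)
print(rows_redundant == rows_cte, rows_cte)
```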

Figure 5: Example generated slow queries illustrating structural degeneration for optimization training.

Real-world Deployment

In an enterprise deployment at ByteDance, LASER-14B rewrote 69 production queries; 73% of them showed reduced latency, with an average speedup of 23%, demonstrating industrial applicability across diverse business scenarios.

Implications and Future Directions

LASER illustrates that with high-quality, structurally challenging data and tailored RL policy optimization (notably GRPO with Complexity-Adaptive Dynamic Rollout and Anchored Group Advantage), compact SLMs can achieve DBMS-level optimization traditionally reserved for large, resource-heavy models. Local deployability enables privacy-compliant, real-time optimization for production environments. The framework's robust transferability across benchmarks and engines reinforces the utility of data-centric pipelines over parameter-centric scaling, aligning with broader trends in domain-adapted LLM deployment.

Potential future directions include expanding the domain to other SQL dialects and heterogeneous DBMSs, integrating LASER with auto-indexing/subschema optimization, and exploring its synergy with agentic RL pipelines and interpretable SQL optimization rationales.

Conclusion

LASER presents a unified methodology for data-driven, cost-efficient SQL rewrite optimization, combining MCTS-based slow-query generation with GRPO-based training to produce compact models with superior latency reduction and equivalence. Results in both experimental and industrial settings support LASER's effectiveness, scalability, and practical transferability for real-world DBMS optimization scenarios (2604.06804).
