
QueryGym: Interactive Query Planning

Updated 2 October 2025
  • QueryGym is an interactive environment for developing and evaluating LLM-based query planning agents in relational databases using explicit relational algebra steps.
  • It provides detailed schema metadata, intermediate computation states, and error feedback to support incremental error remediation and transparent agent reasoning.
  • Its engine-agnostic design and reinforcement learning framework enable robust cross-engine query generation and systematic research on planning efficiency.

QueryGym is an interactive environment designed for the development, testing, and evaluation of LLM-based query planning agents within the relational database context. Unlike prior frameworks that tether agents to specific SQL dialects or conceal the decision-making process within monolithic query generation, QueryGym enforces an engine-agnostic and transparent query construction process where agents must explicitly compose a sequence of relational algebra operations. Implemented as a Gymnasium interface, QueryGym delivers contextual observations—including schema metadata, intermediate computation states, and granular execution feedback—and accepts structured actions that model both database exploration and relational operation execution. This architecture enables fine-grained study of agent reasoning, facilitates incremental error remediation, and provides a practical platform for reinforcement learning research in query generation (Ananthakrishanan et al., 25 Sep 2025).

1. Motivation and Architectural Principles

QueryGym addresses the core deficiencies of traditional natural language-to-query (NL2Query) systems. Conventional systems typically adopt a single-shot, sequence-to-sequence mapping from natural language to SQL, which:

  • Obscures the agent's intermediate reasoning,
  • Couples the agent tightly to the SQL dialect of a particular engine,
  • Impedes systematic error correction and interpretability, and
  • Limits the applicability of reinforcement learning due to sparse or terminal reward signals.

To overcome these issues, QueryGym requires agents to construct queries as explicit, step-by-step plans in relational algebra rather than monolithic SQL strings. The environment is formalized as a partially observable Markov decision process (POMDP) tuple (S, A, Ω, T, R) where:

  • S denotes the environment state (including the schema, question, and intermediate tables/CTEs),
  • A is the set of permitted actions (database exploration + algebraic operations),
  • Ω is the set of available observations (schema info, previews, execution feedback, error reports),
  • T is the transition function (actions deterministically update the environment state or return information),
  • R is the reward function (providing strong rewards for terminal correctness and incremental feedback for partial progress).

By requiring agents to select from a defined action space and revealing full or partial state at each step, QueryGym creates a transparent and reproducible protocol for query planning that is independent of backend SQL dialects.
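The POMDP formulation above can be pictured as a Gymnasium-style reset/step interface. The following is a minimal sketch, not QueryGym's actual API: the class, action names, observation fields, and reward values are all illustrative.

```python
# Illustrative sketch of a Gymnasium-style query-planning environment.
# Class, action, and observation names are hypothetical, not QueryGym's API.
from dataclasses import dataclass, field

@dataclass
class QueryPlanEnv:
    schema: dict                              # table -> column list (part of state S)
    question: str                             # natural-language query (part of state S)
    ctes: list = field(default_factory=list)  # intermediate steps/CTEs (part of state S)

    def reset(self):
        """Return the initial observation: schema overview plus the question."""
        self.ctes = []
        return {"kind": "overview", "schema": self.schema, "question": self.question}

    def step(self, action):
        """Apply an action (A), deterministically update state (T), and
        return an observation (Ω) together with a reward (R) and done flag."""
        if action["op"] == "GET_SCHEMA":
            return {"kind": "exploration", "schema": self.schema}, 0.0, False
        if action["op"] == "PERFORM_PROJECTION":
            table, cols = action["table"], action["columns"]
            missing = [c for c in cols if c not in self.schema.get(table, [])]
            if missing:
                # Invalid actions surface explicit error feedback.
                return {"kind": "error", "detail": f"unknown columns: {missing}"}, -0.1, False
            self.ctes.append({"op": "projection", "table": table, "columns": cols})
            return {"kind": "cte", "steps": len(self.ctes)}, 0.1, False
        return {"kind": "error", "detail": "unsupported action"}, -0.1, False

env = QueryPlanEnv(schema={"orders": ["id", "date", "total"]},
                   question="What is the total per order?")
obs = env.reset()
obs2, step_reward, done = env.step(
    {"op": "PERFORM_PROJECTION", "table": "orders", "columns": ["id", "total"]})
```

Each `step` call both advances the plan and reveals exactly the information the agent needs next, which is what makes the protocol reproducible and engine-independent.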

2. Environment Functionality and Interaction Protocol

QueryGym's environment is centered on turn-by-turn agent interactions, mimicking the stepwise construction typical of human database analysts:

  • Observations: At every interaction, the agent receives a textual observation in one of several forms:
    • Overview: Current schema details with the natural language query.
    • Exploration Result: Results from probe actions (e.g., table previews, sample column values).
    • Intermediate CTE Info: Outputs of the current or prior relational algebra steps (such as projection, filter, join).
    • Error Feedback: Detailed error messages on invalid actions (e.g., projection of a nonexistent column).
  • Actions: The actionable space combines database exploration with algebraic operations:
    • Exploration: 12 distinct operations (such as GET_SCHEMA, PREVIEW_TABLE, GET_COLUMN_STATS) for disambiguation and interactive schema probing.
    • Relational Algebra: 8 canonical operations (PERFORM_PROJECTION, PERFORM_FILTER, PERFORM_JOIN, PERFORM_UNION, etc.) representing π, σ, ⋈, and set-based constructs.

This explicit action-observation loop allows agents to iteratively probe the database, issue corrective operations in response to errors, and build queries as modular plans rather than static strings.
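The loop can be sketched as follows, using only the operations named above (the full action space is larger). The dispatch function and observation format are assumptions for illustration, not QueryGym internals.

```python
# Sketch of the action-observation loop: a scripted agent probes the schema,
# then issues relational operations, correcting itself on error feedback.
# Only operations named in the text are modeled; the observation format is illustrative.
EXPLORATION = {"GET_SCHEMA", "PREVIEW_TABLE", "GET_COLUMN_STATS"}
ALGEBRA = {"PERFORM_PROJECTION", "PERFORM_FILTER", "PERFORM_JOIN", "PERFORM_UNION"}

def dispatch(schema, action):
    """Return a textual observation for one action, mimicking the env's feedback."""
    op = action["op"]
    if op == "GET_SCHEMA":
        return {"kind": "exploration", "tables": sorted(schema)}
    if op == "PERFORM_FILTER":
        col = action["column"]
        if col not in schema[action["table"]]:
            return {"kind": "error", "detail": f"no column {col!r}"}
        return {"kind": "cte", "step": f"filter on {col}"}
    return {"kind": "error", "detail": f"unsupported op {op}"}

schema = {"orders": ["id", "order_date", "total"]}
# A first attempt references a nonexistent column; the error observation
# lets the agent retry with a corrected action rather than failing outright.
obs = dispatch(schema, {"op": "PERFORM_FILTER", "table": "orders", "column": "date"})
if obs["kind"] == "error":
    obs = dispatch(schema, {"op": "PERFORM_FILTER", "table": "orders", "column": "order_date"})
```

The key property is that every action, valid or not, yields an observation the agent can condition on at the next turn.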

3. Step-by-Step Query Construction and Transparency

A distinctive principle of QueryGym is enforcing transparent, modular query planning:

  • Each agent action reflects a single relational operation or exploration probe; intermediate results (such as CTEs or table fragments) are materialized and visible to the agent.
  • Intermediate steps (e.g., σ_condition(R) for filters, π_columns(R) for projections, R ⋈ S for joins) are recorded, enabling inspection and retrospective analysis.
  • Error conditions, including incompatible joins, ambiguous columns, or type mismatches, are surfaced explicitly as feedback, allowing the agent to correct errors at the operation level.

This approach contrasts with black-box NL2SQL methods, where a single invalid clause (e.g., referencing a non-existent column) can make the entire output unusable and substantially complicates debugging. QueryGym's incremental design makes error remediation tractable and process-driven.
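One way to picture the contrast: because every prefix of a modular plan is itself executable, a bad operation is caught at the step that introduced it. The sketch below replays a plan step by step against SQLite (stdlib `sqlite3`); the plan representation and helper are illustrative, not QueryGym's.

```python
# Illustrative: materializing each prefix of a relational plan as a WITH chain.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.0)])

# A plan as an ordered list of named relational steps, each compiled to a CTE.
plan = [
    ("filtered",  "SELECT * FROM orders WHERE region = 'EU'"),   # sigma (filter)
    ("projected", "SELECT id, total FROM filtered"),             # pi (projection)
]

def run_prefix(conn, plan, k):
    """Materialize the first k steps as a WITH chain and return the rows.
    Because each prefix is executable, an invalid step fails exactly where
    it occurs, not at the end of a monolithic query."""
    name = plan[k - 1][0]
    ctes = ", ".join(f"{n} AS ({sql})" for n, sql in plan[:k])
    return conn.execute(f"WITH {ctes} SELECT * FROM {name}").fetchall()

step1 = run_prefix(conn, plan, 1)   # rows after the filter
step2 = run_prefix(conn, plan, 2)   # rows after the projection
```

Inspecting `step1` before committing to `step2` is exactly the kind of intermediate visibility a monolithic SQL string cannot offer.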

4. Engine-Agnostic Design and Reinforcement Learning Compatibility

QueryGym is intentionally engine-agnostic, relying on relational algebra as the intermediate representation rather than mapping directly to SQL dialects. This design has several advantages:

  • Portability: Query plans constructed in relational algebra can be interpreted by any SQL-compatible engine or further transpiled into the dialect of a target database (e.g., SQLite, PostgreSQL).
  • RL Support: The explicit state-action-reward formalism and incremental reward feedback facilitate the application of reinforcement learning algorithms. Agents may receive partial credit for producing outputs whose rows are a superset or subset of the true answer and can optimize for not only correctness but planning efficiency.
  • Exploration over Schema Linking: Instead of requiring agents to perform upfront schema linking or full natural language schema mapping, the environment encourages interactive schema exploration through probing actions. This empirically lowers ambiguity in real-world queries (such as disambiguating among multiple "date" fields) (Ananthakrishanan et al., 25 Sep 2025).
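The partial-credit idea in the RL bullet above can be sketched as a reward over row sets. The specific weights here are assumptions for illustration; QueryGym's actual reward values are not specified in this summary.

```python
# Sketch of a partial-credit reward: full reward for an exact row-set match,
# smaller rewards when the produced rows are a strict subset or superset of
# the gold answer. The weights (1.0 / 0.3 / 0.2) are illustrative only.
def row_set_reward(predicted_rows, gold_rows):
    pred, gold = set(predicted_rows), set(gold_rows)
    if pred == gold:
        return 1.0          # terminal correctness
    if pred and pred < gold:
        return 0.3          # subset of the true answer: partial progress
    if pred > gold:
        return 0.2          # superset: right rows present, filter too loose
    return 0.0

r_exact = row_set_reward([(1,), (2,)], [(1,), (2,)])
r_subset = row_set_reward([(1,)], [(1,), (2,)])
r_superset = row_set_reward([(1,), (2,), (3,)], [(1,), (2,)])
```

A shaped signal like this gives gradient to an RL agent long before it produces a fully correct plan, addressing the sparse-reward problem noted in Section 1.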

5. Applications in Query Generation, Error Remediation, and Research

QueryGym provides a structured testbed for several research directions and practical use cases:

  • Error Remediation: Agents can iteratively refine and correct their queries, using the immediate feedback loop to diagnose and address specific faults.
  • Transparency and Explainability: Decomposing the entire planning trajectory into discrete algebraic operations affords a high degree of transparency, supporting research into interpretable agent reasoning and targeted debugging.
  • Reinforcement Learning Research: The environment’s structured reward protocol and incremental feedback make it well-suited for the training and benchmarking of RL-based query generation agents—enabling the study of planning, exploration, and credit assignment in complex, real-world database querying scenarios.
  • Generalization and Cross-Engine Utility: The abstraction away from concrete SQL syntax ensures that algorithms developed within QueryGym generalize across heterogeneous database engines.
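The cross-engine claim rests on the fact that an algebraic plan can be lowered into any target dialect. A toy sketch of such a transpilation step, with deliberately simplified dialect rules (only identifier quoting differs here), might look like:

```python
# Toy sketch of engine-agnostic lowering: the same algebraic plan rendered
# into different SQL dialects. The dialect rules are deliberately minimal;
# a real transpiler would cover functions, pagination, casts, etc.
def to_sql(plan, dialect="sqlite"):
    # SQLite/PostgreSQL use double quotes; MySQL uses backticks.
    quote = '"' if dialect in ("sqlite", "postgresql") else "`"
    sql = f"SELECT * FROM {quote}{plan['table']}{quote}"
    if "filter" in plan:
        sql += f" WHERE {plan['filter']}"              # sigma
    if "columns" in plan:
        cols = ", ".join(f"{quote}{c}{quote}" for c in plan["columns"])
        sql = sql.replace("*", cols, 1)                # pi
    return sql

plan = {"table": "orders", "columns": ["id", "total"], "filter": "total > 10"}
sqlite_sql = to_sql(plan, "sqlite")
mysql_sql = to_sql(plan, "mysql")
```

Because the agent only ever manipulates the dialect-neutral `plan`, the same learned policy transfers across engines; only this final lowering step changes.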

6. Demonstration Insights and Practical Utility

The QueryGym demonstration highlights several practical benefits:

  • The exploration interface allows users to select example queries, inspect schema structure, and interactively apply exploratory and relational operations.
  • A comparison with black-box LLM-based SQL generation demonstrates the fragility of static, one-shot approaches: where a single component error in a complex SQL string leads to total query failure, the modular stepwise approach permits fine-grained correction and robust recovery.
  • Agent interface design, featuring support for LLM-driven or scripted agents (e.g., a LangChain orchestrator with vLLM backend), reveals how models can iteratively parse observations, synthesize actions, assimilate feedback, and converge toward fully correct relational plans—showcasing practicality for both research and production development (Ananthakrishanan et al., 25 Sep 2025).

QueryGym represents a substantive advance in interactive database querying environments, distinguishing itself by its explicit, modular planning, engine-agnostic architecture, and direct support for reinforcement learning and interpretability research. Its structured protocol enables not only robust error remediation and transparency in query generation but also systematic experimentation on agentic planning in real-world relational data contexts.
