GenDB: LLM-Driven OLAP Query System

Updated 7 March 2026

GenDB is an LLM-driven OLAP system that synthesizes instance-optimized C++/Python code per query, adapting to specific hardware and data profiles.
Its multi-agent architecture segments the process into workload analysis, storage design, query planning, code generation, execution, and iterative feedback.
Empirical evaluations show GenDB achieves up to a 5× speedup over traditional engines, significantly lowering engineering costs for system extensibility.

GenDB is an LLM-driven system for OLAP query processing that eschews traditional monolithic engine design in favor of agentic, per-query code synthesis. By leveraging LLMs to synthesize instance-optimized C++ or Python code for each query, GenDB achieves both hardware- and data-aware execution strategies, dramatically reducing the engineering effort required for system extensibility while delivering performance exceeding that of state-of-the-art hand-tuned engines (Lao et al., 2 Mar 2026).

1. Motivation and Architectural Overview

Traditional query engines (e.g., PostgreSQL, DuckDB, MonetDB, ClickHouse, Umbra) are hand-engineered monoliths that demand considerable expertise for extension and hesitate to adapt to evolving hardware and data types. GenDB addresses three central limitations of this paradigm: (i) slow adaptation to heterogeneous data/hardware, (ii) rigidity in algebraic abstractions, and (iii) prohibitive engineering costs for new workloads.

GenDB replaces the fixed engine with a multi-agent LLM workflow that, for each incoming SQL query, orchestrates six steps: workload analysis, storage/index design, logical and physical planning, code synthesis, execution/feedback, and iterative optimization. This results in customized binaries optimized for the precise data characteristics (e.g., distributions, cardinalities), schema, workload patterns, and available compute resources. The canonical workflow is:

Workload Analyzer: Profiles schema, statistics, hardware, SIMD capabilities.
Storage/Index Designer: Synthesizes code for compression, indexing.
Query Planner: Generates logical plans, enumerates and costs physical implementations.
Code Generator: Prompts the LLM to emit optimal operator code as C++/Python.
Executor & Feedback: Benchmarks correctness/performance against a ground-truth engine.
Query Optimizer: Iteratively refines plans/code based on runtime feedback to meet SLAs/budgets.

2. Multi-Agent System and JSON Protocol

GenDB's architecture decomposes the LLM 'brain' into five specialized agents:

Workload Analyzer Agent: Extracts workload characteristics, hardware profiling, data statistics.
Storage/Index Designer Agent: Designs and emits code for physical data layout and indexes (e.g., zone maps, vector indexes).
Query Planner Agent: Forms logical/physical plans, proposes operator choices, and selects the lowest estimated cost plan.
Code Generator Agent: Translates plans into threaded, vectorized C++ or Python code, orchestrating compilation and patch application.
Query Optimizer Agent: Consumes execution metrics, diagnoses hotspots, and issues code-level optimization directives.

All communication passes via structured, well-typed JSON objects to ensure lossless propagation of critical metadata (e.g., DECIMAL scaling), with explicit limits on iterations and LLM token budgets to bound resource consumption.

3. Per-Query Synthesis Pipeline

The GenDB per-query pipeline formalizes the transformation of SQL to instance-optimized executable code as follows:

$\begin{aligned} & \Function{GenerateCode}(Q, D, H): \ &\quad L \leftarrow \textsc{LogicalPlan}(Q) \ &\quad \mathcal{P} \leftarrow \textsc{EnumeratePhysicalPlans}(L, D, H) \ &\quad P^* \leftarrow \arg\min_{p\in \mathcal{P}}\;\textsc{EstimateCost}(p, D, H) \ &\quad C \leftarrow \textsc{LLM}(\text{prompt with }P^*, D, H) \ &\quad \textsc{Compile}(C) \ &\quad \Return C \end{aligned}$

Key steps:

Logical Plan Generation: Canonicalizes SQL into relational algebra.
Physical Plan Selection: Explores multiple implementations per operator; costs each with instance-level data/hardware information.
Instance-Optimized Code Generation: LLM writes code tailored to $P^*$ , using explicit operator, data, and hardware annotations (e.g., group cardinality fits L1).
Compilation and Execution: Code generator yields a binary executable.

This pipeline amortizes the (potentially expensive) code generation over repeated queries, which reflects the empirical observation that 80% of production analytic query submissions are exact repeats.

4. Instance-Level Optimization Techniques

GenDB's code synthesis enables optimization strategies unavailable to fixed-operator architectures:

Data-Aware Column Encoding: Analyzes column distributions to select minimal data types (e.g., uint16_t), applies run-length/delta encoding, and generates on-the-fly decoding code.
Algorithm-Level Restructuring: Crafts bespoke algorithms such as direct array aggregation for low-cardinality GROUP BYs, or column-separated hash tables precisely dimensioned for L3 cache with explicit atomicity handling.
Cache-Adaptive Aggregation: Selects aggregation strategy by fitting group-count $\times$ entry-size into the smallest viable cache level. If no $\ell$ satisfies $g \times \text{entry\_size} \leq S_\ell$ , resorts to per-thread tables and merging.
Workload-Specific Structures: Synthesizes custom lookup tables, probe sequences, and vectorized loop patterns for specific join/filter characteristics.

The cost model is:

$\mathrm{Cost}(p) = \sum_{o \in p} \left( \alpha_o \cdot \mathrm{MemAccess}(o, D) + \beta_o \cdot \mathrm{CPU}(o, D) + \gamma_o \cdot \mathrm{SyncOverhead}(o, H) \right)$

with all variables grounded in observed $D$ (data) and $H$ (hardware) at code generation time.

5. Empirical Evaluation and Comparative Results

GenDB's prototype employs a JavaScript orchestrator with Claude Sonnet 4.6 as the LLM back-end, generating C++ code compiled and executed on dual-socket Intel Xeon Gold 5218 nodes with 384 GB RAM (all data resident in-memory). Evaluations focus on:

TPC-H (SF=10): Five representative queries (Q1, Q3, Q6, Q9, Q18)
SEC-EDGAR: 5 GB corpus with six SQLSmith queries (designed for training-leakage resistance)
Baselines: DuckDB (v1.4.4), Umbra (2026), MonetDB (11.55.1), ClickHouse (26.2.1), PostgreSQL (18.2) with both default and GenDB-recommended indexes.

Table 1: TPC-H (SF=10) End-to-End Hot-Run Latencies

Query	GenDB	DuckDB	Umbra	ClickHouse	PostgreSQL
Q1	34 ms	112 ms	110 ms	2080 ms	1320 ms
Q3	58 ms	180 ms	172 ms	2400 ms	1650 ms
Q6	17 ms	42 ms	45 ms	510 ms	210 ms
Q9	38 ms	235 ms	232 ms	800 ms	480 ms
Q18	67 ms	25 ms*	31 ms*	1200 ms	600 ms
Total	214 ms	594 ms	590 ms	5990 ms	4260 ms

Queries marked “” use specialized zone-map pruning in baselines.

On the TPC-H subset, GenDB achieves a total latency of 214 ms, a 2.8× speedup over DuckDB. On SEC-EDGAR, GenDB achieves a 328 ms total latency versus 1640 ms (DuckDB), 1280 ms (Umbra), 5120 ms (ClickHouse), and 3950 ms (PostgreSQL) — demonstrating up to 5× speedup on unseen workloads.

Ablation studies confirm the advantage of the multi-agent architecture both in wall time (328 ms vs 1325 ms) and in LLM token economy (20–30% reduction), with the multi-agent version consistently outperforming single-agent guided and high-level variants.

6. Limitations and Prospective Enhancements

GenDB currently exhibits several limitations:

LLM Inference Latency/Cost: Per-query code generation overhead is significant for short-lived, non-repeated queries.
Correctness Guarantees: Relies on differential testing versus a reference DBMS; absence of formal semantic or functional guarantees.
Silent Logical Failures: LLM output may omit critical code sections (e.g., resizing logic), causing hangs or silent misbehavior.
Inter-Agent Semantic Drift: Misalignment or loss of structured metadata in agent JSON exchanges can induce subtle bugs.

Explored extensions include adaptive model selection (e.g., smaller LLMs for planning, larger for codegen), reusing synthesized operator templates across queries, incorporating external knowledge bases, integrating formal verification and symbolic execution steps, developing a prompt-tuned "experience" module for continual learning, and extending support for Python-based ML pipelines and GPU kernel synthesis via RAPIDS cuDF/RMM.

7. Significance and Outlook

GenDB operationalizes a paradigm shift in query execution: replacing monolithic engines engineered over decades with dynamically synthesized, LLM-generated code customized per query, per data instance, and per hardware profile. The result is a system capable of delivering state-of-the-art OLAP performance with a fraction of the software development burden, significantly improving both velocity of system extension and workload specialization. On both canonical benchmarks and real-world, data-leakage-resistant tasks, GenDB establishes new empirical baselines for query latency and extensibility, while surfacing novel research challenges around correctness, agent communication, and automated software synthesis (Lao et al., 2 Mar 2026).

Markdown Report Issue Upgrade to Chat

References (1)

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered (2026)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to GenDB.

GenDB: LLM-Driven OLAP Query System

1. Motivation and Architectural Overview

2. Multi-Agent System and JSON Protocol

3. Per-Query Synthesis Pipeline

4. Instance-Level Optimization Techniques

5. Empirical Evaluation and Comparative Results

Table 1: TPC-H (SF=10) End-to-End Hot-Run Latencies

6. Limitations and Prospective Enhancements

7. Significance and Outlook

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

GenDB: LLM-Driven OLAP Query System

1. Motivation and Architectural Overview

2. Multi-Agent System and JSON Protocol

3. Per-Query Synthesis Pipeline

4. Instance-Level Optimization Techniques

5. Empirical Evaluation and Comparative Results

Table 1: TPC-H (SF=10) End-to-End Hot-Run Latencies

6. Limitations and Prospective Enhancements

7. Significance and Outlook

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research