GenDB: LLM-Driven OLAP Query System
- GenDB is an LLM-driven OLAP system that synthesizes instance-optimized C++/Python code per query, adapting to specific hardware and data profiles.
- Its multi-agent architecture segments the process into workload analysis, storage design, query planning, code generation, execution, and iterative feedback.
- Empirical evaluations show GenDB achieves up to a 5× speedup over traditional engines, significantly lowering engineering costs for system extensibility.
GenDB is an LLM-driven system for OLAP query processing that eschews traditional monolithic engine design in favor of agentic, per-query code synthesis. By leveraging LLMs to synthesize instance-optimized C++ or Python code for each query, GenDB achieves both hardware- and data-aware execution strategies, dramatically reducing the engineering effort required for system extensibility while delivering performance exceeding that of state-of-the-art hand-tuned engines (Lao et al., 2 Mar 2026).
1. Motivation and Architectural Overview
Traditional query engines (e.g., PostgreSQL, DuckDB, MonetDB, ClickHouse, Umbra) are hand-engineered monoliths that demand considerable expertise for extension and hesitate to adapt to evolving hardware and data types. GenDB addresses three central limitations of this paradigm: (i) slow adaptation to heterogeneous data/hardware, (ii) rigidity in algebraic abstractions, and (iii) prohibitive engineering costs for new workloads.
GenDB replaces the fixed engine with a multi-agent LLM workflow that, for each incoming SQL query, orchestrates six steps: workload analysis, storage/index design, logical and physical planning, code synthesis, execution/feedback, and iterative optimization. This results in customized binaries optimized for the precise data characteristics (e.g., distributions, cardinalities), schema, workload patterns, and available compute resources. The canonical workflow is:
- Workload Analyzer: Profiles schema, statistics, hardware, SIMD capabilities.
- Storage/Index Designer: Synthesizes code for compression, indexing.
- Query Planner: Generates logical plans, enumerates and costs physical implementations.
- Code Generator: Prompts the LLM to emit optimal operator code as C++/Python.
- Executor & Feedback: Benchmarks correctness/performance against a ground-truth engine.
- Query Optimizer: Iteratively refines plans/code based on runtime feedback to meet SLAs/budgets.
2. Multi-Agent System and JSON Protocol
GenDB's architecture decomposes the LLM 'brain' into five specialized agents:
- Workload Analyzer Agent: Extracts workload characteristics, hardware profiling, data statistics.
- Storage/Index Designer Agent: Designs and emits code for physical data layout and indexes (e.g., zone maps, vector indexes).
- Query Planner Agent: Forms logical/physical plans, proposes operator choices, and selects the lowest estimated cost plan.
- Code Generator Agent: Translates plans into threaded, vectorized C++ or Python code, orchestrating compilation and patch application.
- Query Optimizer Agent: Consumes execution metrics, diagnoses hotspots, and issues code-level optimization directives.
All communication passes via structured, well-typed JSON objects to ensure lossless propagation of critical metadata (e.g., DECIMAL scaling), with explicit limits on iterations and LLM token budgets to bound resource consumption.
3. Per-Query Synthesis Pipeline
The GenDB per-query pipeline formalizes the transformation of SQL to instance-optimized executable code as follows:
$\begin{aligned} & \Function{GenerateCode}(Q, D, H): \ &\quad L \leftarrow \textsc{LogicalPlan}(Q) \ &\quad \mathcal{P} \leftarrow \textsc{EnumeratePhysicalPlans}(L, D, H) \ &\quad P^* \leftarrow \arg\min_{p\in \mathcal{P}}\;\textsc{EstimateCost}(p, D, H) \ &\quad C \leftarrow \textsc{LLM}(\text{prompt with }P^*, D, H) \ &\quad \textsc{Compile}(C) \ &\quad \Return C \end{aligned}$
Key steps:
- Logical Plan Generation: Canonicalizes SQL into relational algebra.
- Physical Plan Selection: Explores multiple implementations per operator; costs each with instance-level data/hardware information.
- Instance-Optimized Code Generation: LLM writes code tailored to , using explicit operator, data, and hardware annotations (e.g., group cardinality fits L1).
- Compilation and Execution: Code generator yields a binary executable.
This pipeline amortizes the (potentially expensive) code generation over repeated queries, which reflects the empirical observation that 80% of production analytic query submissions are exact repeats.
4. Instance-Level Optimization Techniques
GenDB's code synthesis enables optimization strategies unavailable to fixed-operator architectures:
- Data-Aware Column Encoding: Analyzes column distributions to select minimal data types (e.g.,
uint16_t), applies run-length/delta encoding, and generates on-the-fly decoding code. - Algorithm-Level Restructuring: Crafts bespoke algorithms such as direct array aggregation for low-cardinality GROUP BYs, or column-separated hash tables precisely dimensioned for L3 cache with explicit atomicity handling.
- Cache-Adaptive Aggregation: Selects aggregation strategy by fitting group-count entry-size into the smallest viable cache level. If no satisfies , resorts to per-thread tables and merging.
- Workload-Specific Structures: Synthesizes custom lookup tables, probe sequences, and vectorized loop patterns for specific join/filter characteristics.
The cost model is:
with all variables grounded in observed (data) and (hardware) at code generation time.
5. Empirical Evaluation and Comparative Results
GenDB's prototype employs a JavaScript orchestrator with Claude Sonnet 4.6 as the LLM back-end, generating C++ code compiled and executed on dual-socket Intel Xeon Gold 5218 nodes with 384 GB RAM (all data resident in-memory). Evaluations focus on:
- TPC-H (SF=10): Five representative queries (Q1, Q3, Q6, Q9, Q18)
- SEC-EDGAR: 5 GB corpus with six SQLSmith queries (designed for training-leakage resistance)
- Baselines: DuckDB (v1.4.4), Umbra (2026), MonetDB (11.55.1), ClickHouse (26.2.1), PostgreSQL (18.2) with both default and GenDB-recommended indexes.
Table 1: TPC-H (SF=10) End-to-End Hot-Run Latencies
| Query | GenDB | DuckDB | Umbra | ClickHouse | PostgreSQL |
|---|---|---|---|---|---|
| Q1 | 34 ms | 112 ms | 110 ms | 2080 ms | 1320 ms |
| Q3 | 58 ms | 180 ms | 172 ms | 2400 ms | 1650 ms |
| Q6 | 17 ms | 42 ms | 45 ms | 510 ms | 210 ms |
| Q9 | 38 ms | 235 ms | 232 ms | 800 ms | 480 ms |
| Q18 | 67 ms | 25 ms* | 31 ms* | 1200 ms | 600 ms |
| Total | 214 ms | 594 ms | 590 ms | 5990 ms | 4260 ms |
Queries marked “” use specialized zone-map pruning in baselines.
On the TPC-H subset, GenDB achieves a total latency of 214 ms, a 2.8× speedup over DuckDB. On SEC-EDGAR, GenDB achieves a 328 ms total latency versus 1640 ms (DuckDB), 1280 ms (Umbra), 5120 ms (ClickHouse), and 3950 ms (PostgreSQL) — demonstrating up to 5× speedup on unseen workloads.
Ablation studies confirm the advantage of the multi-agent architecture both in wall time (328 ms vs 1325 ms) and in LLM token economy (20–30% reduction), with the multi-agent version consistently outperforming single-agent guided and high-level variants.
6. Limitations and Prospective Enhancements
GenDB currently exhibits several limitations:
- LLM Inference Latency/Cost: Per-query code generation overhead is significant for short-lived, non-repeated queries.
- Correctness Guarantees: Relies on differential testing versus a reference DBMS; absence of formal semantic or functional guarantees.
- Silent Logical Failures: LLM output may omit critical code sections (e.g., resizing logic), causing hangs or silent misbehavior.
- Inter-Agent Semantic Drift: Misalignment or loss of structured metadata in agent JSON exchanges can induce subtle bugs.
Explored extensions include adaptive model selection (e.g., smaller LLMs for planning, larger for codegen), reusing synthesized operator templates across queries, incorporating external knowledge bases, integrating formal verification and symbolic execution steps, developing a prompt-tuned "experience" module for continual learning, and extending support for Python-based ML pipelines and GPU kernel synthesis via RAPIDS cuDF/RMM.
7. Significance and Outlook
GenDB operationalizes a paradigm shift in query execution: replacing monolithic engines engineered over decades with dynamically synthesized, LLM-generated code customized per query, per data instance, and per hardware profile. The result is a system capable of delivering state-of-the-art OLAP performance with a fraction of the software development burden, significantly improving both velocity of system extension and workload specialization. On both canonical benchmarks and real-world, data-leakage-resistant tasks, GenDB establishes new empirical baselines for query latency and extensibility, while surfacing novel research challenges around correctness, agent communication, and automated software synthesis (Lao et al., 2 Mar 2026).