GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Published 2 Mar 2026 in cs.DB, cs.AI, cs.CL, cs.LG, and cs.MA | (2603.02081v1)

Abstract: Traditional query processing relies on engines that are carefully optimized and engineered by many experts. However, new techniques and user requirements evolve rapidly, and existing systems often cannot keep pace. At the same time, these systems are difficult to extend due to their internal complexity, and developing new systems requires substantial engineering effort and cost. In this paper, we argue that recent advances in LLMs are starting to shape the next generation of query processing systems. We propose using LLMs to synthesize execution code for each incoming query, instead of continuously building, extending, and maintaining complex query processing engines. As a proof of concept, we present GenDB, an LLM-powered agentic system that generates instance-optimized and customized query execution code tailored to specific data, workloads, and hardware resources. We implemented an early prototype of GenDB that uses Claude Code Agent as the underlying component in the multi-agent system, and we evaluate it on OLAP workloads. We use queries from the well-known TPC-H benchmark and also construct a new benchmark designed to reduce potential data leakage from LLM training data. We compare GenDB with state-of-the-art query engines, including DuckDB, Umbra, MonetDB, ClickHouse, and PostgreSQL. GenDB achieves significantly better performance than these systems. Finally, we discuss the current limitations of GenDB and outline future extensions and related research challenges.

Abstract PDF Upgrade to Chat

Summary

The paper demonstrates that LLM-powered, multi-agent query processing significantly outperforms conventional, hand-engineered database systems.
It employs dynamic code generation and hardware-aware optimizations, achieving up to an 11.2× speedup on diverse benchmarks.
Iterative optimization with structured, agentic pipelines offers enhanced adaptability and cost efficiency for complex, evolving workloads.

GenDB: LLM-Driven Instance-Optimized Query Processing

Motivation and Context

GenDB proposes a paradigm shift in query processing, moving from monolithic, manually engineered database engines to per-query, instance-optimized code generation powered by LLMs. The traditional approach, reliant on decades of expert-driven system design, faces critical challenges: rapid evolution of user requirements, difficulty in system extensibility, and prohibitive engineering costs for adapting to emerging workloads or hardware architectures. GenDB posits that recent advances in LLMs—especially agentic architectures capable of multi-stage reasoning and tool invocation—enable a new level of optimization and extensibility previously unattainable by template-based or compiler-driven query code generation.

System Architecture

GenDB employs an agentic LLM pipeline, decomposing query processing into hierarchical, specialized agents. Input includes schema, queries (SQL or natural language), data, optimization targets, and hardware resource descriptions. The pipeline sequentially executes:

Workload Analysis: Extraction of data distribution, join patterns, selectivities, and hardware properties (e.g., cache topology, SIMD support).
Storage/Index Design: Generation of optimal storage layouts (e.g., columnar with compression, encoding) and instance-tailored index structures.
Query Planning: Resource-aware operator selection and execution plan construction—including cache-adaptive aggregation, join order enumeration, and selective algorithmic restructuring.
Code Generation: Synthesis of executable query code in target languages (C++ for compute, Python for AI modules), augmented by system-level optimizations (compiler flags, OS features).
Iterative Optimization: Runtime feedback-driven refinement of plan and code, supporting user-constrained objectives and early stopping criteria.

Each agent leverages LLM core reasoning for planning, tool invocation, structured inter-agent communication (JSON/Markdown), and deterministic constraint enforcement.

Figure 1: GenDB system architecture outlining agent stages for workload analysis, storage design, query planning, code generation, and optimizer-driven refinement.

Experimental Evaluation

GenDB is evaluated on OLAP workloads using TPC-H (SF=10) and a SEC-EDGAR benchmark constructed to circumvent potential LLM training data leakage. Comparative baselines include DuckDB, Umbra, MonetDB, ClickHouse, and PostgreSQL. Execution environments maintain hardware parity (dual Xeon CPUs, 384GB RAM).

GenDB decisively outperforms all baselines on representative queries (Q1, Q3, Q6, Q9, Q18 for TPC-H; six diversified SEC-EDGAR queries). On TPC-H, GenDB achieves a cumulative 214 ms for five representative queries, yielding a 2.8 $\times$ speedup over DuckDB/Umbra and 11.2 $\times$ over ClickHouse. On SEC-EDGAR, GenDB is 5 $\times$ faster than DuckDB and 3.9 $\times$ faster than Umbra. The gap scales with query complexity (e.g., TPC-H Q9, a five-way join: 6.1 $\times$ acceleration over DuckDB).

Iterative optimization demonstrates substantial reductions: TPC-H Q18 starts at 12,147 ms and converges to 74 ms by iteration 1 (163 $\times$ improvement); SEC-EDGAR Q4 reduces from 1,410 ms to 106 ms over three iterations. Optimization is driven by adaptation such as index-aware scans, cache-conscious accumulation, and operator fusion.

Figure 2: Query execution times for GenDB versus baseline engines across TPC-H and SEC-EDGAR; GenDB achieves consistent sublinear execution time and superior scalability with query complexity.

Ablation Study

The ablation study isolates the impact of pipeline structure and domain-specific hints. Replacing the full multi-agent pipeline with a single agent using hints (guided) or wholly autonomous (high-level) leads to measurable degradations: the multi-agent pipeline provides a 1.2 $\times$ speedup on TPC-H and up to 4.1 $\times$ on SEC-EDGAR, with lower monetary cost for LLM inference. These results underscore the necessity for structured, hierarchical decomposition and deterministic constraint handling in robust agentic systems—especially for queries and schemas outside the scope of LLM memorization. Removal of domain hints further impairs generalization.

Figure 3: Ablation results showing performance and cost differences between multi-agent, guided single-agent, and high-level agent variants; structured and specialized agents yield fastest and cheapest query execution.

Design Implications

Optimization Granularity

GenDB achieves high performance via fine-grained, workload-specific, hardware-aware optimization. Typical engine abstractions (fixed operator sets, unified aggregation logic) are superseded by explicit adaptation: e.g., direct array aggregation for low cardinality (fits in L1), column-separated shared hash tables for high cardinality leveraging lock-free atomic writes, and layout selection aligned with cache line sizing. Such adaptation requires pre-runtime analysis and tailored code generation, unachievable in static operator logic.

Extensibility and Integration

GenDB naturally extends to multimodal query processing (e.g., semantic queries over text/audio/image), leveraging LLMs/ML inference modules in generated code. This supports emerging workloads favoring inference cost optimization over classic relational operator acceleration. The modular pipeline facilitates dynamic model selection, prompt tuning, and code synthesis for hybrid analytical workloads.

For hardware acceleration (GPU, SIMD, multi-core), GenDB can utilize high-performance libraries (e.g., libcudf, RMM) and architect GPU-native pipelines unconstrained by legacy CPU-centric operator abstractions.

Robustness and Communication

Failures are handled via deterministic timeouts, structured communication protocols, and rollback to best prior candidate implementations. Silent LLM/agent failures (e.g., exceeded token limits, incorrect hash table construction) are mitigated but systematic detection/reporting needs refinement. Communication protocols (JSON, TOON, MCP) are evaluated for scaling inter-agent messaging as schema and query complexity increases.

Future Directions

Key research vectors include:

Adaptive cost reduction: Dynamic model selection by pipeline stage complexity and query difficulty; generation of reusable operator templates and multi-query optimization.
Robustness improvements: Integration of domain knowledge bases, detection and mitigation of agent/LLM silent failures, scalable structured inter-agent protocols, and formal verification of generated code.
Self-evolving architectures: Persistent experience modules to accumulate and propagate effective optimization patterns and identified failure cases; autonomous prompt refinement and agent pipeline redesign for continual quality improvement.
Semantic query processing: Generation of code for extended data models, direct deployment of ML/LLM inference, prompt and model optimization, and specialized code for modern hardware.

Conclusion

GenDB demonstrates that LLM-powered agentic query code generation can substantially exceed existing hand-engineered systems in OLAP workload performance and extensibility. By leveraging multi-agent reasoning, hardware-aware optimizations, and iterative feedback, GenDB achieves robust, scalable, and highly customizable query processing. The results validate a transition from static, engineered engines to dynamic, synthesized pipelines, with implications for both the architectural future of database systems and broader AI-driven automation in systems research.

Markdown Report Issue

Paper to Video (Beta)

All Videos Subscribe on YouTube

Whiteboard

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Explain it Like I'm 14

What is this paper about?

This paper introduces GenDB, a new way to run database queries. Instead of relying on one large, hand‑built database engine, GenDB uses an AI (a LLM, or LLM) to write a tiny, custom program for each query. That program is tailored to the exact data, the kind of questions people ask, and the computer it runs on. The goal is to make queries run much faster and to make it easier to add new features without rebuilding an entire system.

Think of it like this: instead of using a Swiss‑army knife for every job, GenDB 3D‑prints the perfect tool for the task right when you need it.

What questions were the researchers asking?

They focused on a few simple questions:

Can an AI automatically write fast, correct code to answer database questions?
Is this “custom code per query” approach faster than today’s best database systems?
How should a system like this be organized so it can analyze data, plan, write, test, and improve code automatically?
Where does this approach shine, and what are its limits?

How did they do it?

GenDB works like a team of specialized AI assistants (agents) that cooperate:

Understand the situation It reads the database layout (like a very big, organized spreadsheet), the queries (in SQL or plain English), and the computer’s hardware. It summarizes key facts, like which columns get filtered, how big the tables are, and what the machine can do (for example, does it have fast special instructions or a big cache).
Design the storage and indexes Storage is how data is physically laid out; an index is like a book’s index that helps you jump straight to what you need. GenDB can reorganize data (for example, column‑by‑column for analytics) and build helpful indexes automatically.
Plan the query It picks the best way to answer the question—what to filter first, how to join tables, and how to group and aggregate results—based on the earlier analysis.
Generate code It writes executable code (often in C++ for speed) that implements the plan and uses tricks the computer is good at. Then it runs the code and checks the results by comparing them to a traditional database system to make sure the answers match.
Improve with feedback It times the code, spots slow parts, and rewrites them—iterating until it meets the user’s goals or runs out of budget (like a limit on how many attempts it can make).

They tested GenDB on two analytics workloads (these are read‑heavy tasks that summarize large amounts of data, often used for reports):

TPC‑H: a well‑known benchmark used by many researchers.
SEC‑EDGAR: a new, real‑world financial dataset they built to avoid the chance the AI had seen similar queries during training.

GenDB’s results were compared to top database systems: DuckDB, Umbra, MonetDB, ClickHouse, and PostgreSQL. All tests ran on the same server, and they measured “hot runs” (re‑runs after the data is already in memory) to focus on computation speed.

Quick glossary:

LLM: an advanced AI that can understand and generate text and code.
OLAP (analytics): reading and summarizing large data for insights, rather than updating lots of small records.
Index: a shortcut to find relevant rows quickly.
Hot run: measuring speed after the data is already cached in memory.

What did they find, and why is it important?

Main results:

GenDB was significantly faster than all comparison systems on every query they tested.
- On a subset of TPC‑H queries, GenDB ran about 2.8× faster than the fastest traditional systems.
- On the SEC‑EDGAR tasks, it was about 5× faster than DuckDB and about 3.9× faster than Umbra.
The more complex the query, the bigger the speed‑ups. In some cases, after a few rounds of improvement, GenDB reduced a query from many seconds to a tiny fraction of a second.
A “multi‑agent” design (specialists for analysis, storage, planning, code, and optimization) worked better and cost less than using a single, general‑purpose AI agent.

Why it works:

Tailor‑made code avoids general‑purpose “one size fits all” overheads.
It picks strategies that match the actual data and the computer hardware (for example, choosing different methods if there are only a few groups vs. millions, so results fit better in fast CPU caches).
It can build small, custom data structures just for the queries you actually run.

Why it matters:

Faster analytics means quicker answers to business questions.
New techniques (like working with images or using GPUs) can be adopted without redesigning a big system.
For companies that repeat the same queries often, the one‑time cost of generating code can be shared over many runs, making it very efficient.

What does this mean for the future?

Potential impact:

How databases are built may change: instead of spending years engineering a single huge engine, AI could generate the right code at the right time.
It could make it easier to support new data types (like images or audio) and to use specialized hardware (like GPUs) where it helps most.
A hybrid setup makes sense: a traditional database for one‑off or ad‑hoc queries, and GenDB for frequent or heavy queries that deserve custom code.

Current limits and next steps:

Correctness checks: Today, GenDB compares its answers to a known database. If that “ground truth” isn’t available, a human expert should review the code. There are no formal guarantees for answers generated from plain‑English questions without SQL.
Cost and speed of code generation: They plan to reduce cost by using smaller AIs for simpler steps, reusing pieces of code across queries, and trimming the AI’s inputs to cut “token” usage.
Reliability: They want to handle cases where AI‑generated code gets stuck or is inefficient, improve how the AI agents talk to each other, and add stronger verification.
Learning over time: The system could keep an “experience” memory of what worked and what failed to improve future generations automatically.

In one sentence

GenDB shows that having an AI write a tiny, custom program for each database query can make analytics much faster and more flexible, hinting at a future where databases are synthesized on demand instead of painstakingly engineered by hand.

View Paper Prompt View All Prompts

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single consolidated list of unresolved issues that are either missing, uncertain, or left unexplored in the paper; each item is phrased to be actionable for future research.

Lack of formal correctness guarantees for generated code beyond result comparison to a reference DBMS; investigate automated test generation, property-based/metamorphic testing, and formal verification for common query templates.
No strategy to guarantee SQL standard compliance across edge cases (NULLs, collations, outer joins, window functions, time zones, numeric overflow/underflow, complex types); evaluate coverage and correctness across the full SQL surface.
Correctness for natural-language-driven queries is explicitly unspecified; design guardrails and validation techniques for NL-to-code paths to bound semantic drift and hallucination.
Reliance on a ground-truth engine (e.g., DuckDB) for correctness checking limits deployment where no trusted reference exists; develop reference-free validation methods (e.g., equivalence rewriting, randomized differential testing).
Floating-point determinism and parallel reduction non-associativity are not addressed; define acceptable numerical tolerances and reproducible reduction strategies for consistent results.
Only hot-run latencies on in-memory datasets (TPC-H SF=10 and SEC-EDGAR ~5 GB) are reported; evaluate cold-cache performance, disk I/O-bound scenarios, and larger-than-memory datasets.
End-to-end amortized performance is not quantified; measure total wall-clock time including storage/index build, code synthesis/compilation, and iterative optimization to establish break-even points versus traditional engines as a function of query repetition.
Concurrency and throughput are untested; assess performance under multi-tenant workloads, simultaneous queries, and cross-query interference on shared caches and memory bandwidth.
Update-heavy or mixed OLTP/OLAP workloads are out of scope; design and evaluate incremental maintenance of generated storage/index structures and incremental regeneration of code when data or schema change.
Distributed or multi-node execution is not considered; explore code synthesis for distributed joins/aggregations, data partitioning strategies, and coordination protocols.
Generality across diverse schemas and complex join graphs is unproven; systematically test on varied join topologies, high skew, heavy LIKE patterns, and extreme cardinalities beyond TPC-H/SEC-EDGAR subsets.
Portability and robustness of hardware-specialized code are unclear; study how generated code adapts to different CPU microarchitectures, NUMA topologies, and varying cache hierarchies without manual retuning.
GPU/accelerator support is proposed but not implemented; quantify benefits and trade-offs of GPU-native operator generation (e.g., libcudf, RMM), including data transfer orchestration and hybrid CPU–GPU execution.
Multi-objective or constrained optimization (e.g., latency under memory caps or cost budgets) is not realized; implement optimization algorithms that handle SLAs and cost–performance trade-offs with formal objective definitions.
The optimizer’s search procedure is not specified (beyond iterative refinement and early stopping); formalize the optimization loop (search space, acquisition functions, stopping rules) and report convergence properties.
Silent failure modes (non-terminating probes, load-factor blowups, LLM output truncation) are only mitigated with timeouts; develop static analyses, runtime watchdogs, and synthesized “safe” data structures (e.g., capped-load-factor hash tables) and prove liveness properties.
Generated C++ raises memory-safety and concurrency risks; evaluate safe-by-construction approaches (e.g., Rust, sanitizers, formal contracts), sandboxing (e.g., seccomp, WASM), and automated static/dynamic analysis gates before execution.
Inter-agent misalignment (e.g., DECIMAL scale handling) reveals missing type-checked IR; introduce a strongly-typed intermediate representation and schema-aware codegen that enforces type/scale rules across agents.
Inter-agent communication is JSON/template-based but not enforced; assess typed protocols (e.g., MCP, A2A) with schema validation, budget/constraint enforcement across dynamically spawned sub-agents, and failure-recovery policies.
External domain knowledge is not integrated; design a curated, verifiable knowledge base and retrieval pipeline (with source attribution and trust scoring) to augment parametric knowledge without introducing noise.
Security and privacy of LLM-in-the-loop synthesis are unaddressed; define data minimization policies, redaction for schema/data prompts, network egress controls, and code provenance scanning when agents use web tools.
Legal/licensing implications of synthesized code are not discussed; add license compliance checks and provenance tracking to ensure outputs don’t inadvertently replicate restricted training data.
Reproducibility is fragile given LLM non-determinism and model drift (e.g., Claude Sonnet 4.6); provide deterministic seeds, prompt/version pinning, cached intermediate artifacts, and replayable execution traces.
Cost accounting is partial (ablation includes $ costs but main results do not); standardize reporting of LLM/API costs, token usage, compilation time, and runtime to compare total cost of ownership with baselines.
Adaptive model selection policies are unimplemented; design and evaluate controllers that route steps/queries to small vs. large models based on estimated complexity and target confidence.
Token reduction techniques (compression of schema/queries/messages) are only suggested; measure their accuracy impact and define thresholds where compression degrades code quality.
Reuse is minimal (only utility functions); build libraries of parameterized, validated operator templates (e.g., cache-adaptive aggregations) and assess reuse benefits across query sets and schemas.
Lifecycle management of per-query executables is unspecified; design caching, versioning, and eviction policies for generated binaries and derived structures, with dependency tracking on schema/data changes.
Hybrid deployment with a traditional DBMS is only conceptual; implement a router that selects between GenDB and a baseline DBMS based on estimated amortization, correctness confidence, and SLAs, with failover paths.
Benchmark fairness and completeness can be improved; include full TPC-H and other benchmarks (e.g., TPC-DS), measure index/materialized view build times, and compare against fully tuned baselines, not just automatic defaults.
Storage/index build time, memory footprint, and space amplification of generated structures (e.g., nibble-packed lookups, zone maps) are not reported; quantify build cost and runtime overheads under varying data distributions.
Stability of performance across runs and sensitivity to LLM stochasticity are not analyzed; report variance, worst-case slowdowns, and mechanisms to detect/regress anomalous outputs.
Governance for code updates (CI/CD of generated code, audit logs, rollback) is not addressed; define deployment pipelines and operational safety checks for production use.
Extension to semantic/multimodal queries is future work; specify how to jointly optimize runtime cost and quality metrics (e.g., accuracy, recall) for model-in-the-loop operators, with reproducible quality evaluations.

View Paper Prompt View All Prompts

Practical Applications

Immediate Applications

Below are concrete, deployable use cases that organizations can implement with the paper’s methods today. Each item includes sectors, potential tools/products/workflows, and key assumptions or dependencies.

Accelerate recurring analytics queries in data warehouses and lakehouses
- Sectors: Software, Retail, Finance, Media, Telecom
- What: Identify high-frequency SQL (scheduled BI dashboards, ELT jobs, monthly closes), synthesize per-query C++ executables and tailored storage/indexes, then cache and reuse them; route ad-hoc queries to a standard DBMS
- Tools/products/workflows: “Query Accelerator” sidecar for Snowflake/BigQuery/Databricks-lakehouse via external tables/Parquet; Airflow/dbt node that compiles and dispatches executables; per-query binary cache + validation against a reference engine (e.g., DuckDB)
- Assumptions/dependencies: Significant query repetition; stable schemas; build toolchain and sandboxing for native code; governance approval for hybrid architecture; correctness checks against a ground-truth system or manual review for critical workloads
Cloud cost and energy reduction for analytics
- Sectors: Cloud, Finance, Energy/Sustainability
- What: Replace long-running/expensive OLAP queries with synthesized executables to cut runtime and instance-hours; quantify carbon reduction via fewer CPU-hours
- Tools/products/workflows: “QCaaS” (Query-Compiler-as-a-Service) with cost dashboards; FinOps integration to recommend candidates based on spend; carbon accounting hooks
- Assumptions/dependencies: Reliable cost/telemetry baselines; organizational tolerance for per-query binaries; fallback to baseline DBMS in case of failures
Automated storage layout and index design for static and semi-static datasets
- Sectors: Software, Media, Public Data Portals
- What: Generate columnar encodings, compression, zone maps/dictionary indexes customized to value distributions and hardware cache hierarchy
- Tools/products/workflows: “Storage Designer” that outputs Parquet/Arrow-with-encodings plus build scripts for auxiliary indexes; one-time migration step in data lakes
- Assumptions/dependencies: Read-heavy workloads; ability to transform data formats; stable data distribution statistics
Hybrid DBMS deployment for mixed ad-hoc and repeated workloads
- Sectors: Enterprise IT, BI/Analytics
- What: A router that sends one-off queries to the existing engine and frequent queries to synthesized code; automatic identification of repeating patterns
- Tools/products/workflows: Query router + “frequent-query detector”; CI/CD to promote synthesized code from staging to production after validation
- Assumptions/dependencies: Query logging and fingerprinting; operational guardrails (timeouts, rollback); integration with access control/auditing
Developer productivity and performance engineering in academia and industry
- Sectors: Academia, Software/SRE/DevOps
- What: Rapid prototyping of operator implementations and join orders; auto-generation of realistic microbenchmarks for cache/NUMA stress testing
- Tools/products/workflows: Reproducible harness using the paper’s multi-agent pipeline; structured JSON for inter-agent plans; automated join-order sampling and timing
- Assumptions/dependencies: Access to representative data; hardware performance counters or profilers; safe execution environment for generated code
BI for non-SQL users via a natural-language interface (with guardrails)
- Sectors: BI/Analytics, Education, SMBs
- What: Add NL-to-SQL with synthesized execution for common dashboards; clearly communicate that NL mode lacks formal correctness guarantees; enforce output validation against small samples or a reference engine where feasible
- Tools/products/workflows: Conversational analytics UI; validation hooks (sample-based, golden queries); opt-in approval for new NL-generated pipelines
- Assumptions/dependencies: Risk management acceptance of weaker guarantees; strong observability and human-in-the-loop approval for promoted jobs
On-prem/air-gapped analytics in regulated environments
- Sectors: Healthcare, Government, Defense
- What: Use local LLMs or offline prompts to synthesize code and avoid sending sensitive data to external APIs; validate against an on-prem reference DBMS
- Tools/products/workflows: Containerized codegen toolchain; offline model serving; secured build/exec pipeline with SBOM
- Assumptions/dependencies: Adequate local model capability; security reviews for native code execution; performance competitive with baseline systems
Edge and gateway analytics for simple aggregations and filtering
- Sectors: IoT, Manufacturing, Logistics
- What: Compile small, fixed analytics workloads (e.g., rolling counts, QA metrics) into native binaries for gateways/smart devices to reduce backhaul and latency
- Tools/products/workflows: Cross-compilation pipeline (e.g., ARM/x86); minimal runtime footprint; periodic refresh when data characteristics drift
- Assumptions/dependencies: Stable, simple queries; limited memory/CPU; secure update channels; careful correctness validation without a heavy DBMS

Long-Term Applications

These applications require further research, scaling, or maturation of the approach (robustness, verification, cost control, interoperability).

Synthesized, zero-knob analytical databases for domain-specific workloads
- Sectors: Software, Enterprise IT
- What: Replace parts of general engines with per-workload codegen pipelines that handle storage, execution, and optimization end-to-end
- Tools/products/workflows: “Self-assembling DB” that emits a binary per query/template plus shared components; multi-query optimization to co-generate shared operators
- Assumptions/dependencies: Strong correctness guarantees; robust failure handling; concurrency/resource isolation; efficient regeneration when schemas evolve
Unified semantic/multimodal query processing (text, images, audio) alongside relational data
- Sectors: E-commerce (search/reco), Media, Healthcare (imaging), Security
- What: Generate pipelines that call ML/LLM models for semantic predicates (e.g., similarity search, caption filters) with cost-quality trade-offs
- Tools/products/workflows: Python bridges to model inference, prompt/model selection, RAG; caching of embeddings; hybrid operators that mix symbolic and neural logic
- Assumptions/dependencies: Model inference dominates cost; evaluation metrics for “semantic correctness”; privacy/compliance for model use
GPU-native code synthesis for analytics kernels
- Sectors: Software/Analytics, HPC
- What: Autogenerate CUDA/C++ pipelines that leverage libcudf/RMM and kernel fusion to outperform CPU-based paths on large scans and joins
- Tools/products/workflows: GPU-aware planner; PCIe/NVLink-aware data movement strategies; autotuning of block/grid sizes
- Assumptions/dependencies: GPU availability and ROI; data locality; specialized debugging and profiling support
SLA-aware, multi-objective optimizers balancing performance, cost, and memory
- Sectors: Cloud Platforms, Managed Data Services
- What: Optimizers that navigate latency SLOs, budget constraints, and memory/disk limits while generating query code
- Tools/products/workflows: Cost models + online telemetry; constrained or multi-objective search; adaptive model selection per query and per pipeline stage
- Assumptions/dependencies: Accurate telemetry; robust, low-cost model orchestration; guardrails for overfitting to transient conditions
Self-evolving experience stores for cross-query and cross-tenant learning
- Sectors: SaaS Data Platforms, Enterprise IT
- What: Persist and reuse successful patterns (e.g., cache-adaptive aggregation) and anti-patterns; share safely across tenants/domains
- Tools/products/workflows: Experience memory with safety filters; retrieval-augmented prompting for codegen; federated learning across organizations
- Assumptions/dependencies: Privacy isolation; governance for knowledge promotion/demotion; evaluation harnesses to validate reuse
Formal verification and safety certification for generated query code
- Sectors: Healthcare, Finance, Public Sector (procurement)
- What: Verified-by-construction templates for common query patterns; proof artifacts (e.g., proof-carrying code) for compliance audits
- Tools/products/workflows: Restricted DSLs for safety-critical patterns; SMT-based checks; CI gates that require proofs before deployment
- Assumptions/dependencies: Performance overheads acceptable; limits on expressiveness for verifiable subsets; specialized verification expertise
Interoperable agent protocols for data systems (MCP/TOON/A2A) and auditability
- Sectors: Software/Standards, Policy
- What: Standardize structured inter-agent communication and tool-use to reduce misalignment and improve reproducibility; comprehensive audit logs
- Tools/products/workflows: Protocol-compliant agents; message schema registries; signed artifacts and SBOM/SLSA compliance for generated code
- Assumptions/dependencies: Community/vendor adoption; governance for protocol evolution; secure key management
Leakage-resistant benchmarking and procurement practices
- Sectors: Academia, Public Sector, Enterprise Procurement
- What: Suite of “unseen” benchmarks (like SEC-EDGAR) to evaluate LLM-driven systems fairly; guidance for RFPs that test robustness beyond memorized workloads
- Tools/products/workflows: Curated datasets with freshness guarantees; benchmark generators; reporting standards for LLM involvement and prompt budgets
- Assumptions/dependencies: Licensing for datasets; sustained curation; community consensus on fairness criteria
Serverless “SQL-as-functions” with precompiled, per-query binaries
- Sectors: Cloud, Edge/Mobile, SaaS
- What: Deploy compiled query functions as low-latency serverless endpoints; cache warm binaries for popular analytics
- Tools/products/workflows: Binary artifact registry; cold-start mitigation (provisioned concurrency); per-tenant isolation and sandboxing
- Assumptions/dependencies: Secure native execution in FaaS; packaging constraints; strong observability and throttling
Carbon-aware query planning and execution via specialization
- Sectors: Energy, Sustainability/ESG, Cloud
- What: Schedule and specialize heavy queries to minimize energy/carbon (e.g., off-peak, green regions) while preserving SLAs
- Tools/products/workflows: Carbon-intensity signals; variant selection (CPU/GPU, kernel fusion, encoding choice) guided by carbon/performance curves
- Assumptions/dependencies: Accurate carbon telemetry; policy/SLAs allowing flexible timing/placement; mature multi-objective optimizers
Automated data-governance hooks for generated analytics code
- Sectors: Compliance, Enterprise IT
- What: Embed lineage, PII-access checks, and policy enforcement in generated code; generate auditable descriptions of operator choices
- Tools/products/workflows: Policy-as-code injection; lineage capture at operator boundaries; automated documentation of codegen decisions
- Assumptions/dependencies: Up-to-date data catalogs/classifiers; policy engines (e.g., OPA) integration; performance overhead acceptance

These applications leverage the paper’s core innovations—per-query code synthesis, workload/hardware-aware planning, iterative runtime feedback, and agentic decomposition—and translate them into deployable products and roadmaps. Feasibility depends most on the presence of repeat queries, robust correctness validation, secure native execution environments, and clear governance around LLM-assisted generation.

View Paper Prompt View All Prompts

Glossary

ablation study: A controlled analysis in which components of a system are systematically removed or altered to assess their impact. Example: "an ablation study of GenDB"
Agent Skills: Predefined tool-use capabilities or modules that can be plugged into agents to extend their functionality. Example: "integrating Agent Skills"
agentic system: A system composed of autonomous LLM-driven agents that plan, reason, and use tools to complete complex tasks. Example: "GenDB is an LLM-powered agentic system"
aggregation operator: A database operator that computes summaries over groups of rows (e.g., sums, counts) during query execution. Example: "the aggregation operator adapts its strategy to the hardware cache hierarchy"
batched-prefetch: A technique that issues memory prefetches in batches to hide latency during repeated probe or scan patterns. Example: "batched-prefetch probe patterns"
branchless scan: A scan implementation that avoids conditional branches to improve pipeline efficiency and reduce misprediction costs. Example: "a branchless scan"
cache-adaptive aggregation: Choosing aggregation strategies based on cache sizes and group counts to keep state in the fastest cache levels. Example: "cache-adaptive aggregation"
cache-thrashing: Performance degradation caused by frequent cache evictions when working sets exceed cache capacity or conflict heavily. Example: "cache-thrashing hash aggregation"
columnar storage: A storage layout that organizes data by columns rather than rows to improve compression and analytical scan performance. Example: "columnar storage with encoding and compression"
compare-and-swap (CAS): An atomic instruction that conditionally updates a memory location if it matches an expected value, enabling lock-free synchronization. Example: "lock-free compare-and-swap (CAS)"
constrained optimization: Optimization under explicit limits or requirements on resources or performance metrics. Example: "constrained optimization"
false sharing: Cache-line contention that occurs when multiple threads update different variables that reside on the same cache line. Example: "to prevent false sharing"
GPU-native analytical processing: Query processing techniques and systems designed to run directly and efficiently on GPUs. Example: "GPU-native analytical processing"
hot runs: Performance measurements taken after data and code are warmed in caches/buffers, reflecting steady-state execution. Example: "hot runs"
index-aware sequential scan: A scan strategy that leverages index properties or layout to make sequential access more efficient. Example: "index-aware sequential scan"
join orders: The sequence in which tables are joined in a multi-table query, which can significantly affect performance. Example: "candidate join orders"
L1 cache: The smallest and fastest CPU cache close to each core, used to minimize memory access latency. Example: "fits entirely in L1 cache"
LIKE filter: A SQL predicate that performs pattern matching on strings using the LIKE operator. Example: "a five-way join with a LIKE filter"
load factor: The fraction of a hash table’s capacity that is occupied, influencing collision rates and probe length. Example: "let the load factor approach 100%"
mmap: A POSIX system call that maps files or devices into memory for efficient I/O. Example: "mmap"
Model Context Protocol (MCP): A protocol for standardizing tool and context integration with agentic or LLM systems. Example: "Model Context Protocol (MCP)"
multi-objective optimization: Simultaneously optimizing multiple objectives (e.g., speed and cost), often requiring trade-offs. Example: "multi-objective optimization"
multi-query optimization: Planning or generating code to share work or structures across multiple queries to reduce total cost. Example: "multi-query optimization"
nibble-packed: A compact representation that packs values into 4-bit units to reduce memory footprint. Example: "nibble-packed lookup files"
OLAP: Online Analytical Processing; workloads focused on large-scale analytical queries over historical data. Example: "OLAP workloads"
OLTP: Online Transaction Processing; workloads emphasizing many concurrent, short, transactional operations. Example: "OLTP"
open addressing: A collision resolution technique for hash tables where probing continues within the table to find an empty slot. Example: "open addressing"
OpenMP: A parallel programming model for shared-memory multiprocessing, providing compiler directives for threading. Example: "OpenMP parallel region"
parametric knowledge: Knowledge stored within the parameters of a trained model rather than in an external database. Example: "the LLM's parametric knowledge"
scale factor: A benchmark parameter specifying dataset size relative to a baseline. Example: "scale factor 10"
semantic query processing: Query processing that incorporates meaning and ML/LLM-driven semantics over unstructured or multimodal data. Example: "semantic query processing"
Service Level Agreement (SLA): A formal commitment specifying performance or reliability guarantees. Example: "Service Level Agreement (SLA)"
SIMD: Single Instruction, Multiple Data; CPU instructions that apply the same operation to multiple data elements in parallel. Example: "SIMD support"
TOON: A communication or compression approach used to make LLM/agent messaging more efficient. Example: "TOON"
TPC-H: A standard decision support benchmark for evaluating analytic database performance. Example: "TPC-H"
vector embedding: A numerical representation of objects (e.g., text, images) in a vector space for similarity search or ML operations. Example: "vector embedding"
workload-specific derived data structures: Auxiliary structures created for a particular workload to accelerate queries. Example: "workload-specific derived data structures"
zone-map pruning: Skipping data ranges during scans using per-block min/max metadata to avoid unnecessary reads. Example: "zone-map pruning"

View Paper Prompt View All Prompts

Open Problems

Balancing LLM Token Consumption and Generation Quality

Continue Learning

Authors (2)

Collections

Tweets

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered (0 points, 1 comment)

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Summary

GenDB: LLM-Driven Instance-Optimized Query Processing

Motivation and Context

System Architecture

Experimental Evaluation

Ablation Study

Design Implications

Optimization Granularity

Extensibility and Integration

Robustness and Communication

Future Directions

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

What is this paper about?

What questions were the researchers asking?

How did they do it?

What did they find, and why is it important?

What does this mean for the future?

In one sentence

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Immediate Applications

Long-Term Applications

Glossary

Open Problems

Continue Learning

Authors (2)

Collections

Tweets

Reddit

Don't miss out on important new AI/ML research

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Summary

GenDB: LLM-Driven Instance-Optimized Query Processing

Motivation and Context

System Architecture

Experimental Evaluation

Ablation Study

Design Implications

Optimization Granularity

Extensibility and Integration

Robustness and Communication

Future Directions

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

What is this paper about?

What questions were the researchers asking?

How did they do it?

What did they find, and why is it important?

What does this mean for the future?

In one sentence

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Immediate Applications

Long-Term Applications

Glossary

Open Problems

Continue Learning

Related Papers

Authors (2)

Collections

Tweets

Reddit

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research