GPTuner: LLM-Driven DBMS Tuning
- GPTuner is a database tuning framework that integrates LLMs and Bayesian optimization to autonomously configure DBMS knobs by extracting expert knowledge from manuals, forums, and LLM outputs.
- The framework employs an LLM-based knowledge extraction pipeline and workload-aware search space pruning to drastically reduce the configuration space, enabling faster convergence.
- Experimental results show that GPTuner reduces PostgreSQL TPC-H latency by 44% and improves MySQL TPC-C throughput by 10% over prior methods, converging in roughly 20 iterations.
GPTuner is a database tuning framework that integrates LLMs with Bayesian optimization to efficiently and autonomously configure complex database management systems (DBMS). By systematically extracting and leveraging heterogeneous domain knowledge from manuals, forums, and LLMs themselves, GPTuner automates the traditionally labor-intensive and expertise-driven task of DBMS knob selection and parameter tuning. The system prioritizes minimizing tuning cost, maximizing performance (throughput and latency), and achieving robust results across workloads and database platforms without requiring manual human intervention, annotation, or offline training (Lao et al., 2023).
1. Motivation and Challenges in Database Tuning
Modern DBMSs such as PostgreSQL and MySQL expose hundreds of tunable "knobs"—parameters governing memory, I/O, query planning thresholds, and more. The optimal configuration of these knobs is highly sensitive to both server hardware and the workload profile, exhibiting pronounced non-linear and workload-dependent interactions. Traditional tuning approaches fall into two categories:
- Manual tuning: Experts consult documentation and community resources, leveraging anecdotal and domain-based heuristics. Manual processes yield high-quality insights (e.g., "set shared_buffers to 25–40% of RAM") but are slow, do not scale with configuration space, and provide limited reproducibility.
- Automatic tuning via black-box optimization: Machine-learning methods, including Bayesian optimization and reinforcement learning, optimize configuration by repeated benchmarking. These are generalizable but require vast numbers of costly benchmark evaluations and fail to exploit rich domain knowledge, often searching in intractably large and irrelevant parameter regions.
GPTuner addresses these limitations by automatically reading, distilling, and structuring expert tuning knowledge with an LLM, and using that knowledge to guide and radically diminish the search space, achieving order-of-magnitude speedups in finding optimal configurations (Lao et al., 2023).
2. LLM-Based Knowledge Extraction and Structuring
GPTuner introduces a Knowledge Handler pipeline that systematizes the ingestion of diverse natural-language sources:
- Knowledge Preparation: The system integrates heterogeneous sources—official manuals, web-crawled forum posts, and GPT-4-generated tips—for each knob in the system’s knob list. Contradictory or out-of-range hints are detected and filtered using simple rules and LLM-based consistency checks. Sources are prioritized as manual > forum > LLM.
- Factual Consistency Loop: The LLM checks each per-knob summary against source texts, regenerating summaries as needed to resolve inconsistencies.
- Prompt Ensemble Extraction: The system defines a schema per knob, capturing
suggested_values, min_value, max_value, and special_values. For each attribute, GPTuner issues several distinct prompts to the LLM, parses their JSON outputs, and aggregates them via element-wise majority voting. This yields a robust, unified "Tuning Lake" containing machine-curated expert guidance for each parameter.
Example structured schema:
```json
{
  "knob_name": "shared_buffers",
  "suggested_values": ["4GB", "6.4GB"],
  "min_value": "4GB",
  "max_value": "6.4GB",
  "special_values": []
}
```
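The element-wise majority vote over the prompt ensemble can be sketched as follows (a minimal illustration: the attribute names follow the schema above, but `aggregate_votes` and the sample LLM answers are hypothetical):

```python
import json
from collections import Counter

def aggregate_votes(llm_outputs):
    """Element-wise majority vote over multiple LLM JSON answers for one knob.

    `llm_outputs` is a list of dicts parsed from independent LLM responses;
    for each schema attribute we keep the value returned most often.
    """
    attrs = ["suggested_values", "min_value", "max_value", "special_values"]
    result = {}
    for attr in attrs:
        # json.dumps makes list-valued answers hashable so Counter can tally them
        votes = Counter(json.dumps(out.get(attr), sort_keys=True) for out in llm_outputs)
        winner, _ = votes.most_common(1)[0]
        result[attr] = json.loads(winner)
    return result

# three independent (hypothetical) LLM answers for the same knob
answers = [
    {"suggested_values": ["4GB"], "min_value": "4GB", "max_value": "6.4GB", "special_values": []},
    {"suggested_values": ["4GB"], "min_value": "2GB", "max_value": "6.4GB", "special_values": []},
    {"suggested_values": ["8GB"], "min_value": "4GB", "max_value": "6.4GB", "special_values": []},
]
print(aggregate_votes(answers))  # majority keeps "4GB" for min_value, ["4GB"] for suggested_values
```

Disagreements on a single attribute (here, one outlier each for `min_value` and `suggested_values`) are resolved without discarding the whole response, which is the point of voting element-wise rather than per answer.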
3. Workload-Aware Knob Selection and Search Space Pruning
GPTuner’s Search Space Optimizer leverages the structured knowledge to drastically reduce the effective configuration space:
- Knob Selection: The LLM simulates a database administrator’s reasoning at multiple granularities:
- System level: "Which knobs matter for the target DBMS?"
- Workload level: "Which knobs are relevant for this workload type (e.g., OLTP throughput)?"
- Query analysis: Given an EXPLAIN plan for each high-impact query, "Which knobs address this bottleneck?"
- Dependency analysis: "Are there interdependent knobs to include?" Output sets typically shrink from 100+ to 20–30 knobs per workload.
- Range Optimization: For each selected knob, GPTuner (i) discards extreme vendor-supplied ranges in favor of expert-delineated intervals, (ii) builds a “Tiny Feasible Space” by discretizing around expert-suggested values, and (iii) applies "Virtual Knob Extension" to handle special categorical or sentinel values (e.g., 0 = “disable”). This compresses the search to a high-confidence region with 3–5 options per knob.
| Stage | Action | Result |
|---|---|---|
| Knob selection | Multi-level LLM-driven filtering of 100+ knobs | ~20–30 knobs |
| Range optimization | Discard unsafe/extreme ranges, discretize around expert answers, manage special values | Tiny, high-quality search subspace |
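The range-optimization steps can be sketched for a single knob as follows (the dict mirrors the Section 2 schema but uses plain integers; `tiny_feasible_space` and the example numbers are invented for illustration, not GPTuner's actual implementation):

```python
def tiny_feasible_space(knob, n_grid=5):
    """Build a small discrete candidate set for one knob from its
    structured knowledge: discretize the expert-delineated interval
    and keep sentinel values reachable as explicit extra options.
    """
    lo, hi = knob["min_value"], knob["max_value"]
    # (ii) discretize around the expert interval instead of the vendor range
    step = (hi - lo) / (n_grid - 1)
    grid = sorted({round(lo + i * step) for i in range(n_grid)}
                  | set(knob["suggested_values"]))
    # (iii) "Virtual Knob Extension": sentinel values (e.g., 0 = disable)
    # are appended as categorical options outside the numeric interval
    return grid + [v for v in knob["special_values"] if v not in grid]

knob = {"min_value": 4096, "max_value": 6554,   # e.g., a buffer size in 8KB pages
        "suggested_values": [5120], "special_values": [0]}
print(tiny_feasible_space(knob))
```

The resulting handful of candidates per knob is what makes the coarse-grained optimization stage in Section 4 cheap to search.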
4. Coarse-to-Fine Bayesian Optimization Framework
GPTuner orchestrates a two-stage Bayesian optimization (BO) procedure:
- Coarse-Grained Stage: BO is run exclusively in the Tiny Feasible Space, using Latin Hypercube samples for initialization, followed by acquisition-driven iterations leveraging a random forest surrogate and classic acquisition functions such as Expected Improvement (EI):

  EI(x) = E[max(y* − f(x), 0)],

  where y* is the best objective value observed so far (for minimization).
- Fine-Grained Stage: To refine the optimum:
- The surrogate model is re-initialized with prior samples.
- The pruned expert-compliant space is reconstructed.
- "Virtual Knob Extension" ensures reachability of special discrete settings.
- Bayesian optimization proceeds over the entire pruned continuous/categorical space until the evaluation budget is depleted.
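Under a Gaussian posterior from the surrogate, EI has a standard closed form, which can be evaluated per candidate configuration as below (a sketch: the candidate names and their (mu, sigma) values are invented, and the random-forest surrogate that would supply them is abstracted away):

```python
import math

def expected_improvement(mu, sigma, y_best):
    """EI for minimization: E[max(y_best - f(x), 0)] assuming the surrogate's
    posterior at x is Gaussian with mean mu and std sigma (closed form)."""
    if sigma <= 0.0:
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))       # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (y_best - mu) * cdf + sigma * pdf

# pick the candidate configuration with the highest EI
candidates = {"cfg_a": (0.90, 0.05), "cfg_b": (0.95, 0.20)}  # (mu, sigma) per config
y_best = 1.00
best = max(candidates, key=lambda c: expected_improvement(*candidates[c], y_best))
print(best)  # cfg_b: its higher posterior uncertainty outweighs its worse mean
```

Note how EI trades off exploitation (low predicted latency) against exploration (high posterior uncertainty), which is what lets the search escape the initial expert region in the fine-grained stage.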
Algorithmic summary (main loop):
```
TFS = buildTinySpace(S)                  # Tiny Feasible Space from structured knowledge S
M = initSurrogate()
for i in 1..n:                           # Latin Hypercube initialization
    sample p_i in TFS; evaluate; update M
for i in 1..C:                           # coarse-grained BO within TFS
    x_i = argmax_{x in TFS} Acq(x; M)
    y_i = Eval(DB, x_i); update M
while budget remains:                    # fine-grained BO over the pruned space
    P'  = RegionDiscard(fullSpace, S)
    P'' = VirtualExtension(P', S)
    x = argmax_{x in P''} Acq(x; M)
    y = Eval(DB, x); update M
return best x seen
```
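The coarse-to-fine pattern can be reduced to a runnable toy (random sampling stands in for the surrogate-plus-acquisition step, and the latency model and knob ranges are invented for illustration):

```python
import random

def coarse_to_fine(objective, tiny_space, full_space, budget=30, seed=0):
    """Toy version of the two-stage loop: evaluate the tiny expert space
    exhaustively (coarse stage), then spend the remaining budget on the
    wider pruned space (fine stage)."""
    rng = random.Random(seed)
    history = []
    for x in tiny_space:                          # coarse stage: tiny space is
        history.append((objective(x), x))         # small enough to enumerate
    for _ in range(budget - len(tiny_space)):     # fine stage: pruned full space
        x = rng.choice(full_space)
        history.append((objective(x), x))
    return min(history)                           # best (latency, config) pair

# hypothetical latency model with optimum at knob value 52
latency = lambda x: (x - 52) ** 2 / 100 + 1.0
tiny = [40, 48, 56, 64]          # expert-suggested candidates
full = list(range(16, 129))      # pruned but still wide numeric range
print(coarse_to_fine(latency, tiny, full))
```

Because the coarse stage alone already reaches a near-optimal region here, the fine stage only has to improve on it, mirroring the fast convergence GPTuner reports.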
5. Experimental Evaluation and Key Findings
The GPTuner system was evaluated on PostgreSQL v14.9 and MySQL v8.0, using the TPC-H (decision support, 95th-percentile latency) and TPC-C (OLTP throughput) benchmarks. Experimental details:
- Hardware: 24-core Xeon, 110GB RAM, SSD.
- Tuned knobs: 60 (PostgreSQL), 40 (MySQL).
- Baselines: DDPG++ (RL), Gaussian Process, SMAC (random forest BO), DB-BERT (BERT+RL).
- Budget: 100 tuning iterations, median of 3 runs.
Notable results:
- Convergence speed: GPTuner achieves within 5% of best Postgres TPC-H latency in ~20 iterations (vs. ~500+ for SMAC/GP).
- Final performance:
- PostgreSQL/TPC-H (lower is better; final latency relative to default): Default 1.00×, SMAC 0.87×, DB-BERT 0.63×, GPTuner 0.56× — a 44% reduction vs. default, 11% lower than DB-BERT.
- MySQL/TPC-C (throughput, tps): Default 240, SMAC 395, DB-BERT 930, GPTuner 1029 (10% improvement over DB-BERT).
- Overall speedup: Convergence is on average 16× faster than the best competing non-LLM tuners.
- Ablation study: Every layer of the system—LLM-based cleaning, workload-aware knob selection, search space pruning, and two-stage BO—contributes 10–30% improvements to convergence speed or final performance.
| Metric/Setting | Default | SMAC | DB-BERT | GPTuner |
|---|---|---|---|---|
| Postgres TPC-H latency | 1.00× | 0.87× | 0.63× | 0.56× |
| MySQL TPC-C throughput | 240 | 395 | 930 | 1029 |
| Iterations to 5% optimum | — | 500+ | — | ~20 |
| Relative convergence | — | 1× | — | 16× faster |
6. Advantages, Limitations, and Prospective Extensions
Advantages:
- Order-of-magnitude reduction in tuning iterations required for near-optimal performance.
- Up to 30% improvement in throughput or latency relative to prior approaches.
- No need for offline training data, model distillation, or human-annotated labels; all guidance is extracted at run-time from LLM-inferred, structured, and curated representations.
- Easily extensible for new workloads or DBMSs by rerunning the extraction and structuring pipeline.
Limitations:
- Reliance on external LLM APIs (cost, latency, LLM drift risk).
- Potential for LLM hallucination, though factual consistency loops mitigate most erroneous outputs.
- Static knob knowledge; adaptation to dynamic or cloud-elastic environments is not addressed.
Open Problems and Future Directions:
- Dynamic, online adaptation using real-time metrics (e.g., CPU, topology shifts).
- Multi-objective tuning (incorporating cost/resource constraints alongside latency/throughput).
- Transfer learning from previously constructed Tuning Lakes across DBMS versions or hardware configurations (Lao et al., 2023).
GPTuner’s approach—systematizing and distilling domain expertise via LLMs, and tightly integrating this knowledge with sample-efficient Bayesian optimization—establishes a new paradigm for autonomous, robust, and performant DBMS configuration.