Bao: Learning to Steer Query Optimizers (2004.03814v1)

Published 8 Apr 2020 in cs.DB

Abstract: Query optimization remains one of the most challenging problems in data management systems. Recent efforts to apply machine learning techniques to query optimization challenges have been promising, but have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties and drawing upon a long history of research in multi-armed bandits, we introduce Bao (the BAndit Optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a decades-old and well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly (an order of magnitude faster than previous approaches) learn strategies that improve end-to-end query execution performance, including tail latency. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a sophisticated commercial system.

Authors (6)
  1. Ryan Marcus (33 papers)
  2. Parimarjan Negi (6 papers)
  3. Hongzi Mao (11 papers)
  4. Nesime Tatbul (20 papers)
  5. Mohammad Alizadeh (58 papers)
  6. Tim Kraska (78 papers)
Citations (170)

Summary

Bao: Learning to Multiplex Simple Query Optimizers

The paper introduces Bao, an approach to query optimization in database management systems that leverages a collection of simple query optimizers. The authors address significant drawbacks of current machine learning-based query optimizers, such as high training data requirements, sensitivity to changes in the database schema, and the potential for catastrophic failures.

The core strategy is to maintain multiple simple query optimizers and choose among them using a contextual multi-armed bandit framework. This enables the system to select the most fitting simple optimizer for each incoming query without risking disastrous performance. The authors build on the premise that constructing a complex query optimizer is hard, whereas developing relatively simple optimizers is far more tractable. A learned tree convolutional model estimates how well each simple optimizer would perform on a given query, and the option with the best estimate is chosen.
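The bandit selection loop described above can be illustrated with a minimal Thompson sampling sketch. This is not the paper's implementation: it is non-contextual, and it swaps the tree convolutional model for a simple per-arm Gaussian posterior over mean latency. The class and function names (`GaussianThompsonArm`, `choose_arm`, `run_bandit`), the prior values, and the simulated latencies are all hypothetical choices made for illustration.

```python
import random


class GaussianThompsonArm:
    """Normal posterior over one arm's mean latency.

    Assumption for this sketch: observation noise has a known precision,
    so the posterior update has a closed form.
    """

    def __init__(self, prior_mean=1.0, prior_precision=1e-3):
        self.mean = prior_mean
        self.precision = prior_precision

    def sample(self, rng):
        # Draw one plausible mean latency from the current posterior.
        return rng.gauss(self.mean, (1.0 / self.precision) ** 0.5)

    def update(self, observed_latency, noise_precision=1.0):
        # Standard conjugate Normal update with known noise precision.
        new_precision = self.precision + noise_precision
        self.mean = (
            self.precision * self.mean + noise_precision * observed_latency
        ) / new_precision
        self.precision = new_precision


def choose_arm(arms, rng):
    """Thompson sampling: sample every posterior, pick the lowest latency."""
    samples = [arm.sample(rng) for arm in arms]
    return min(range(len(arms)), key=lambda i: samples[i])


def run_bandit(true_latencies, rounds=500, seed=0):
    """Simulate the loop: choose an arm, observe latency, update its posterior."""
    rng = random.Random(seed)
    arms = [GaussianThompsonArm() for _ in true_latencies]
    pulls = [0] * len(true_latencies)
    for _ in range(rounds):
        i = choose_arm(arms, rng)
        # Hypothetical environment: noisy latency around the arm's true mean.
        observed = rng.gauss(true_latencies[i], 0.1)
        arms[i].update(observed)
        pulls[i] += 1
    return pulls


if __name__ == "__main__":
    # Three hypothetical optimizers; the second has the lowest true latency.
    print(run_bandit([3.0, 1.0, 2.0]))
```

Because each arm is chosen with the probability that it is currently believed to be best, poorly performing arms are still tried occasionally early on but are quickly abandoned as evidence accumulates. A contextual variant would replace the per-arm posterior with a model that conditions on features of the query or plan.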

Key contributions of Bao include:

  • Outperforming traditional query optimizers with minimal training data (approximately 100 query executions).
  • Adapting to changes in data distribution and schema without extensive retraining.
  • Maintaining stable query performance and avoiding catastrophic outcomes even with limited training data.

While machine learning-based query optimizers have shown promise, they often require an impractical volume of training data, must be retrained frequently, and risk severe performance regressions in rare cases. Bao circumvents these challenges with a scheme that requires fewer resources and remains robust to changes in data and schema.

Preliminary results in the paper demonstrate Bao's effectiveness, especially on the JOB (Join Order Benchmark) workload in PostgreSQL. Over consecutive iterations, the approach consistently achieves lower regret than PostgreSQL's native optimizer, indicating that Bao identifies better query plans.
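The summary does not define regret. A common bandit-style reading, assumed here, is the gap between the latency of the plan actually chosen and the latency of the best plan that was available for the same query, accumulated over the workload. The sketch below computes that running total; the function name `cumulative_regret` and the sample numbers are hypothetical.

```python
def cumulative_regret(chosen_latencies, best_latencies):
    """Running sum of (chosen latency - best available latency) per query.

    Assumption: both lists are aligned by query and use the same time unit.
    """
    if len(chosen_latencies) != len(best_latencies):
        raise ValueError("inputs must have one entry per query")
    total = 0.0
    curve = []
    for chosen, best in zip(chosen_latencies, best_latencies):
        total += chosen - best
        curve.append(total)
    return curve


if __name__ == "__main__":
    # Hypothetical latencies in seconds for three queries.
    print(cumulative_regret([5.0, 3.0, 2.0], [2.0, 2.0, 2.0]))
```

A curve that flattens over time means the optimizer is increasingly picking plans close to the best available one.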

The implications of deploying Bao in a real-world DBMS are manifold. Practically, it offers a way to improve query performance while spending little on data collection and retraining. Theoretically, it narrows the gap between ML techniques and adaptable DBMS query optimization. As future work, the authors plan to examine Bao's handling of schema and data distribution changes more thoroughly and to conduct a comprehensive experimental evaluation of the system.

In summary, the paper presents Bao not merely as a competitive alternative query optimizer but as an incremental step towards more reliable, flexible, and scalable database systems. These merits may foreshadow changes in how query optimizers are integrated and used across diverse data management environments.