An End-to-End Learning-based Cost Estimator (1906.02560v1)

Published 6 Jun 2019 in cs.DB

Abstract: Cost and cardinality estimation is vital to the query optimizer, as it guides plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they cannot capture the correlation between multiple columns. Recently, the database community has shown that learning-based cardinality estimation is better than the empirical methods. However, existing learning-based methods have several limitations. Firstly, they can only estimate the cardinality, but cannot estimate the cost. Secondly, a convolutional neural network (CNN) with average pooling struggles to represent complicated structures, e.g., complex predicates, and the model is hard to generalize. To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. To the best of our knowledge, this is the first end-to-end cost estimator based on deep learning. We propose effective feature extraction and encoding techniques, which consider both queries and physical operations in feature extraction. We embed these features into our tree-structured model. We propose an effective method to encode string values, which can improve the generalization ability for predicate matching. As it is prohibitively expensive to enumerate all string values, we design a pattern-based method, which selects patterns to cover string values and utilizes the patterns to embed string values. We conducted experiments on real-world datasets, and the experimental results showed that our method outperformed baselines.

Citations (200)

Summary

The paper presents a novel end-to-end learning-based framework for database query optimization that simultaneously estimates both cost and cardinality.
The model employs a tree-structured design with effective feature extraction and string encoding techniques to handle complex query structures and generalize across various predicates.
Empirical results using real-world datasets demonstrate the model's superior performance over traditional methods, highlighting the potential of integrating deep learning into database management systems.

An End-to-End Learning-Based Cost Estimator for Databases

This paper presents an end-to-end learning-based framework for cost estimation in database query optimization. Traditional methods for cost and cardinality estimation often fall short because they cannot capture the correlations among multiple database columns. The authors propose a tree-structured model that addresses this shortcoming by estimating both cost and cardinality simultaneously, which sets it apart from existing learning-based techniques that estimate cardinality only.

The proposed model relies on feature extraction and encoding techniques that consider both the query and the physical operations of its execution plan. The query plan is broken down into nodes, and each node is transformed into a vector representation that combines one-hot encoding with embeddings, the latter used in particular for string-based predicates. These node vectors are then fed into the tree-structured model, which mirrors the shape of the plan.
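The node-by-node encoding described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the operator vocabulary, the `PlanNode` class, the element-wise sum used to combine children, and the placeholder weights are all assumptions made for illustration. In the paper's setting, the combination step and the two output heads would be learned neural components.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical operator vocabulary; a real system would derive it from the DBMS.
OPERATORS = ["SeqScan", "IndexScan", "HashJoin", "NestedLoop", "Sort"]


@dataclass
class PlanNode:
    """One node of a physical query plan (illustrative structure only)."""
    operator: str
    children: List["PlanNode"] = field(default_factory=list)


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Return a vector with a single 1.0 at the position of `value`."""
    if value not in vocabulary:
        raise ValueError(f"unknown value: {value!r}")
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_plan(node: PlanNode) -> List[float]:
    """Encode a plan bottom-up: own features plus the children's encodings.

    The element-wise sum is a stand-in for a learned neural unit.
    """
    vector = one_hot(node.operator, OPERATORS)
    for child in node.children:
        child_vector = encode_plan(child)
        vector = [a + b for a, b in zip(vector, child_vector)]
    return vector


def linear_head(vector: List[float], weights: List[float]) -> float:
    """A stand-in output head: a dot product with fixed weights."""
    return sum(v * w for v, w in zip(vector, weights))


plan = PlanNode("HashJoin", [PlanNode("SeqScan"), PlanNode("IndexScan")])
root_vector = encode_plan(plan)

# Two heads share the same root representation (placeholder weights, not trained).
cost_estimate = linear_head(root_vector, [1.0, 2.0, 5.0, 8.0, 3.0])
cardinality_estimate = linear_head(root_vector, [10.0, 4.0, 2.0, 6.0, 0.0])
```

The point of the sketch is the shape of the computation: each node contributes its own encoding, the tree is traversed bottom-up, and a single root representation feeds both a cost output and a cardinality output.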

A noteworthy feature is the method for encoding string values, which improves the model's ability to generalize when matching predicates that contain strings. Because enumerating every string value is prohibitively expensive, the method instead selects a set of patterns that cover the string values and uses those patterns to embed them.
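As a rough illustration of the idea of covering string values with a small set of patterns, the Python sketch below picks fixed-length substrings greedily and encodes a string by which patterns it contains. The greedy selection rule, the fixed pattern length, the `budget` parameter, and the multi-hot encoding are assumptions chosen for brevity. The summary does not describe the paper's actual selection algorithm or embedding step, so this should be read as one possible interpretation only.

```python
from typing import List, Set


def candidate_patterns(values: List[str], length: int) -> Set[str]:
    """Collect every substring of the given length from the values."""
    patterns = set()
    for value in values:
        for start in range(len(value) - length + 1):
            patterns.add(value[start:start + length])
    return patterns


def select_patterns(values: List[str], length: int, budget: int) -> List[str]:
    """Greedily pick up to `budget` patterns that cover the most uncovered values."""
    candidates = sorted(candidate_patterns(values, length))  # sorted for determinism
    uncovered = set(values)
    selected: List[str] = []
    while uncovered and candidates and len(selected) < budget:
        best = max(candidates, key=lambda p: sum(p in v for v in uncovered))
        if not any(best in v for v in uncovered):
            break  # no remaining pattern covers anything new
        selected.append(best)
        candidates.remove(best)
        uncovered = {v for v in uncovered if best not in v}
    return selected


def encode_string(value: str, patterns: List[str]) -> List[float]:
    """Multi-hot encoding: 1.0 for each selected pattern the value contains."""
    return [1.0 if pattern in value else 0.0 for pattern in patterns]


values = ["database", "dataset", "query", "queue"]
patterns = select_patterns(values, length=3, budget=2)
seen_vector = encode_string("dataset", patterns)
unseen_vector = encode_string("quest", patterns)  # not in the original values
```

Because the encoding depends on patterns rather than on whole values, a string that was never seen during pattern selection (such as `"quest"` here) still receives a meaningful vector, which is the kind of generalization the pattern-based approach is meant to provide.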

The model was evaluated on real-world datasets, where it outperformed the baseline methods. The experiments cover cost and cardinality estimation across queries of varying complexity, including workloads with numeric predicates and more complex workloads with string predicates.

The research implications are broad, both in practical terms for database optimization and theoretically in advancing machine learning applications within the field. This framework potentially enhances the performance characteristics of query optimizers, thereby leading to more efficient query execution strategies. Furthermore, the approach represents a shift toward integrating deep learning techniques into database management systems, opening avenues for future exploration in query optimization using artificial intelligence.

Moreover, the paper suggests that end-to-end learning models could change how cost estimators are developed and deployed, with a possible shift away from traditional empirical techniques toward more adaptable, learned systems.

In conclusion, this paper shows that machine learning can be used to estimate both query execution cost and cardinality within a single framework. Its feature encoding and tree-structured model suggest that the approach could be applied across different database management environments, with potential gains in estimation accuracy and, in turn, in query execution efficiency. As research progresses, such methods could become a standard component of intelligent database systems.
