Kepler: Robust Learning for Faster Parametric Query Optimization (2306.06798v2)
Abstract: Most existing parametric query optimization (PQO) techniques rely on traditional query optimizer cost models, which are often inaccurate and result in suboptimal query performance. We propose Kepler, an end-to-end learning-based approach to PQO that demonstrates significant speedups in query latency over a traditional query optimizer. Central to our method is Row Count Evolution (RCE), a novel plan generation algorithm based on perturbations in the sub-plan cardinality space. While previous approaches require accurate cost models, we bypass this requirement by evaluating candidate plans via actual execution data and training an ML model to predict the fastest plan given parameter binding values. Our models leverage recent advances in neural network uncertainty in order to robustly predict faster plans while avoiding regressions in query performance. Experimentally, we show that Kepler achieves significant improvements in query runtime on multiple datasets on PostgreSQL.
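The regression-avoidance idea in the abstract can be illustrated with a minimal sketch. It assumes a trained model has already produced one confidence score per candidate plan for a given set of parameter binding values. The function name `choose_plan`, the `threshold` value, and the rule of falling back to the default optimizer plan are illustrative assumptions, not details confirmed by the abstract.

```python
from typing import Sequence


def choose_plan(plan_scores: Sequence[float],
                default_plan: int,
                threshold: float = 0.9) -> int:
    """Pick a candidate plan index, or fall back to the default plan.

    plan_scores:  hypothetical per-plan confidence scores from a trained model
                  (higher means "more likely to be the fastest plan").
    default_plan: index of the plan the traditional optimizer would choose.
    threshold:    assumed confidence cutoff; below it we do not trust the model.
    """
    if not plan_scores:
        # No prediction available: keep the optimizer's own choice.
        return default_plan

    # Index of the plan the model is most confident about.
    best = max(range(len(plan_scores)), key=lambda i: plan_scores[i])

    # Only override the optimizer when the model is confident enough;
    # otherwise fall back to avoid a possible performance regression.
    return best if plan_scores[best] >= threshold else default_plan


if __name__ == "__main__":
    print(choose_plan([0.05, 0.92, 0.03], default_plan=0))  # confident override
    print(choose_plan([0.40, 0.35, 0.25], default_plan=0))  # uncertain fallback
```

The design choice sketched here trades some potential speedup for safety: an uncertain prediction never changes the plan the optimizer would have used anyway.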