Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective (2309.01551v2)
Abstract: The current boom of learned query optimizers (LQO) can be explained not only by the general continuous improvement of deep learning (DL) methods but also by the straightforward formulation of a query optimization problem (QOP) as a ML one. The idea is often to replace dynamic programming approaches, widespread for solving QOP, with more powerful methods such as reinforcement learning. However, such a rapid "game change" in the field of QOP could not pass without consequences - other parts of the ML pipeline, except for predictive model development, have large improvement potential. For instance, different LQOs introduce their own restrictions on training data generation from queries, use an arbitrary train/validation approach, and evaluate on a voluntary split of benchmark queries. In this paper, we attempt to standardize the ML pipeline for evaluating LQOs by introducing a new end-to-end benchmarking framework. Additionally, we guide the reader through each data science stage in the ML pipeline and provide novel insights from the machine learning perspective, considering the specifics of QOP. Finally, we perform a rigorous evaluation of existing LQOs, showing that PostgreSQL outperforms these LQOs in almost all experiments depending on the train/test splits.
- Ron Avnur and Joseph M. Hellerstein. 2000. Eddies: Continuously Adaptive Query Processing. SIGMOD Rec. 29, 2 (may 2000), 261–272. https://doi.org/10.1145/335191.335420
- Jason Brownlee. 2020. Data preparation for machine learning: data cleaning, feature selection, and data transforms in Python. Machine Learning Mastery.
- LOGER: A Learned Optimizer Towards Generating Efficient and Robust Query Execution Plans. Proceedings of the VLDB Endowment 16, 7 (2023), 1777–1789.
- LEON: A New Framework for ML-Aided Query Optimization. Proc. VLDB Endow. 16, 9 (2023), 2261–2273.
- Vijay Prakash Dwivedi and Xavier Bresson. 2021. A Generalization of Transformer Networks to Graphs. arXiv:2012.09699 [cs.LG]
- Database Systems: The Complete Book (2 ed.). Prentice Hall Press, USA.
- Deep Learning. MIT Press, Cambridge, MA, USA. http://www.deeplearningbook.org.
- Direct and indirect reinforcement learning. arXiv:1912.10600 [cs.LG]
- Isabelle Guyon and André Elisseeff. 2006. An Introduction to Feature Extraction. Springer Berlin Heidelberg, Berlin, Heidelberg, 1–25. https://doi.org/10.1007/978-3-540-35488-8_1
- Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. VLDB 15, 4 (2022).
- Jonas Heitz and Kurt Stockinger. 2019. Join query optimization with deep reinforcement learning algorithms. arXiv preprint arXiv:1911.11689 (2019).
- DeepDB: Learn from Data, Not from Queries! Proc. VLDB Endow. 13, 7 (mar 2020), 992–1005. https://doi.org/10.14778/3384345.3384349
- Mouna Kacimi and Thomas Neumann. 2009. System R (R*) Optimizer. Springer US, Boston, MA, 2900–2905. https://doi.org/10.1007/978-0-387-39940-9_384
- Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations (Palais des Congrès Neptune, Toulon, France) (ICLR ’17). https://openreview.net/forum?id=SJU4ayYgl
- Learning to Optimize Join Queries With Deep Reinforcement Learning. (08 2018).
- Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196 (2018).
- Joseph P La Salle. 1976. The stability of dynamical systems. SIAM.
- How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204–215.
- Identifying hot and cold data in main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 26–37.
- Tie-Yan Liu. 2009. Learning to Rank for Information Retrieval. Foundations and Trends® in Information Retrieval 3, 3 (2009), 225–331. https://doi.org/10.1561/1500000016
- H. B. Mann and D. R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics 18, 1 (1947), 50–60. http://www.jstor.org/stable/2236101
- Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data. 1275–1288.
- Neo: A Learned Query Optimizer. Proceedings of the VLDB Endowment 12, 11 (2019).
- Ryan Marcus and Olga Papaemmanouil. 2018a. Deep Reinforcement Learning for Join Order Enumeration. In Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (Houston, TX, USA) (aiDM’18). Association for Computing Machinery, New York, NY, USA, Article 3, 4 pages. https://doi.org/10.1145/3211954.3211957
- Ryan Marcus and Olga Papaemmanouil. 2018b. Towards a Hands-Free Query Optimizer through Deep Learning. (09 2018).
- Convolutional Neural Networks over Tree Structures for Programming Language Processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (Phoenix, Arizona) (AAAI’16). AAAI Press, 1287–1293.
- Nippon Telegraph and Telephone Corporation. 2012. pg_hint_plan Documentation. https://pghintplan.osdn.jp/pg_hint_plan.html. [Online; accessed August, 2023].
- Dušan Petković. 2011. Dynamic Programming Algorithm vs. Genetic Algorithm: Which is Faster?. In Research and Development in Intelligent Systems XXVII, Max Bramer, Miltos Petridis, and Adrian Hopgood (Eds.). Springer London, London, 483–488.
- Egor Rogov. 2022. Queries in PostgreSQL: Sort and Merge. https://postgrespro.com/blog/pgsql/5969770. [Online; accessed August, 2023].
- Stuart Russell and Peter Norvig. 2010. Artificial Intelligence: A Modern Approach (3 ed.). Prentice Hall.
- Tuning Your PostgreSQL Server. https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server. [Online; accessed August, 2023].
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (second ed.). The MIT Press. http://incompleteideas.net/book/the-book-2nd.html
- Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 1556–1566. https://doi.org/10.3115/v1/P15-1150
- The PostgreSQL Global Development Group. 2023. Genetic Query Optimization (GEQO) in PostgreSQL. https://www.postgresql.org/docs/current/geqo-pg-intro.html. [Online; accessed August, 2023].
- Transaction Processing Performance Council. 2023. TPC Benchmarks Overview. https://www.tpc.org/information/benchmarks5.asp. [Online; accessed August, 2023].
- Regularization of Neural Networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research), Sanjoy Dasgupta and David McAllester (Eds.), Vol. 28. PMLR, Atlanta, Georgia, USA, 1058–1066. https://proceedings.mlr.press/v28/wan13.html
- Chihping Wang and Ming-Syan Chen. 1996. On the complexity of distributed query optimization. IEEE Transactions on Knowledge and Data Engineering 8, 4 (1996), 650–662.
- Geoffrey I. Webb. 2010. Overfitting. Springer US, Boston, MA, 744–744. https://doi.org/10.1007/978-0-387-30164-8_623
- Balsa: Learning a Query Optimizer Without Expert Demonstrations. Proceedings of the ACM SIGMOD International Conference on Management of Data (6 2022), 931–944. https://doi.org/10.1145/3514221.3517885
- NeuroCard: One Cardinality Estimator for All Tables. Proc. VLDB Endow. 14, 1 (sep 2020), 61–73. https://doi.org/10.14778/3421424.3421432
- Cost-based or learning-based? A hybrid query optimizer for query plan selection. Proceedings of the VLDB Endowment 15, 13 (2022), 3924–3936.
- Reinforcement learning with tree-lstm for join order selection. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1297–1308.
- Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis. Proc. VLDB Endow. 16, 9 (2023), 2962–2975.
- Lero: A Learning-to-Rank Query Optimizer. arXiv preprint arXiv:2302.06873 (2023).
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.