Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 79 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 15 tok/s Pro
GPT-5 High 15 tok/s Pro
GPT-4o 100 tok/s Pro
Kimi K2 186 tok/s Pro
GPT OSS 120B 445 tok/s Pro
Claude Sonnet 4 36 tok/s Pro
2000 character limit reached

Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective (2309.01551v2)

Published 4 Sep 2023 in cs.DB

Abstract: The current boom of learned query optimizers (LQO) can be explained not only by the general continuous improvement of deep learning (DL) methods but also by the straightforward formulation of a query optimization problem (QOP) as a ML one. The idea is often to replace dynamic programming approaches, widespread for solving QOP, with more powerful methods such as reinforcement learning. However, such a rapid "game change" in the field of QOP could not pass without consequences - other parts of the ML pipeline, except for predictive model development, have large improvement potential. For instance, different LQOs introduce their own restrictions on training data generation from queries, use an arbitrary train/validation approach, and evaluate on a voluntary split of benchmark queries. In this paper, we attempt to standardize the ML pipeline for evaluating LQOs by introducing a new end-to-end benchmarking framework. Additionally, we guide the reader through each data science stage in the ML pipeline and provide novel insights from the machine learning perspective, considering the specifics of QOP. Finally, we perform a rigorous evaluation of existing LQOs, showing that PostgreSQL outperforms these LQOs in almost all experiments depending on the train/test splits.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (44)
  1. Ron Avnur and Joseph M. Hellerstein. 2000. Eddies: Continuously Adaptive Query Processing. SIGMOD Rec. 29, 2 (may 2000), 261–272. https://doi.org/10.1145/335191.335420
  2. Jason Brownlee. 2020. Data preparation for machine learning: data cleaning, feature selection, and data transforms in Python. Machine Learning Mastery.
  3. LOGER: A Learned Optimizer Towards Generating Efficient and Robust Query Execution Plans. Proceedings of the VLDB Endowment 16, 7 (2023), 1777–1789.
  4. LEON: A New Framework for ML-Aided Query Optimization. Proc. VLDB Endow. 16, 9 (2023), 2261–2273.
  5. Vijay Prakash Dwivedi and Xavier Bresson. 2021. A Generalization of Transformer Networks to Graphs. arXiv:2012.09699 [cs.LG]
  6. Database Systems: The Complete Book (2 ed.). Prentice Hall Press, USA.
  7. Deep Learning. MIT Press, Cambridge, MA, USA. http://www.deeplearningbook.org.
  8. Direct and indirect reinforcement learning. arXiv:1912.10600 [cs.LG]
  9. Isabelle Guyon and André Elisseeff. 2006. An Introduction to Feature Extraction. Springer Berlin Heidelberg, Berlin, Heidelberg, 1–25. https://doi.org/10.1007/978-3-540-35488-8_1
  10. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. VLDB 15, 4 (2022).
  11. Jonas Heitz and Kurt Stockinger. 2019. Join query optimization with deep reinforcement learning algorithms. arXiv preprint arXiv:1911.11689 (2019).
  12. DeepDB: Learn from Data, Not from Queries! Proc. VLDB Endow. 13, 7 (mar 2020), 992–1005. https://doi.org/10.14778/3384345.3384349
  13. Mouna Kacimi and Thomas Neumann. 2009. System R (R*) Optimizer. Springer US, Boston, MA, 2900–2905. https://doi.org/10.1007/978-0-387-39940-9_384
  14. Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations (Palais des Congrès Neptune, Toulon, France) (ICLR ’17). https://openreview.net/forum?id=SJU4ayYgl
  15. Learning to Optimize Join Queries With Deep Reinforcement Learning. (08 2018).
  16. Learning to optimize join queries with deep reinforcement learning. arXiv preprint arXiv:1808.03196 (2018).
  17. Joseph P La Salle. 1976. The stability of dynamical systems. SIAM.
  18. How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204–215.
  19. Identifying hot and cold data in main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 26–37.
  20. Tie-Yan Liu. 2009. Learning to Rank for Information Retrieval. Foundations and Trends® in Information Retrieval 3, 3 (2009), 225–331. https://doi.org/10.1561/1500000016
  21. H. B. Mann and D. R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics 18, 1 (1947), 50–60. http://www.jstor.org/stable/2236101
  22. Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data. 1275–1288.
  23. Neo: A Learned Query Optimizer. Proceedings of the VLDB Endowment 12, 11 (2019).
  24. Ryan Marcus and Olga Papaemmanouil. 2018a. Deep Reinforcement Learning for Join Order Enumeration. In Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (Houston, TX, USA) (aiDM’18). Association for Computing Machinery, New York, NY, USA, Article 3, 4 pages. https://doi.org/10.1145/3211954.3211957
  25. Ryan Marcus and Olga Papaemmanouil. 2018b. Towards a Hands-Free Query Optimizer through Deep Learning. (09 2018).
  26. Convolutional Neural Networks over Tree Structures for Programming Language Processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (Phoenix, Arizona) (AAAI’16). AAAI Press, 1287–1293.
  27. Nippon Telegraph and Telephone Corporation. 2012. pg_hint_plan Documentation. https://pghintplan.osdn.jp/pg_hint_plan.html. [Online; accessed August, 2023].
  28. Dušan Petković. 2011. Dynamic Programming Algorithm vs. Genetic Algorithm: Which is Faster?. In Research and Development in Intelligent Systems XXVII, Max Bramer, Miltos Petridis, and Adrian Hopgood (Eds.). Springer London, London, 483–488.
  29. Egor Rogov. 2022. Queries in PostgreSQL: Sort and Merge. https://postgrespro.com/blog/pgsql/5969770. [Online; accessed August, 2023].
  30. Stuart Russell and Peter Norvig. 2010. Artificial Intelligence: A Modern Approach (3 ed.). Prentice Hall.
  31. Tuning Your PostgreSQL Server. https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server. [Online; accessed August, 2023].
  32. Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (second ed.). The MIT Press. http://incompleteideas.net/book/the-book-2nd.html
  33. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 1556–1566. https://doi.org/10.3115/v1/P15-1150
  34. The PostgreSQL Global Development Group. 2023. Genetic Query Optimization (GEQO) in PostgreSQL. https://www.postgresql.org/docs/current/geqo-pg-intro.html. [Online; accessed August, 2023].
  35. Transaction Processing Performance Council. 2023. TPC Benchmarks Overview. https://www.tpc.org/information/benchmarks5.asp. [Online; accessed August, 2023].
  36. Regularization of Neural Networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research), Sanjoy Dasgupta and David McAllester (Eds.), Vol. 28. PMLR, Atlanta, Georgia, USA, 1058–1066. https://proceedings.mlr.press/v28/wan13.html
  37. Chihping Wang and Ming-Syan Chen. 1996. On the complexity of distributed query optimization. IEEE Transactions on Knowledge and Data Engineering 8, 4 (1996), 650–662.
  38. Geoffrey I. Webb. 2010. Overfitting. Springer US, Boston, MA, 744–744. https://doi.org/10.1007/978-0-387-30164-8_623
  39. Balsa: Learning a Query Optimizer Without Expert Demonstrations. Proceedings of the ACM SIGMOD International Conference on Management of Data (6 2022), 931–944. https://doi.org/10.1145/3514221.3517885
  40. NeuroCard: One Cardinality Estimator for All Tables. Proc. VLDB Endow. 14, 1 (sep 2020), 61–73. https://doi.org/10.14778/3421424.3421432
  41. Cost-based or learning-based? A hybrid query optimizer for query plan selection. Proceedings of the VLDB Endowment 15, 13 (2022), 3924–3936.
  42. Reinforcement learning with tree-lstm for join order selection. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1297–1308.
  43. Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis. Proc. VLDB Endow. 16, 9 (2023), 2962–2975.
  44. Lero: A Learning-to-Rank Query Optimizer. arXiv preprint arXiv:2302.06873 (2023).
Citations (3)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.