No more optimization rules: LLM-enabled policy-based multi-modal query optimizer (2403.13597v2)
Abstract: Large language models (LLMs) have marked a pivotal moment in the field of machine learning and deep learning. Recently, their capability for query planning has been investigated, covering both single-modal and multi-modal queries. However, there is no work on the query optimization capability of LLMs. As a critical (arguably the most important) step that significantly impacts the execution performance of a query plan, such analysis and attempts should not be missed. From another perspective, existing query optimizers are usually rule-based or rule-plus-cost-based, i.e., they depend on manually created rules to rewrite and transform the query plan. Given that modern optimizers include hundreds to thousands of rules, designing a multi-modal query optimizer in the same way would be extremely time-consuming, since as many multi-modal optimization rules as possible would have to be enumerated, a problem that has not been well addressed to date. In this paper, we investigate the query optimization ability of LLMs and use an LLM to design LaPuda, a novel LLM- and policy-based multi-modal query optimizer. Instead of enumerating specific and detailed rules, LaPuda needs only a few abstract policies to guide the LLM during optimization, saving substantial time and human effort. Furthermore, to prevent the LLM from making mistakes or performing negative optimization, we borrow the idea of gradient descent and propose a guided cost descent (GCD) algorithm, which keeps the optimization moving in the correct direction. In our evaluation, our methods outperform the baselines in most cases; for example, the optimized plans generated by our methods execute 1~3x faster than those produced by the baselines.
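To make the guided cost descent idea concrete, below is a minimal sketch of such a loop: the LLM proposes policy-guided plan rewrites, and a candidate is accepted only if its estimated cost strictly decreases, so the search never drifts into negative optimization. The helper names (`propose_rewrites`, `estimate_cost`), the string-based plan representation, and the stub implementations are hypothetical placeholders for illustration, not the paper's actual API.

```python
def propose_rewrites(plan, policies):
    """Hypothetical LLM call: return candidate rewritten plans guided by a few
    abstract policies (e.g., 'push filters down', 'prefer cheaper modality first').
    Stubbed here so the sketch is runnable."""
    return [plan + " (rewritten)"]

def estimate_cost(plan):
    """Hypothetical cost model: lower is better. Stubbed with a toy heuristic."""
    return len(plan) - 20 if "(rewritten)" in plan else len(plan)

def guided_cost_descent(plan, policies, max_steps=5):
    """Accept an LLM-proposed rewrite only when the estimated cost decreases,
    analogous to taking a step only in a descending direction."""
    best_plan, best_cost = plan, estimate_cost(plan)
    for _ in range(max_steps):
        candidates = propose_rewrites(best_plan, policies)
        improved = [(estimate_cost(c), c) for c in candidates
                    if estimate_cost(c) < best_cost]
        if not improved:
            break  # no descending step available; stop to avoid regressions
        best_cost, best_plan = min(improved)
    return best_plan, best_cost

if __name__ == "__main__":
    policies = ["prefer cheaper modality first", "avoid redundant model calls"]
    plan, cost = guided_cost_descent("scan(images) JOIN scan(table)", policies)
    print(plan, cost)
```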