An Actor-Critic Approach to Boosting Text-to-SQL Large Language Model (2410.22082v1)
Abstract: Text-to-SQL (T2S) conversion based on LLMs has found a wide range of applications by leveraging the capability of LLMs to interpret the query intent expressed in natural language. Existing research focuses on suitable representations of the data schema and/or questions, task-specific instructions and representative examples, and complicated inference pipelines. All these methods are empirical and task-specific, with no theoretical bound on performance. In this paper, we propose a simple, general, and performance-guaranteed T2S enhancement approach called Actor-Critic (AC). Specifically, we design two roles using the same LLM: an Actor that produces SQL queries and a Critic that evaluates the produced SQL. If the Critic believes the produced SQL is wrong, it notifies the Actor to regenerate the SQL and performs the evaluation again. Through this simple iterative process, the expected performance can be derived in theory. We conducted extensive experiments on the Spider and related datasets with eleven LLMs, and demonstrated that the Actor-Critic method consistently improves T2S performance, thus serving as a general enhancement approach for T2S conversion.
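The Actor-Critic loop described above can be sketched as a short program. This is a minimal illustration, not the paper's implementation: `call_llm`, the prompt formats, and the retry budget are all hypothetical placeholders (the stub below answers deterministically so the snippet runs offline; a real version would call an LLM API).

```python
from typing import Optional


def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; stubbed for illustration.
    if prompt.startswith("ACTOR"):
        return "SELECT name FROM singer WHERE age > 30"
    return "CORRECT"


def actor(question: str, schema: str, feedback: Optional[str]) -> str:
    # Actor role: produce a SQL query, optionally conditioned on
    # the previously rejected attempt.
    prompt = f"ACTOR\nSchema: {schema}\nQuestion: {question}\n"
    if feedback:
        prompt += f"A previous attempt was judged wrong: {feedback}\n"
    return call_llm(prompt)


def critic(question: str, schema: str, sql: str) -> bool:
    # Critic role: the same LLM judges whether the SQL answers the question.
    verdict = call_llm(
        f"CRITIC\nSchema: {schema}\nQuestion: {question}\n"
        f"SQL: {sql}\nIs this SQL correct?"
    )
    return verdict.strip().upper().startswith("CORRECT")


def actor_critic(question: str, schema: str, max_rounds: int = 3) -> str:
    # Iterate: regenerate whenever the Critic rejects, up to max_rounds.
    sql, feedback = "", None
    for _ in range(max_rounds):
        sql = actor(question, schema, feedback)
        if critic(question, schema, sql):
            return sql
        feedback = sql  # notify the Actor and try again
    return sql  # fall back to the last candidate
```

Because both roles share one LLM, each extra round costs only additional inference calls; bounding the rounds with `max_rounds` is what makes the expected-performance analysis tractable.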