PURPLE: Making a Large Language Model a Better SQL Writer (2403.20014v1)

Published 29 Mar 2024 in cs.DB, cs.AI, and cs.CL

Abstract: LLM techniques play an increasingly important role in Natural Language to SQL (NL2SQL) translation. LLMs trained on extensive corpora have strong natural language understanding and basic SQL generation abilities without additional tuning specific to NL2SQL tasks. Existing LLM-based NL2SQL approaches try to improve the translation by enhancing the LLMs with an emphasis on user intention understanding. However, LLMs sometimes fail to generate appropriate SQL due to their lack of knowledge in organizing complex logical operator composition. A promising method is to input the LLMs with demonstrations, which include known NL2SQL translations from various databases. LLMs can learn to organize operator compositions from the input demonstrations for the given task. In this paper, we propose PURPLE (Pre-trained models Utilized to Retrieve Prompts for Logical Enhancement), which improves accuracy by retrieving demonstrations containing the requisite logical operator composition for the NL2SQL task at hand, thereby guiding LLMs to produce better SQL translation. PURPLE achieves a new state-of-the-art performance of 80.5% exact-set match accuracy and 87.8% execution match accuracy on the validation set of the popular NL2SQL benchmark Spider. PURPLE maintains high accuracy across diverse benchmarks, budgetary constraints, and various LLMs, showing robustness and cost-effectiveness.
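The core idea in the abstract — retrieving demonstrations whose SQL shares the logical operator composition needed by the current task, then using them as few-shot prompt examples — can be sketched as follows. This is a minimal illustration only: the operator list, the Jaccard-based similarity, the `retrieve_demonstrations` and `build_prompt` helpers, and the tiny demo pool are all assumptions for exposition, not PURPLE's actual retrieval model or prompt format.

```python
# Sketch of demonstration retrieval by logical-operator composition,
# in the spirit of PURPLE. All names and the scoring scheme below are
# illustrative assumptions, not the paper's implementation.

# Logical operators tracked in a SQL string (a crude proxy for an
# operator-composition "skeleton").
OPERATORS = ["JOIN", "GROUP BY", "ORDER BY", "HAVING", "LIMIT",
             "INTERSECT", "UNION", "EXCEPT", "WHERE"]

def operator_skeleton(sql: str) -> frozenset:
    """Return the set of logical operators appearing in a SQL query."""
    upper = sql.upper()
    return frozenset(op for op in OPERATORS if op in upper)

def skeleton_similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two operator sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def retrieve_demonstrations(target_sketch_sql: str, demo_pool, k: int = 2):
    """Pick the k demos whose SQL shares the most operator structure
    with a draft sketch of the target query."""
    target = operator_skeleton(target_sketch_sql)
    ranked = sorted(
        demo_pool,
        key=lambda d: skeleton_similarity(target, operator_skeleton(d["sql"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str, demos) -> str:
    """Assemble a few-shot NL2SQL prompt from retrieved demonstrations."""
    parts = [f"Q: {d['question']}\nSQL: {d['sql']}" for d in demos]
    parts.append(f"Q: {question}\nSQL:")
    return "\n\n".join(parts)

# Example usage on a toy Spider-style pool (hypothetical data).
demo_pool = [
    {"question": "List all singers.",
     "sql": "SELECT name FROM singer"},
    {"question": "Count concerts per stadium.",
     "sql": "SELECT stadium_id, COUNT(*) FROM concert GROUP BY stadium_id"},
    {"question": "Show the 3 oldest singers.",
     "sql": "SELECT name FROM singer ORDER BY age DESC LIMIT 3"},
]

target_sketch = ("SELECT country, COUNT(*) FROM singer "
                 "GROUP BY country ORDER BY COUNT(*) DESC LIMIT 1")
demos = retrieve_demonstrations(target_sketch, demo_pool, k=2)
prompt = build_prompt("Which country has the most singers?", demos)
```

With the sketch query using `GROUP BY`, `ORDER BY`, and `LIMIT`, the retriever ranks the `ORDER BY ... LIMIT` demo first (two shared operators) and the `GROUP BY` demo second, so the prompt carries examples of the operator structure the target needs — the guiding intuition behind PURPLE's retrieval.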

Authors (10)
  1. Tonghui Ren (3 papers)
  2. Yuankai Fan (3 papers)
  3. Zhenying He (10 papers)
  4. Ren Huang (2 papers)
  5. Jiaqi Dai (9 papers)
  6. Can Huang (43 papers)
  7. Yinan Jing (6 papers)
  8. Kai Zhang (542 papers)
  9. Yifan Yang (578 papers)
  10. X. Sean Wang (14 papers)
Citations (10)
