
Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models (2405.06674v1)

Published 4 May 2024 in cs.CL and cs.AI

Abstract: Despite the success of LLMs on Text-to-SQL tasks, open-source LLMs still struggle with contextual understanding and response coherence. To tackle these issues, we present Open-SQL, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs on Text-to-SQL tasks, the Open Prompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought reasoning in step-by-step inference and propose the Open Example method for enhanced few-shot learning. We also introduce token-efficient techniques, namely Variable-length Open DB Schema, Target Column Truncation, and Example Column Truncation, to address the challenges posed by large-scale databases. Our findings highlight the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Our method improves Llama2-7B from 2.54% to 41.04% and Code Llama-7B from 14.54% to 48.24% on the BIRD-Dev dataset, with Code Llama-7B surpassing GPT-4 (46.35%).
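
The abstract names three token-efficiency techniques without showing how they might compose, so below is a minimal sketch of one plausible reading. The assumptions: Variable-length Open DB Schema is taken to mean serializing only the tables relevant to the question, and Target/Example Column Truncation are taken to mean per-table column caps for the target schema and for few-shot examples, respectively. Every identifier here (serialize_schema, build_prompt, the column budgets) is illustrative, not the paper's actual API or prompt format.

from typing import Sequence

# Sketch of the token-efficiency ideas named in the abstract. All names and
# budgets below are assumptions for illustration; the real Open-SQL prompt
# format may differ.

def serialize_schema(schema: dict[str, list[str]],
                     relevant_tables: set[str] | None = None,
                     max_cols: int | None = None) -> str:
    """Variable-length Open DB Schema (as we read it): serialize only the
    tables relevant to the question, truncating each column list."""
    lines = []
    for table, columns in schema.items():
        if relevant_tables is not None and table not in relevant_tables:
            continue  # skip irrelevant tables to save tokens
        cols = columns if max_cols is None else columns[:max_cols]
        lines.append(f"{table}({', '.join(cols)})")
    return "\n".join(lines)

def build_prompt(question: str,
                 schema: dict[str, list[str]],
                 relevant_tables: set[str],
                 examples: Sequence[tuple[str, dict[str, list[str]], str]] = (),
                 max_target_cols: int = 8,      # Target Column Truncation (assumed budget)
                 max_example_cols: int = 4       # Example Column Truncation (assumed budget)
                 ) -> str:
    parts = []
    for ex_q, ex_schema, ex_sql in examples:
        # Few-shot examples get an even tighter column budget than the target.
        parts.append(f"### Tables:\n{serialize_schema(ex_schema, None, max_example_cols)}\n"
                     f"-- Question: {ex_q}\n{ex_sql}")
    parts.append(f"### Tables:\n{serialize_schema(schema, relevant_tables, max_target_cols)}\n"
                 f"-- Question: {question}\nSELECT")
    return "\n\n".join(parts)

# Example usage with a toy schema:
schema = {"singer": ["singer_id", "name", "country", "age"],
          "concert": ["concert_id", "venue", "year", "singer_id"]}
print(build_prompt("How many singers are from France?", schema, {"singer"}))

The design intuition is that prompt length, not model capacity, is often the binding constraint on 7B-class models over large schemas, so the schema string is shrunk twice: once by dropping tables, once by capping columns.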

Authors (5)
  1. Xiaojun Chen (100 papers)
  2. Tianle Wang (30 papers)
  3. Tianhao Qiu (1 paper)
  4. Jianbin Qin (13 papers)
  5. Min Yang (239 papers)
Citations (2)