TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval (2407.01183v2)

Published 1 Jul 2024 in cs.DB

Abstract: Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question lead to poor performance of existing methods. To solve this problem, we propose a novel approach towards Table Content-aware Text-to-SQL with Self-Retrieval (TCSR-SQL). It leverages the LLM's in-context learning capability to extract data content keywords from the question and infer the possibly related database schema, which is used to generate a Seed SQL to fuzz search the database. The search results are then used to confirm the encoding knowledge with the designed encoding knowledge table, including the column names and exact stored content values used in the SQL. The encoding knowledge is then used to obtain the final Precise SQL through a multi-round generation-execution-revision process. To validate our approach, we introduce a table-content-aware, question-related benchmark dataset containing 1,692 question-SQL pairs. Comprehensive experiments conducted on this benchmark demonstrate the remarkable performance of TCSR-SQL, which achieves an improvement of at least 13.7% in execution accuracy compared to other state-of-the-art methods.
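
The retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM steps (keyword extraction, schema inference, revision) are omitted, a plain SQLite `LIKE` pattern stands in for the fuzzy search, and the `orders` table, its columns, and the helper `fuzzy_search` are all hypothetical names chosen for illustration.

```python
import sqlite3

def fuzzy_search(conn, table, columns, keyword):
    """Hypothetical helper: map each column to the stored values containing keyword."""
    hits = {}
    for col in columns:
        # Identifiers cannot be bound as parameters, so they are quoted inline;
        # this assumes table and column names come from a trusted schema.
        sql = f'SELECT DISTINCT "{col}" FROM "{table}" WHERE "{col}" LIKE ?'
        rows = conn.execute(sql, (f"%{keyword}%",)).fetchall()
        if rows:
            hits[col] = sorted(r[0] for r in rows)
    return hits

# Toy database (assumed data, not from the paper's benchmark).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "SHIPPED_OK", "North"), (2, "PENDING", "South"), (3, "SHIPPED_LATE", "North")],
)

# Step 1: a "seed" search finds which column stores the keyword and in what exact form.
knowledge = fuzzy_search(conn, "orders", ["status", "region"], "shipped")

# Step 2: the confirmed column and exact stored values drive the final query.
values = knowledge["status"]
placeholders = ", ".join("?" for _ in values)
count = conn.execute(
    f'SELECT COUNT(*) FROM orders WHERE "status" IN ({placeholders})', values
).fetchone()[0]
```

Here the question keyword "shipped" does not match any stored value exactly, so the seed search is what recovers the column (`status`) and the exact stored values needed to write a query that executes correctly.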

Authors (8)
  1. Wenbo Xu (23 papers)
  2. Liang Yan (18 papers)
  3. Peiyi Han (4 papers)
  4. Haifeng Zhu (2 papers)
  5. Chuanyi Liu (12 papers)
  6. Shaoming Duan (3 papers)
  7. Cuiyun Gao (97 papers)
  8. Yingwei Liang (1 paper)