Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
127 tokens/sec
GPT-4o
11 tokens/sec
Gemini 2.5 Pro Pro
53 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
10 tokens/sec
DeepSeek R1 via Azure Pro
33 tokens/sec
2000 character limit reached

Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup (2502.14682v1)

Published 20 Feb 2025 in cs.CL

Abstract: LLMs have demonstrated excellent performance in many tasks, including Text-to-SQL, due to their powerful in-context learning capabilities. They are becoming the mainstream approach for Text-to-SQL. However, these methods still have a significant gap compared to human performance, especially on complex questions. As the complexity of questions increases, the gap between questions and SQLs increases. We identify two important gaps: the structural mapping gap and the lexical mapping gap. To tackle these two gaps, we propose PAS-SQL, an efficient SQL generation pipeline based on LLMs, which alleviates gaps through Abstract Query Pattern (AQP) and Contextual Schema Markup (CSM). AQP aims to obtain the structural pattern of the question by removing database-related information, which enables us to find structurally similar demonstrations. CSM aims to associate database-related text span in the question with specific tables or columns in the database, which alleviates the lexical mapping gap. Experimental results on the Spider and BIRD datasets demonstrate the effectiveness of our proposed method. Specifically, PAS-SQL + GPT-4o sets a new state-of-the-art on the Spider benchmark with an execution accuracy of 87.9\%, and achieves leading results on the BIRD dataset with an execution accuracy of 64.67\%.

Summary

We haven't generated a summary for this paper yet.