
Natural SQL: Making SQL Easier to Infer from Natural Language Specifications (2109.05153v1)

Published 11 Sep 2021 in cs.CL

Abstract: Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL, while it simplifies the queries as follows: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, JOIN ON, which are usually hard to find counterparts for in the text descriptions; (2) removing the need for nested subqueries and set operators; and (3) making schema linking easier by reducing the required number of schema items. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs, and significantly improves the performance of several previous SOTA models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries, and achieves the new state-of-the-art execution accuracy.
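To make point (1) concrete, the sketch below shows one way a query written without FROM and JOIN ON could be expanded back into executable SQL by reading the table names off the qualified column references. It is a toy illustration under stated assumptions, not NatSQL's actual syntax or the authors' conversion procedure: the `singer`/`concert` schema, the `FOREIGN_KEYS` map, and the helper names `referenced_tables` and `add_from_clause` are all hypothetical.

```python
import re

# Hypothetical foreign keys for a toy schema (an assumption, not from the paper):
# maps a pair of tables to the condition used to join them.
FOREIGN_KEYS = {
    ("singer", "concert"): "singer.singer_id = concert.singer_id",
}


def referenced_tables(query: str) -> list[str]:
    """Return table names used in `table.column` references, in first-seen order."""
    seen: list[str] = []
    for table in re.findall(r"\b([A-Za-z_]\w*)\.[A-Za-z_]\w*", query):
        if table not in seen:
            seen.append(table)
    return seen


def add_from_clause(simplified: str) -> str:
    """Insert an inferred FROM ... JOIN ... ON clause into a FROM-less query."""
    tables = referenced_tables(simplified)
    if not tables:
        raise ValueError("no qualified column references found")

    from_clause = "FROM " + tables[0]
    # Toy strategy: join each table to the one seen just before it.
    for prev, table in zip(tables, tables[1:]):
        cond = FOREIGN_KEYS.get((prev, table)) or FOREIGN_KEYS.get((table, prev))
        if cond is None:
            raise ValueError(f"no known join between {prev} and {table}")
        from_clause += f" JOIN {table} ON {cond}"

    # Place the clause before WHERE when present, otherwise at the end.
    match = re.search(r"\bWHERE\b", simplified, flags=re.IGNORECASE)
    if match is None:
        return f"{simplified.rstrip()} {from_clause}"
    head = simplified[: match.start()].rstrip()
    return f"{head} {from_clause} {simplified[match.start():]}"


print(add_from_clause("SELECT singer.name WHERE concert.year = 2014"))
```

The regular expression would also match dotted text inside string literals, and the pairwise join strategy only handles simple chains, so this is a starting point for experimentation rather than a usable converter.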

Authors (7)
  1. Yujian Gan (6 papers)
  2. Xinyun Chen (80 papers)
  3. Jinxia Xie (5 papers)
  4. Matthew Purver (32 papers)
  5. John R. Woodward (3 papers)
  6. John Drake (1 paper)
  7. Qiaofu Zhang (4 papers)
Citations (77)
