UNITE: A Unified Benchmark for Text-to-SQL Evaluation (2305.16265v3)
Abstract: A practical text-to-SQL system should generalize well to a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a UNIfied benchmark for Text-to-SQL Evaluation (UNITE). It is composed of publicly available text-to-SQL datasets, containing natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark, we introduce $\sim$120K additional examples and a threefold increase in SQL patterns, such as comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g., constrained beam search) can improve performance in both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves Seq2Seq models. More importantly, our benchmark presents key challenges in compositional generalization and robustness, which these SOTA models cannot address well. Our code and data processing scripts are available at https://github.com/awslabs/unified-text2sql-benchmark
- Wuwei Lan
- Zhiguo Wang
- Anuj Chauhan
- Henghui Zhu
- Alexander Li
- Jiang Guo
- Sheng Zhang
- Chung-Wei Hang
- Joseph Lilien
- Yiqun Hu
- Lin Pan
- Mingwen Dong
- Jun Wang
- Jiarong Jiang
- Stephen Ash
- Vittorio Castelli
- Patrick Ng
- Bing Xiang
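
The abstract counts the benchmark's coverage in terms of distinct SQL "patterns" (more than 3.9K, roughly three times Spider's). As a hedged illustration of what a pattern typically means in this setting, the minimal Python sketch below abstracts literals and schema identifiers out of a query so that structurally identical queries collapse to the same template; the normalization rules and placeholder names are assumptions for demonstration only, not the authors' released data-processing script (see the GitHub link above for that).

```python
import re

# Illustrative sketch: map a SQL query to a coarse structural pattern by
# replacing literals and schema identifiers with generic placeholders.
def sql_pattern(sql: str) -> str:
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "value", s)           # string literals -> value
    s = re.sub(r"\b\d+(\.\d+)?\b", "value", s)   # numeric literals -> value
    s = re.sub(r"\s+", " ", s)                   # collapse whitespace
    # Keep SQL keywords and operators; map remaining identifiers to "col".
    keywords = {
        "select", "from", "where", "group", "by", "order", "having", "limit",
        "join", "on", "and", "or", "not", "in", "between", "like", "as",
        "count", "sum", "avg", "min", "max", "distinct", "value", "asc", "desc",
    }
    tokens = re.findall(r"[a-z_][a-z0-9_.]*|[^\sa-z0-9_]+", s)
    tokens = [t if t in keywords or not t[0].isalpha() else "col" for t in tokens]
    return " ".join(tokens)

if __name__ == "__main__":
    q1 = "SELECT name FROM singer WHERE age > 30"
    q2 = "SELECT title FROM album WHERE year > 1999"
    # Both queries share the same pattern under this normalization.
    print(sql_pattern(q1))                      # select col from col where col > value
    print(sql_pattern(q1) == sql_pattern(q2))   # True
```

Under a normalization like this, counting unique pattern strings across a dataset gives a rough measure of structural SQL diversity, which is the kind of statistic the abstract reports when comparing UNITE to Spider.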