
SParC: Cross-Domain Semantic Parsing in Context (1906.02285v1)

Published 5 Jun 2019 in cs.CL and cs.AI

Abstract: We present SParC, a dataset for cross-domain Semantic Parsing in Context that consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries). It is obtained from controlled user interactions with 200 complex databases over 138 domains. We provide an in-depth analysis of SParC and show that it introduces new challenges compared to existing datasets. SParC (1) demonstrates complex contextual dependencies, (2) has greater semantic diversity, and (3) requires generalization to unseen domains due to its cross-domain nature and the unseen databases at test time. We experiment with two state-of-the-art text-to-SQL models adapted to the context-dependent, cross-domain setup. The best model obtains an exact match accuracy of 20.2% over all questions and less than 10% over all interaction sequences, indicating that the cross-domain setting and the contextual phenomena of the dataset present significant challenges for future research. The dataset, baselines, and leaderboard are released at https://yale-lily.github.io/sparc.

Analysis of SParC: Cross-Domain Semantic Parsing in Context

The paper under review introduces SParC, an extensive dataset aimed at advancing the field of semantic parsing with a cross-domain approach. This work highlights the challenges and complexities in mapping natural language interactions into executable SQL queries across different domains. The dataset, resulting from interactions with 200 complex databases across 138 domains, comprises 4,298 coherent question sequences and over 12,000 individual questions annotated with SQL queries. SParC significantly broadens the scope of text-to-SQL tasks by addressing context-dependent semantic parsing—a largely under-explored area in computational linguistics.

Core Characteristics of SParC

The salient features of SParC can be summarized as follows:

  1. Complex Contextual Dependencies: Unlike previous datasets that focus primarily on single-turn questions, SParC incorporates sequences of related queries, each depending on the context established by its predecessors. This poses substantial challenges in disambiguating and correctly interpreting the user's ultimate intent—a task critical for conversational database querying.
  2. Semantic Diversity: The dataset covers a wide range of semantic phenomena, necessitating sophisticated query interpretation mechanisms. With diverse SQL components spread across questions, SParC urges the development of models adept at handling complex syntactic and semantic structures.
  3. Cross-Domain Generalization: SParC's cross-domain nature requires models to exhibit strong generalization capabilities. During testing, they face unseen databases, pushing the boundaries of domain-independent semantic parsing models.
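The contextual dependencies described above can be illustrated with a small sketch. The interaction, questions, and schema below are invented for illustration and are not drawn from the dataset itself:

```python
# A hypothetical SParC-style interaction: each turn's SQL depends on
# context established by earlier turns (the schema here is invented).
interaction = [
    ("How many instructors are there?",
     "SELECT COUNT(*) FROM instructor"),
    ("Only those in the Statistics department.",   # refines turn 1
     "SELECT COUNT(*) FROM instructor WHERE dept_name = 'Statistics'"),
    ("Show their names instead.",                  # "their" = turn 2's set
     "SELECT name FROM instructor WHERE dept_name = 'Statistics'"),
]

# A context-dependent parser must resolve ellipsis ("Only those...")
# and coreference ("their") against the running dialogue state.
for turn, (question, sql) in enumerate(interaction, start=1):
    print(f"Turn {turn}: {question!r} -> {sql}")
```

Note that turns 2 and 3 are not interpretable as standalone questions; this is what distinguishes SParC from single-turn text-to-SQL datasets such as Spider.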

Experimental Results: Complexity of Contextual Phenomena

The paper benchmarks the performance of two state-of-the-art models, CD-Seq2Seq and SyntaxSQLNet (with contextual adaptation), against SParC's rigorous standards. Notably, the highest exact set match accuracy achieved by these models is 20.2% for individual questions and under 10% for entire interaction sequences. These results underscore the dataset's challenge: substantial contextual nuances hinder the direct translation of natural language to SQL queries, illustrating ample room for improvement in context utilization and SQL generation strategies. Moreover, performance analysis across turns shows that model accuracy degrades significantly as context-dependent complexity accumulates—a clear indication of the difficulty of maintaining context integrity throughout an interaction.
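The gap between the two reported numbers follows from how the metrics compose: an interaction counts as correct only if every question in it is matched. A minimal sketch of the two granularities (using a simplified string comparison as a stand-in for SParC's component-based exact set match):

```python
# Sketch of the two evaluation granularities reported in the paper:
# question-level exact match, and interaction-level exact match
# (an interaction counts only if every question in it is correct).
# match() is a simplified stand-in for SParC's exact set match,
# which compares SQL components rather than raw strings.
def match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()  # simplified

def evaluate(interactions):
    """interactions: list of lists of (predicted_sql, gold_sql) pairs."""
    questions = [pair for seq in interactions for pair in seq]
    q_acc = sum(match(p, g) for p, g in questions) / len(questions)
    i_acc = sum(all(match(p, g) for p, g in seq)
                for seq in interactions) / len(interactions)
    return q_acc, i_acc

# Toy example: the first interaction is fully correct, the second has
# one error, so question accuracy is 3/4 but interaction accuracy is 1/2.
toy = [
    [("SELECT a FROM t", "SELECT a FROM t"),
     ("SELECT b FROM t", "SELECT b FROM t")],
    [("SELECT c FROM t", "SELECT c FROM t"),
     ("SELECT d FROM t", "SELECT e FROM t")],
]
print(evaluate(toy))  # (0.75, 0.5)
```

Because one wrong turn fails a whole sequence, interaction-level accuracy (<10%) is necessarily well below question-level accuracy (20.2%) whenever errors are spread across sequences.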

Implications and Future Directions in AI

From a theoretical standpoint, SParC emphasizes the need for enhanced neural architectures that better capture context across varied domains. Practically, developing systems that effectively parse multi-turn, context-sensitive queries can transform the user experience in interacting with databases, facilitating more intuitive and efficient data retrieval processes. Furthermore, exploring approaches such as dynamically updating discourse context states or hybrid models combining sequential memory networks with syntax-based parsing might prove beneficial. Future work could also investigate meta-learning techniques to improve domain generalization, thereby enabling models to adapt swiftly to new scenarios without extensive retraining.
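One of the directions mentioned above, dynamically updating discourse context, can be sketched at its simplest as a running window of prior utterances fed to the parser alongside each new question. Everything here (`parse_sql`, the truncation policy) is a hypothetical illustration, not the paper's method:

```python
# Minimal sketch of a dynamically updated discourse context: each new
# question is parsed together with a summary of recent turns.
# parse_sql() is a hypothetical stand-in for any text-to-SQL model.
def parse_sql(question: str, context: str) -> str:
    return f"-- SQL for: {question} | context: {context}"  # placeholder

def run_interaction(questions, max_context_turns=2):
    history, outputs = [], []
    for q in questions:
        # Keep only the most recent turns as context (a simple truncation
        # policy; a learned model might instead decide what to retain).
        context = " ; ".join(history[-max_context_turns:])
        outputs.append(parse_sql(q, context))
        history.append(q)
    return outputs

outputs = run_interaction(["How many instructors are there?",
                           "Only those in Statistics.",
                           "Show their names instead."])
```

A learned variant would replace the truncation policy with an encoder over the interaction history, which is essentially what context-dependent models like CD-Seq2Seq do.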

In sum, SParC not only challenges existing methodologies but also sets a new precedent for cross-domain, context-aware semantic parsing. The release of the dataset, along with baselines and a leaderboard, provides a robust framework for researchers to innovate and propel the field toward more human-like database interaction capabilities. This paper offers a substantial contribution to multi-turn, multi-domain semantic parsing, marking a pivotal step in the evolution of database query interpretation.

Authors (19)
  1. Tao Yu (282 papers)
  2. Rui Zhang (1138 papers)
  3. Michihiro Yasunaga (48 papers)
  4. Yi Chern Tan (9 papers)
  5. Xi Victoria Lin (39 papers)
  6. Suyi Li (26 papers)
  7. Heyang Er (1 paper)
  8. Irene Li (47 papers)
  9. Bo Pang (77 papers)
  10. Tao Chen (397 papers)
  11. Emily Ji (1 paper)
  12. Shreya Dixit (2 papers)
  13. David Proctor (3 papers)
  14. Sungrok Shim (4 papers)
  15. Jonathan Kraft (1 paper)
  16. Vincent Zhang (5 papers)
  17. Caiming Xiong (337 papers)
  18. Richard Socher (115 papers)
  19. Dragomir Radev (98 papers)
Citations (174)