
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing (2009.13845v2)

Published 29 Sep 2020 in cs.CL and cs.AI

Abstract: We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel text-schema linking objective that predicts the syntactic role of a table field in the SQL for each question-SQL pair. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) over several existing table-and-language datasets to regularize the pre-training process. On four popular fully supervised and weakly supervised table semantic parsing benchmarks, GraPPa significantly outperforms RoBERTa-large as the feature representation layers and establishes new state-of-the-art results on all of them.
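The core data-generation idea is that a synchronous context-free grammar expands aligned natural-language and SQL templates in lockstep, so every sampled question comes with its SQL meaning. The Python sketch below is a minimal, hypothetical illustration of that mechanism; the grammar rules, the toy schema (tables, columns, values), and the slot names are invented for this example and are not the SCFG induced from existing text-to-SQL datasets in the paper.

```python
import random

# Toy synchronous CFG: each production rewrites a non-terminal into an
# aligned (question template, SQL template) pair. Purely illustrative.
SCFG = {
    "ROOT": [("show the {COL} of {TABLE} where {COND}",
              "SELECT {COL} FROM {TABLE} WHERE {COND}")],
    "COND": [("{COL} is greater than {VAL}", "{COL} > {VAL}"),
             ("{COL} equals {VAL}", "{COL} = {VAL}")],
}

# Hypothetical schema used to fill terminal slots with column/table/value names.
SCHEMA = {
    "TABLE": ["singer", "concert"],
    "COL": ["name", "age", "capacity"],
    "VAL": ["30", "2014"],
}


def expand(symbol: str) -> tuple[str, str]:
    """Expand a symbol into an aligned (question, SQL) fragment pair."""
    if symbol in SCHEMA:  # terminal slot: sample a schema item or value
        value = random.choice(SCHEMA[symbol])
        return value, value
    nl, sql = random.choice(SCFG[symbol])
    # Fill each placeholder with the same sampled expansion on both sides;
    # this synchronization is what keeps question and SQL semantically aligned.
    for slot in ("COL", "TABLE", "VAL", "COND"):
        while "{" + slot + "}" in nl:
            nl_part, sql_part = expand(slot)
            nl = nl.replace("{" + slot + "}", nl_part, 1)
            sql = sql.replace("{" + slot + "}", sql_part, 1)
    return nl, sql


if __name__ == "__main__":
    question, sql = expand("ROOT")
    print(question)  # e.g. "show the age of singer where capacity is greater than 30"
    print(sql)       # e.g. "SELECT age FROM singer WHERE capacity > 30"
```

Because each synthetic pair records which columns appear in the SQL and in what syntactic role, the same generation process also supplies labels for the text-schema linking objective described in the abstract.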

Authors (9)
  1. Tao Yu (282 papers)
  2. Chien-Sheng Wu (77 papers)
  3. Xi Victoria Lin (39 papers)
  4. Bailin Wang (34 papers)
  5. Yi Chern Tan (9 papers)
  6. Xinyi Yang (33 papers)
  7. Dragomir Radev (98 papers)
  8. Richard Socher (115 papers)
  9. Caiming Xiong (337 papers)
Citations (230)
