Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks (2201.09745v4)

Published 24 Jan 2022 in cs.CL and cs.IR

Abstract: Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pre-training frameworks have been proposed following the success of text and images, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, formula prediction, etc. To fully use the supervision signals in unlabeled tables, a variety of pre-training objectives have been designed and evaluated, for example, denoising cell values, predicting numerical relationships, and implicitly executing SQL queries. To best leverage the characteristics of (semi-)structured tables, various tabular LLMs, particularly with specially-designed attention mechanisms, have been explored. Since tables usually appear alongside and interact with free-form text, table pre-training usually takes the form of table-text joint pre-training, which attracts significant research interest from multiple domains. This survey aims to provide a comprehensive review of different model designs, pre-training objectives, and downstream tasks for table pre-training, and we further share our thoughts and vision on existing challenges and future opportunities.
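To make the "denoising cell values" family of objectives mentioned in the abstract concrete, here is a minimal, hypothetical sketch of how a table can be serialized into cell items and randomly masked so that a model is trained to reconstruct the original values. This is not the survey's (or any particular framework's) implementation; the helpers serialize_table and mask_cells are illustrative names only.

```python
import random

def serialize_table(header, rows):
    """Flatten a table into a list of cell records with row/column positions."""
    cells = [{"text": name, "row": 0, "col": c} for c, name in enumerate(header)]
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            cells.append({"text": str(value), "row": r, "col": c})
    return cells

def mask_cells(cells, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly replace cell values with a mask token; targets hold the originals."""
    inputs, targets = [], []
    for cell in cells:
        if random.random() < mask_prob:
            inputs.append({**cell, "text": mask_token})
            targets.append(cell["text"])   # the model is trained to recover this value
        else:
            inputs.append(cell)
            targets.append(None)           # no reconstruction loss on unmasked cells
    return inputs, targets

if __name__ == "__main__":
    header = ["Country", "Population (M)"]
    rows = [["France", 68], ["Japan", 125]]
    inputs, targets = mask_cells(serialize_table(header, rows))
    print(inputs)
    print(targets)
```

In practice, the masked sequence would be fed to a tabular encoder (often with structure-aware attention over the row/column positions), and the loss would be computed only over the masked cells.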

Authors (9)
  1. Haoyu Dong (55 papers)
  2. Zhoujun Cheng (19 papers)
  3. Xinyi He (19 papers)
  4. Mengyu Zhou (24 papers)
  5. Anda Zhou (1 paper)
  6. Fan Zhou (110 papers)
  7. Ao Liu (54 papers)
  8. Shi Han (74 papers)
  9. Dongmei Zhang (193 papers)
Citations (57)