
Korean-Specific Dataset for Table Question Answering (2201.06223v2)

Published 17 Jan 2022 in cs.CL

Abstract: Existing question answering systems mainly focus on text data. However, much of the data produced daily is stored in tables, which can be found in documents, in relational databases, or on the web. Many English datasets exist for the task of question answering over tables, but few Korean ones do. In this paper, we describe how we construct two Korean-specific datasets for table question answering. The Korean tabular dataset is a collection of 1.4M tables with corresponding descriptions, intended for unsupervised pre-training of language models. The Korean table question answering corpus consists of 70k question-answer pairs created by crowd-sourced workers. We then build a pre-trained language model based on the Transformer, fine-tune it for table question answering with these datasets, and report the evaluation results. We make our datasets publicly available via our GitHub repository and hope that they will support further studies on question answering over tables and on the transformation of table formats.

Authors (6)
  1. Changwook Jun (4 papers)
  2. Jooyoung Choi (21 papers)
  3. Myoseop Sim (2 papers)
  4. Hyun Kim (17 papers)
  5. Hansol Jang (5 papers)
  6. Kyungkoo Min (2 papers)
Citations (3)
