PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training (2211.02816v1)

Published 5 Nov 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Fact verification has attracted a lot of research attention recently, e.g., in journalism, marketing, and policymaking, as misinformation and disinformation online can sway one's opinion and affect one's actions. While fact-checking is a hard task in general, in many cases, false statements can be easily debunked based on analytics over tables with reliable information. Hence, table-based fact verification has recently emerged as an important and growing research area. Yet, progress has been limited due to the lack of datasets that can be used to pre-train language models (LMs) to be aware of common table operations, such as aggregating a column or comparing tuples. To bridge this gap, in this paper we introduce PASTA, a novel state-of-the-art framework for table-based fact verification via pre-training with synthesized sentence-table cloze questions. In particular, we design six types of common sentence-table cloze tasks, including Filter, Aggregation, Superlative, Comparative, Ordinal, and Unique, based on which we synthesize a large corpus consisting of 1.2 million sentence-table pairs from WikiTables. PASTA uses a recent pre-trained LM, DeBERTaV3, and further pre-trains it on our corpus. Our experimental results show that PASTA achieves new state-of-the-art performance on two table-based fact verification benchmarks: TabFact and SEM-TAB-FACTS. In particular, on the complex set of TabFact, which contains multiple operations, PASTA outperforms the previous state of the art by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1.5 points (90.6% vs. 92.1%).
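To make the idea of a sentence-table cloze question concrete, here is a minimal Python sketch of three of the six task types named in the abstract (Aggregation, Superlative, Filter). It is an illustration only, not the authors' synthesis pipeline: the toy table, the sentence templates, the `[MASK]` placeholder string, and every function name are assumptions introduced here.

```python
from typing import Any, Dict, List, Tuple

# Toy table with made-up rows (hypothetical data, not taken from WikiTables).
TABLE: List[Dict[str, Any]] = [
    {"player": "Ann", "team": "Red", "points": 12},
    {"player": "Bob", "team": "Blue", "points": 30},
    {"player": "Cy", "team": "Red", "points": 21},
]

MASK = "[MASK]"  # assumed placeholder for the span the model must recover


def aggregation_cloze(table: List[Dict[str, Any]], column: str) -> Tuple[str, str]:
    """Aggregation: the masked answer is the sum of a numeric column."""
    total = sum(row[column] for row in table)
    return f"The total {column} across all rows is {MASK}.", str(total)


def superlative_cloze(
    table: List[Dict[str, Any]], key_column: str, value_column: str
) -> Tuple[str, str]:
    """Superlative: the masked answer is the key of the row with the largest value."""
    best = max(table, key=lambda row: row[value_column])
    sentence = f"The {key_column} with the highest {value_column} is {MASK}."
    return sentence, str(best[key_column])


def filter_cloze(
    table: List[Dict[str, Any]], filter_column: str, filter_value: Any, target_column: str
) -> Tuple[str, str]:
    """Filter: the masked answer comes from the single row matching a condition."""
    matches = [row for row in table if row[filter_column] == filter_value]
    if len(matches) != 1:
        # Skip ambiguous or empty filters so the cloze has exactly one answer.
        raise ValueError("filter must match exactly one row")
    sentence = f"The {target_column} whose {filter_column} is {filter_value} is {MASK}."
    return sentence, str(matches[0][target_column])


if __name__ == "__main__":
    for sentence, answer in (
        aggregation_cloze(TABLE, "points"),
        superlative_cloze(TABLE, "player", "points"),
        filter_cloze(TABLE, "team", "Blue", "player"),
    ):
        print(sentence, "->", answer)
```

Each function returns a (sentence, answer) pair. In a pre-training setup of the kind the abstract describes, the sentence would be paired with a serialized form of the table, and the model would be trained to predict the masked answer. How the table is serialized and how the masking is done in PASTA itself is not specified here.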

Authors

Zihui Gu
Ju Fan
Nan Tang
Preslav Nakov
Xiaoman Zhao
Xiaoyong Du

GitHub

GitHub - ruc-datalab/PASTA: This repository contains source code for the PASTA model, a pre-trained language model for table-based fact verification. (18 stars)
