
Unsupervised Pretraining for Fact Verification by Language Model Distillation (2309.16540v3)

Published 28 Sep 2023 in cs.CL, cs.LG, and stat.ML

Abstract: Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on FB15k-237 (+5.3% Hits@1) and FEVER (+8% accuracy) with linear evaluation.
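
At a high level, the framework distils features from a pre-trained language model so that each claim embedding aligns with the embedding of its supporting fact under a contrastive objective. As an illustrative sketch only (not the authors' actual SFAVEL loss; the in-batch InfoNCE formulation, function name, and temperature value are assumptions), such a claim-evidence alignment loss could look like this in PyTorch:

```python
# Hypothetical sketch of an in-batch contrastive claim-evidence alignment loss.
# Not the authors' code: names, the InfoNCE form, and the temperature are assumed.
import torch
import torch.nn.functional as F

def claim_evidence_contrastive_loss(claim_feats: torch.Tensor,
                                     evidence_feats: torch.Tensor,
                                     temperature: float = 0.07) -> torch.Tensor:
    """Pull each claim towards its paired evidence embedding and push it away
    from the other evidence embeddings in the batch (in-batch negatives)."""
    claim_feats = F.normalize(claim_feats, dim=-1)        # (B, D)
    evidence_feats = F.normalize(evidence_feats, dim=-1)  # (B, D)
    # Pairwise cosine similarities between every claim and every evidence item.
    logits = claim_feats @ evidence_feats.t() / temperature  # (B, B)
    targets = torch.arange(claim_feats.size(0), device=claim_feats.device)
    # Symmetric cross-entropy: claims -> evidence and evidence -> claims.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Per the abstract, the positive claim-fact pairs in SFAVEL are obtained without annotations, by distilling from the pre-trained language model's features, rather than from labelled claim-evidence data.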

Authors (3)
  1. Adrián Bazaga (10 papers)
  2. Pietro Liò (270 papers)
  3. Gos Micklem (7 papers)
Citations (1)