- The paper introduces WikiFactDiff, a dataset capturing real-world factual changes to study LLM update algorithms.
- It employs detailed preprocessing, difference detection, and rule-based classification to label factual updates accurately.
- Evaluation shows the dataset can benchmark update algorithms on incorporating new facts while preserving unrelated knowledge, balancing model freshness against factual integrity.
WikiFactDiff: Constructing an Adaptable Dataset for Real-World Factual Knowledge Updates in LLMs
Introduction
The crux of maintaining the factual accuracy of LLMs over time lies in the challenge of updating them with new information. This paper introduces WikiFactDiff, a novel dataset aimed at facilitating the empirical study of factual knowledge updates within LLMs. Unlike previously available datasets, WikiFactDiff offers a comprehensive framework for examining a wide array of update scenarios, including the introduction of new facts, the obsolescence of outdated information, and the persistence of unaltered data. By presenting a temporally adaptable dataset derived from the evolution of Wikidata entries, it sets a new standard for realistic and applicable research on knowledge update algorithms.
Dataset Overview
WikiFactDiff differentiates itself through several key attributes:
- Realism: Unlike other datasets that might rely on fictional or artificially generated updates, WikiFactDiff is anchored in real-world changes extracted from Wikidata, spanning from January 2021 to February 2023.
- Comprehensiveness: The dataset encompasses a broad spectrum of updates, categorized into distinct scenarios such as replacing obsolete facts, introducing new entities, and archiving outdated information.
- Temporal Adaptability: WikiFactDiff is designed to be periodically refreshed, aligning its updates with the evolving landscape of global knowledge, thereby remaining relevant for future use in LLM research.
Dataset Construction
The construction of WikiFactDiff involves several meticulously designed stages to ensure the dataset's quality and relevance:
- Preprocessing and Difference Detection: By comparing Wikidata dumps at two points in time, the pipeline captures the delta of factual knowledge, categorizing facts as new, obsolete, or static (a minimal diff sketch follows this list).
- New Entity Detection: Identifies entities that have emerged within the timeframe of the dataset, a critical aspect for studying the insertion of new knowledge into LLMs.
- Classification Rules: Utilizes a set of hand-crafted rules to label the updates accurately, providing clear distinctions between different types of knowledge changes.
- Neighbor Fact Identification: To keep updates specific, the dataset includes mechanisms for identifying related facts that could be affected by a given update, reflecting the interconnected nature of factual knowledge; a toy version of both steps appears in the second sketch after this list.
- Verbalization and Cloze Tests: Each entry in the dataset is supplemented with natural language sentences and cloze tests, allowing update algorithms to be applied and evaluated directly (see the verbalization sketch closing this list).
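The paper's pipeline operates on full Wikidata dumps; as a minimal sketch of the difference-detection and new-entity-detection steps, assume each snapshot has been reduced to a set of (subject, relation, object) triples. The function name `diff_snapshots` and the `Diff` container below are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

# A (subject, relation, object) triple, e.g. ("Q615", "P54", "Q47774") in Wikidata IDs.
Triple = tuple[str, str, str]

@dataclass
class Diff:
    new: set[Triple]        # present only in the newer dump
    obsolete: set[Triple]   # present only in the older dump
    static: set[Triple]     # unchanged between the two dumps
    new_entities: set[str]  # subjects absent from the older dump entirely

def diff_snapshots(old: set[Triple], new: set[Triple]) -> Diff:
    """Compare two knowledge-base snapshots and bucket their triples."""
    old_subjects = {s for s, _, _ in old}
    added = new - old
    return Diff(
        new=added,
        obsolete=old - new,
        static=old & new,
        new_entities={s for s, _, _ in added if s not in old_subjects},
    )
```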
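Continuing the same sketch, the hand-crafted classification rules and the neighbor-fact step might look like the toy version below. The labels mirror the scenarios named earlier (replacement, insertion, archival, static), and matching on a shared relation and object is just one plausible notion of neighborhood; the paper's actual rules and neighbor selection are more involved.

```python
def classify_update(old_objects: set[str], new_objects: set[str]) -> str:
    """Toy rule-based label for one (subject, relation) group across two dumps."""
    if old_objects and new_objects and old_objects != new_objects:
        return "replacement"   # an obsolete value was superseded by a new one
    if new_objects and not old_objects:
        return "insertion"     # a fact appeared where none existed before
    if old_objects and not new_objects:
        return "archival"      # the fact was removed without replacement
    return "static"            # nothing changed

def neighbor_facts(triples: set[Triple], relation: str, obj: str,
                   exclude_subject: str) -> set[Triple]:
    """Facts sharing the updated fact's relation and object, used to check
    that an edit does not bleed onto unrelated subjects."""
    return {(s, r, o) for s, r, o in triples
            if r == relation and o == obj and s != exclude_subject}
```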
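Finally, a hedged sketch of the verbalization step: the paper pairs each fact with natural-language sentences and cloze prompts, which can be approximated with per-relation templates. The `TEMPLATES` mapping and fallback below are hypothetical; the dataset's real verbalizations are produced by its own pipeline.

```python
# Hypothetical per-relation templates; the paper's verbalizations are richer.
TEMPLATES = {
    "P35": "The head of state of {subject} is",
    "P54": "{subject} plays for",
}

def to_cloze(subject_label: str, relation: str) -> str:
    """Render a (subject, relation) pair as a cloze prompt whose expected
    continuation is the object's label."""
    # Degenerate fallback for relations without a template, kept only so the
    # sketch is total; real pipelines would require a proper template.
    template = TEMPLATES.get(relation, "{subject} " + relation + " is")
    return template.format(subject=subject_label)

# e.g. to_cloze("France", "P35") -> "The head of state of France is"
```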
The paper underscores the nuanced challenges of building a temporally adaptable dataset that can serve the evolving needs of LLM research, and highlights the dataset's grounding in an existing, continuously maintained knowledge base, Wikidata.
Evaluation of Update Algorithms
Through the application of WikiFactDiff, the paper evaluates several existing knowledge update algorithms. It measures not only how effectively these algorithms incorporate new facts into LLMs but also how well they preserve the accuracy of unrelated information, showcasing the dataset's role in balancing model freshness against knowledge integrity. The evaluation sheds light on the varying capabilities of different algorithms to handle realistic updates, contributing valuable insights to the ongoing development of more sophisticated knowledge update methodologies. A schematic of this two-sided measurement is sketched below.
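As a minimal sketch of that two-sided measurement, assuming the cloze prompts described above, non-empty evaluation sets, and exact-match scoring (a simplification; the paper's metric definitions may differ), an evaluation loop could look like:

```python
from typing import Callable

def evaluate_update(
    model_completes: Callable[[str], str],
    updates: list[tuple[str, str]],    # (cloze prompt, expected new object)
    neighbors: list[tuple[str, str]],  # (cloze prompt, expected unchanged object)
) -> dict[str, float]:
    """Score an update algorithm on two axes: does the model now produce the
    new object (efficacy), and does it still produce the correct object for
    untouched neighbor facts (specificity)? Assumes both lists are non-empty."""
    efficacy = sum(model_completes(p) == t for p, t in updates) / len(updates)
    specificity = sum(model_completes(p) == t for p, t in neighbors) / len(neighbors)
    return {"efficacy": efficacy, "specificity": specificity}
```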
Implications and Future Directions
The introduction of WikiFactDiff paves the way for a deeper understanding of how LLMs can be kept current in a world of constant informational change. Its emphasis on realistic update scenarios, coupled with a robust construction methodology, positions it as a pivotal resource for advancing research in this area. By setting a new benchmark for dataset realism and adaptability, the work invites future research to explore innovative update algorithms capable of dynamically navigating the complex landscape of factual knowledge.
In essence, WikiFactDiff not only enriches the toolkit available for LLM researchers but also highlights the critical importance of dataset design in the pursuit of more adaptable and accurate LLMs. The paper calls for a continued effort to refine update algorithms, with a vision towards models that can seamlessly integrate the relentless influx of new information, thereby remaining ever-relevant in their application domains.