- The paper reveals that language models show lower confidence when queried about mutable facts than about immutable ones.
- The paper introduces the MuLan benchmark, using a balanced dataset to assess how LMs encode and differentiate mutable and immutable facts.
- The paper demonstrates that mutable facts are updated more readily, highlighting LMs' inherent temporal awareness in their stored representations.
Exploring Fact Mutability in LLMs with the MuLan Benchmark
Introduction to Fact Mutability
Large language models (LMs) encode a wealth of factual knowledge garnered from their extensive training corpora. As time progresses, however, certain facts change - what we call "mutable facts" (e.g., the current president of a country) - which challenges LMs' ability to remain factually accurate. "Immutable facts" (e.g., the capital of a country), by contrast, stay constant over time. The paper introduces MuLan, a benchmark for understanding how effectively LMs handle mutable versus immutable facts, organized around three research questions: how confident and accurate models are on mutable facts, whether mutable and immutable facts are encoded differently in the models' representations, and whether mutable facts are easier to update than immutable ones.
The MuLan Benchmark: Dataset and Design
MuLan serves as a comprehensive tool for evaluating LMs' handling of fact mutability, covering relations of varying mutability (Immutable-1, Immutable-N, and Mutable) and how each is represented within the model. The dataset was built from a balanced mix of relations, providing a controlled testbed for probing LMs' behavior with respect to mutability. Evaluation involves probing LMs with queries constructed from the dataset, assessing not just factual accuracy but also the models' confidence in their answers.
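The confidence measurement described above can be sketched as follows. This is a minimal, self-contained illustration, not the paper's actual evaluation code: it assumes confidence is taken as the probability the model assigns to its top-ranked answer, and the logit values below are invented toy numbers standing in for real model output.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def query_confidence(logits):
    """Confidence = probability assigned to the model's top-ranked answer."""
    return max(softmax(logits))

# Toy logits over candidate answers (illustrative values only).
# An immutable query ("capital of X") tends to yield a peaked distribution;
# a mutable query ("current president of X") a flatter one.
immutable_logits = [8.0, 1.0, 0.5, 0.2]
mutable_logits = [3.0, 2.5, 2.0, 1.5]

print(f"immutable confidence: {query_confidence(immutable_logits):.2f}")
print(f"mutable confidence:   {query_confidence(mutable_logits):.2f}")
```

With numbers like these, the immutable query's confidence is close to 1 while the mutable query's is much lower, mirroring the gap the benchmark measures at scale.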
Empirical Findings
The investigation reveals several insights into LMs' handling of mutable versus immutable facts:
- Confidence and Performance: LMs are noticeably less confident on mutable facts than on immutable ones, and the confidence gap is more pronounced than the performance gap. This drop in confidence suggests the models implicitly register the temporal instability of mutable facts.
- Representation of Fact Mutability: Using probe classifiers, the paper shows that LMs encode aspects of fact mutability in their representations: mutable facts can be distinguished from immutable ones directly from hidden states, indicating an embedded sense of temporality within the model's knowledge.
- Updates and Fact Mutability: Mutable facts are found to be more amenable to updates than their immutable counterparts. This ease of updating mutable facts aligns with the intuitive notion that LMs recognize the changeable nature of these facts and, as such, adapt their stored representations more readily in response to new information.
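A probe classifier of the kind used for the second finding can be sketched with a simple logistic-regression probe. This is a hedged illustration on synthetic data: the 64-dimensional vectors below are random stand-ins for LM hidden states, with a weak "mutability" signal injected into a few dimensions; a real probe would be trained on actual hidden representations of MuLan queries.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, dim=64, signal=1.0):
    """Synthetic stand-ins for hidden states (hypothetical, not real LM data)."""
    X = rng.normal(size=(n, dim))
    y = rng.integers(0, 2, size=n)             # 1 = mutable, 0 = immutable
    X[:, :4] += signal * (2 * y[:, None] - 1)  # inject signal into 4 dimensions
    return X, y

def train_probe(X, y, lr=0.1, epochs=200):
    """Train a logistic-regression probe with plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        grad = p - y                            # gradient of log loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return ((p > 0.5) == y).mean()

X_train, y_train = make_data(800)
X_test, y_test = make_data(200)
w, b = train_probe(X_train, y_train)
print(f"probe accuracy: {probe_accuracy(w, b, X_test, y_test):.2f}")
```

If the probe classifies held-out examples well above chance, the representations carry mutability information; the same logic underlies the paper's probing result, with real hidden states in place of the synthetic vectors.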
Implications and Future Directions
The findings from the MuLan benchmark underscore the implicit time-awareness within LMs, challenging the prevailing notion that LMs are predominantly time-agnostic. This inherent encoding of mutability within LMs opens new avenues for enhancing their temporal reasoning capabilities and updating mechanisms.
Conclusion
By leveraging the MuLan benchmark, this paper contributes significant insights into LMs' handling of fact mutability, revealing encoded awareness of mutability in representations and a differential ease of updates for mutable facts. These insights not only deepen our understanding of LMs' current capabilities but also guide future efforts in improving LMs' temporal accuracy and flexibility in knowledge representation.