Inconsistencies in Masked Language Models (2301.00068v3)
Abstract: Learning to predict masked tokens in a sequence has been shown to be a helpful pretraining objective for powerful language models such as PaLM2. After training, such masked language models (MLMs) can provide distributions over tokens in the masked positions of a sequence. However, this paper shows that distributions corresponding to different masking patterns can demonstrate considerable inconsistencies, i.e., they cannot be derived from a coherent joint distribution when considered together. This fundamental flaw in MLMs can lead to self-contradictory behaviors during inference. On various benchmark datasets, including MMLU, MLMs can give different predictions for the same input question. From BERT-base to UL2-20B, we show that such inconsistencies exist ubiquitously in MLMs of diverse sizes and configurations. In light of our observations, we further propose an inference-time strategy for MLMs called Ensemble of Conditionals. It jointly considers a selected range of inconsistent conditionals directly produced by the MLM to make the final prediction, which often leads to considerable accuracy improvements.
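The Ensemble of Conditionals idea can be illustrated with a short sketch. The following is a minimal, hypothetical implementation, not the authors' code: it assumes a HuggingFace `bert-base-uncased` MLM, two hand-picked masking patterns for a cloze query, and a plain mean over the resulting distributions. The paper's actual choice of conditionals and combination rule may differ.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def conditional_at(input_ids: torch.Tensor, position: int) -> torch.Tensor:
    """Return the MLM's distribution over the vocabulary at `position`."""
    with torch.no_grad():
        logits = model(input_ids=input_ids.unsqueeze(0)).logits
    return torch.softmax(logits[0, position], dim=-1)

# A cloze-style query with a single-token answer.
text = "The capital of France is [MASK]."
input_ids = tokenizer(text, return_tensors="pt").input_ids[0]
answer_pos = (input_ids == tokenizer.mask_token_id).nonzero()[0].item()

# Masking pattern 1: only the answer position is masked.
ids_one_mask = input_ids.clone()

# Masking pattern 2: a context token ("capital") is masked as well.
# Under a coherent joint distribution the two patterns would imply
# compatible conditionals; in practice they can disagree at `answer_pos`.
ids_two_masks = input_ids.clone()
capital_id = tokenizer.convert_tokens_to_ids("capital")
ids_two_masks[ids_two_masks == capital_id] = tokenizer.mask_token_id

dists = [conditional_at(ids, answer_pos) for ids in (ids_one_mask, ids_two_masks)]
ensemble = torch.stack(dists).mean(dim=0)  # combine the inconsistent conditionals

for name, dist in [("one mask", dists[0]), ("two masks", dists[1]), ("ensemble", ensemble)]:
    top = dist.argmax().item()
    token = tokenizer.convert_ids_to_tokens(top)
    print(f"{name:9s} -> {token!r} (p={dist[top].item():.3f})")
```

Averaging is one natural way to combine mutually inconsistent conditionals: since no single coherent joint distribution underlies them, the ensemble hedges across masking patterns rather than trusting any one of them.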
- Tom Young
- Yang You
- Yunan Chen