Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order (2004.11579v1)
Abstract: Masked language models and autoregressive language models are two types of language models. While pretrained masked language models such as BERT dominate natural language understanding (NLU) tasks, autoregressive language models such as GPT are especially capable in natural language generation (NLG). In this paper, we propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM). We implement a specific PMLM, named u-PMLM, with a uniform prior distribution on the masking ratio. We prove that u-PMLM is equivalent to an autoregressive permutated language model. One main advantage of the model is that it supports text generation in arbitrary order with surprisingly good quality, which could potentially enable new applications beyond traditional unidirectional generation. In addition, the pretrained u-PMLM outperforms BERT on a set of downstream NLU tasks.
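
The masking scheme described in the abstract can be illustrated with a minimal sketch. It assumes that a masking ratio is drawn once per sequence from a uniform distribution on [0, 1) and that each token is then masked independently with that probability; the abstract states only the uniform prior on the masking ratio, so the per-token rule is an assumption here. The function name `probabilistic_mask` and the `MASK` placeholder string are hypothetical and chosen for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder symbol, chosen for illustration


def probabilistic_mask(tokens, rng):
    """Mask a token sequence with a randomly drawn masking ratio.

    Assumption: the ratio is sampled once per sequence from Uniform[0, 1),
    and each position is then masked independently with that probability.
    """
    ratio = rng.random()  # uniform prior on the masking ratio
    masked = []
    targets = {}  # position -> original token the model would have to predict
    for i, tok in enumerate(tokens):
        if rng.random() < ratio:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return ratio, masked, targets


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the quick brown fox jumps".split()
    ratio, masked, targets = probabilistic_mask(tokens, rng)
    print(f"ratio={ratio:.2f}", masked, targets)
```

Because the ratio varies from nearly 0 to nearly 1 across training sequences, a model trained this way would see contexts ranging from almost complete to almost empty, which is consistent with the abstract's claim that generation can proceed in arbitrary order.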