Enabling Language Models to Fill in the Blanks

Published 11 May 2020 in cs.CL, cs.AI, and cs.LG (arXiv:2005.05339v2)

Abstract: We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document. While infilling could enable rich functionality especially for writing assistance tools, more attention has been devoted to language modeling---a special case of infilling where text is predicted at the end of a document. In this paper, we aim to extend the capabilities of LMs to the more general task of infilling. To this end, we train (or fine-tune) off-the-shelf LMs on sequences containing the concatenation of artificially-masked text and the text which was masked. We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics. Furthermore, we show that humans have difficulty identifying sentences infilled by our approach as machine-generated in the domain of short stories.

Citations (184)

Summary

  • The paper presents ILM, a novel method that enables language models to predict missing text spans using both preceding and subsequent contexts.
  • The methodology fine-tunes uni-directional models for text infilling, achieving perplexity comparable to that of more complex bidirectional architectures.
  • The approach offers practical benefits for content editing, historical text restoration, and creative writing, expanding the versatility of LMs.

Enabling Language Models to Fill in the Blanks

The paper "Enabling LLMs to Fill in the Blanks" by Donahue, Lee, and Liang presents a novel approach to the problem of text infilling using LMs. Text infilling involves predicting missing spans of text within a document, as opposed to just at the end of a text, as in traditional language modeling. This capability is particularly valuable for applications in writing assistance, content editing, and restoration of incomplete or damaged historical texts.

Methodology

The authors introduce a method termed "Infilling by Language Modeling" (ILM). In this framework, an off-the-shelf LM is trained or fine-tuned on sequences in which selected spans of a document are replaced with special blank tokens, and the original contents of those spans are appended after the masked document. Because the entire masked document precedes the answers, the model can condition each predicted span on both the preceding and the following context, despite being a standard left-to-right LM.
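
To make the training format concrete, the sketch below constructs one ILM-style training string from a document and a set of spans to mask. The special-token strings ([blank], [sep], [answer]) and the helper function are illustrative assumptions for exposition, not the authors' released implementation.

```python
# A minimal sketch of building one ILM-style training string: masked spans are
# replaced with blank tokens, and their original contents are appended after a
# separator. Token strings and the helper name are illustrative, not from the paper.

def make_ilm_example(text: str, spans: list[tuple[int, int]]) -> str:
    """Mask the character spans (start, end) in `text` and append their contents."""
    masked, answers, cursor = [], [], 0
    for start, end in sorted(spans):
        masked.append(text[cursor:start])
        masked.append("[blank]")
        answers.append(text[start:end])
        cursor = end
    masked.append(text[cursor:])
    answer_part = " ".join(f"{a} [answer]" for a in answers)
    return "".join(masked) + " [sep] " + answer_part

example = make_ilm_example(
    "She ate leftover pasta for lunch.",
    spans=[(8, 22), (27, 32)],  # "leftover pasta" and "lunch"
)
print(example)
# She ate [blank] for [blank]. [sep] leftover pasta [answer] lunch [answer]
```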

Empirical Evaluation

The research demonstrates that ILM can effectively infill text across multiple domains, including short stories, scientific abstracts, and song lyrics. In a human evaluation on short stories, judges had difficulty distinguishing sentences infilled by ILM-trained models from human-written ones, and the infilled text coheres well with its surrounding context.
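
For context on what the judges actually see, the following sketch shows the inference step in schematic form: the fine-tuned LM is prompted with the masked document plus a separator, and the generated answers are spliced back into the blanks. Again, the token strings and helper are illustrative assumptions rather than the paper's code.

```python
# Schematic of ILM-style inference: prompt the fine-tuned LM with the masked
# document plus a separator, let it generate answers left to right, then splice
# each answer back into its blank. Token strings are illustrative.

def splice_infills(masked_text: str, generated: str) -> str:
    """Replace each [blank] in `masked_text`, in order, with the generated answers."""
    answers = [a.strip() for a in generated.split("[answer]") if a.strip()]
    out = masked_text
    for answer in answers:
        out = out.replace("[blank]", answer, 1)
    return out

masked = "She ate [blank] for [blank]."
# Suppose the fine-tuned LM, prompted with masked + " [sep]", generated:
generated = " cereal [answer] breakfast [answer]"
print(splice_infills(masked, generated))  # She ate cereal for breakfast.
```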

Quantitatively, the ILM models are evaluated using perplexity, a measure of how well a model predicts a sample (lower is better). ILM attains perplexity comparable to models that condition directly on both past and future context, while remaining a standard left-to-right LM. In particular, ILM exploits bidirectional context without resorting to masked-language-model architectures such as BERT, which require knowing the length of the missing span in advance and become more expensive on longer sequences.
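
As a reminder of the metric, perplexity is the exponential of the average negative log-likelihood per token. The short sketch below computes it from per-token log-probabilities; it is the generic definition, not evaluation code from the paper.

```python
import math

# Perplexity from per-token (natural) log-probabilities: the exponential of the
# average negative log-likelihood. Lower is better. Generic definition only.

def perplexity(token_log_probs: list[float]) -> float:
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model assigning probability 0.25 to each of four tokens has perplexity 4.
print(perplexity([math.log(0.25)] * 4))  # 4.0
```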

Theoretical and Practical Implications

The ILM approach retains the simplicity and computational efficiency of uni-directional LMs while enabling them to perform tasks traditionally reserved for more complex architectures such as BERT or SpanBERT. This extension broadens the utility of LMs across a variety of applications without sacrificing their original language modeling capability.

From a theoretical standpoint, ILM suggests significant potential in leveraging the sequence-prediction abilities of LMs for a broader range of tasks. With fine-tuning, LMs can predict spans of unknown length at arbitrary positions, without directionality constraints, further extending the versatility of pre-trained models.

Future Directions

The paper indicates that the architecture-agnostic nature of ILM positions it as a robust framework for applications beyond text infilling. It also points toward interactive text-generation systems, such as writing aids and creative content generators. Incorporating ILM into co-creation systems or other interactive writing tools could yield significant advances in human-computer collaborative writing.

In summary, "Enabling Language Models to Fill in the Blanks" extends the boundaries of current text generation capabilities, allowing for more sophisticated automation of text editing, completion, and creative tasks while retaining the efficiency and simplicity of uni-directional LMs. The work encourages further exploration and adoption of infilling tasks, potentially catalyzing developments in versatile AI-assisted text generation systems.
