
Enabling Language Models to Fill in the Blanks (2005.05339v2)

Published 11 May 2020 in cs.CL, cs.AI, and cs.LG

Abstract: We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document. While infilling could enable rich functionality especially for writing assistance tools, more attention has been devoted to language modeling---a special case of infilling where text is predicted at the end of a document. In this paper, we aim to extend the capabilities of language models (LMs) to the more general task of infilling. To this end, we train (or fine-tune) off-the-shelf LMs on sequences containing the concatenation of artificially-masked text and the text which was masked. We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics. Furthermore, we show that humans have difficulty identifying sentences infilled by our approach as machine-generated in the domain of short stories.

Authors (3)
  1. Chris Donahue (35 papers)
  2. Mina Lee (19 papers)
  3. Percy Liang (239 papers)
Citations (184)

Summary

Enabling Language Models to Fill in the Blanks

The paper "Enabling LLMs to Fill in the Blanks" by Donahue, Lee, and Liang presents a novel approach to the problem of text infilling using LLMs (LMs). Text infilling involves predicting missing spans of text within a document, as opposed to just at the end of a text, as in traditional LLMing. This capability is particularly valuable for applications in writing assistance, content editing, and restoration of incomplete or damaged historical texts.

Methodology

The authors introduce a method termed "Infilling by Language Modeling" (ILM). The technique trains (or fine-tunes) off-the-shelf LMs on sequences formed by concatenating an artificially masked version of a document with the spans that were masked out. Because the model reads the entire masked document before generating each missing span, it can condition on both the preceding and the following context while remaining an ordinary left-to-right LM; a rough sketch of how such training examples can be constructed is given below.
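
To make the training format concrete, the sketch below builds an ILM-style training example: selected spans of a document are replaced by blank tokens, and the removed spans are appended after a separator so that a standard left-to-right LM can learn to generate them from the full masked context. The special-token strings ([blank], [sep], [answer]) and the span-selection policy here are illustrative assumptions, not the paper's exact implementation.

```python
import random

BLANK, SEP, ANSWER = "[blank]", "[sep]", "[answer]"  # assumed special-token names

def make_ilm_example(words, mask_prob=0.15, rng=random):
    """Mask random word spans and return the concatenated training sequence:
    masked document, separator, then the masked-out spans in order."""
    masked, answers = [], []
    i = 0
    while i < len(words):
        if rng.random() < mask_prob:
            span_len = rng.randint(1, 3)  # mask a short span of 1-3 words
            answers.append(" ".join(words[i:i + span_len]) + " " + ANSWER)
            masked.append(BLANK)
            i += span_len
        else:
            masked.append(words[i])
            i += 1
    return " ".join(masked) + " " + SEP + " " + " ".join(answers)

print(make_ilm_example("She ate leftover pasta for breakfast .".split()))
# Possible output:
# "She ate [blank] for breakfast . [sep] leftover pasta [answer]"
```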

Empirical Evaluation

The research demonstrates that the ILM method can effectively infill text across multiple domains, including short stories, scientific abstracts, and song lyrics. Human evaluation shows that annotators find it difficult to distinguish human-written sentences from sentences infilled by ILM-trained models, and the infilled text is highly coherent with its surrounding context.

Quantitatively, the ILM models are evaluated using perplexity, a measure of how well a model predicts a sample (see the sketch below). Perplexity scores for ILM are comparable to those of models that condition on both past and future context, while ILM retains the efficiency of a standard left-to-right LM: it leverages bidirectional context without requiring architectures such as BERT, which demand more computational resources for longer sequences.
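
As a rough illustration of the metric, perplexity is the exponential of the average negative log-likelihood the model assigns to the held-out tokens. The snippet below computes it from per-token log-probabilities; it is a generic sketch rather than the paper's evaluation code, and the example numbers are made up.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical log-probabilities assigned to each target token.
print(perplexity([-2.1, -0.4, -3.0, -1.2]))  # ~5.34
```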

Theoretical and Practical Implications

The ILM approach retains the simplicity and computational efficiency of unidirectional LMs while enabling them to perform tasks traditionally reserved for bidirectional architectures such as BERT or SpanBERT. This broadens the utility of LMs to a wider range of applications without compromising their core language modeling ability.

From a theoretical standpoint, ILM suggests significant potential in leveraging the sequence-prediction abilities of LMs for a broader range of tasks. With fine-tuning, LMs can predict spans of unknown length at arbitrary positions rather than only at the end of a document, furthering the versatility of pre-trained models.

Future Directions

The paper indicates that the architecture-agnostic nature of ILM positions it as a robust framework for applications beyond text infilling. It also points toward interactive text generation systems such as writing aids and creative content generators. Incorporating ILM into co-creation systems or other interactive writing tools could yield significant advances in human-computer collaborative writing.

In summary, "Enabling LLMs to Fill in the Blanks" extends the boundaries of current text generation capabilities, allowing for more sophisticated automation in text-editing, completion, and creative tasks while maintaining efficiency and simplicity inherent to current uni-directional LLMs. The work encourages further exploration and adoption of infilling tasks, potentially catalyzing developments in versatile AI-assisted text generation systems.