Pointer: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
The paper presents an innovative approach to hard-constrained text generation through a method named Pointer, which stands for PrOgressive INsertion-based TransformER. The approach targets a specific weakness of prevalent pre-trained language models such as BERT and GPT-2: generating fluent text while guaranteeing that a given set of lexical constraints appears verbatim in the output.
Core Contributions
- Non-Autoregressive Model for Hard-Constrained Generation: Existing models such as BERT and GPT-2 are effective for general text generation but cannot guarantee that outputs include all specified constraint tokens. The Pointer model employs a non-autoregressive, insertion-based technique that starts from the constraint tokens and incrementally inserts new tokens around them until a complete, coherent sentence emerges. Because constraint tokens are never deleted, every constraint is guaranteed to appear in the output, and the model achieves state-of-the-art results on several benchmarks, including the News and Yelp datasets.
- Progressive Token Insertion: Rather than generating sentences strictly left to right, Pointer proceeds in stages, inserting new tokens between existing ones at each stage (see the sketch after this list). This coarse-to-fine progression aligns more closely with how humans often construct sentences: key content words come first, grammatical fillers later.
- Empirical Logarithmic Time Complexity: Because tokens for all insertion slots are predicted in parallel, each stage can nearly double the sequence length; generating a sentence of length n therefore takes on the order of log n decoding rounds, an empirically observed efficiency advantage over autoregressive decoding, which requires n sequential steps.
- Pre-training on Large-Scale Datasets: To boost performance, the Pointer model is first pre-trained on the large-scale Wikipedia corpus and then fine-tuned on downstream hard-constrained text generation tasks.
- Beam Search Adaptation: Parallel decoding assumes conditional independence among tokens inserted in the same stage, which can produce incoherent neighboring insertions. To mitigate this, Pointer introduces a beam search algorithm tailored to its insertion-based mechanism, enhancing overall text quality and consistency.
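To make the progressive procedure concrete, the following is a minimal sketch of insertion-based decoding. It illustrates the general mechanism rather than the paper's implementation: `predict_insertions` is a hypothetical stand-in for the trained insertion transformer, and `[NOI]` denotes the paper's special "no-insertion" token for gaps that should stay empty.

```python
# Minimal sketch of progressive insertion-based decoding (illustrative,
# not the paper's code). The trained model scores every vocabulary token,
# plus a special [NOI] "no-insertion" token, for each gap in parallel.

NOI = "[NOI]"  # predicted for a gap that should receive no new token

def predict_insertions(tokens):
    """Hypothetical stand-in for the insertion transformer: returns one
    predicted token (or NOI) for each of the len(tokens)+1 gaps."""
    raise NotImplementedError  # the real model fills this role

def progressive_generate(constraints, max_stages=10):
    tokens = list(constraints)  # stage 0: the hard lexical constraints
    for _ in range(max_stages):
        gap_preds = predict_insertions(tokens)  # greedy pick per gap;
        # the paper's adapted beam search would replace this greedy step
        if all(p == NOI for p in gap_preds):
            return tokens  # no gap wants a new token: generation is done
        # Interleave predictions with existing tokens. Constraint tokens
        # are never deleted, so every constraint survives to the output.
        new_tokens = []
        for i, tok in enumerate(tokens):
            if gap_preds[i] != NOI:
                new_tokens.append(gap_preds[i])
            new_tokens.append(tok)
        if gap_preds[-1] != NOI:  # final gap, after the last token
            new_tokens.append(gap_preds[-1])
        tokens = new_tokens
    return tokens
```

Since every gap can receive a token simultaneously, the sequence length can nearly double per stage, which is the intuition behind the empirically logarithmic number of decoding rounds noted above.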
Methodology
The Pointer model builds on the insertion transformer, adapting it to a progressive generation framework. Training data is constructed with a dynamic-programming procedure that determines the optimal sequence of progressive insertions based on token importance, and the construction scales efficiently with corpus size; a simplified sketch of this data-preparation step follows.
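The data-preparation idea can be approximated with the simple greedy procedure below. This is a simplified sketch under stated assumptions: the paper uses a dynamic-programming construction and a richer token-importance score, whereas here `importance` is a hypothetical placeholder ranking tokens by a precomputed IDF table.

```python
# Simplified sketch of building progressive training stages by repeatedly
# dropping the least important tokens. The paper's construction uses
# dynamic programming and a more elaborate importance score; this greedy
# version is only meant to convey the idea.

def importance(token, idf):
    """Hypothetical importance score: rarer content words score higher."""
    return idf.get(token.lower(), 0.0)

def build_stages(sentence_tokens, idf, min_len=2):
    """Return stages from the full sentence down to a keyword skeleton.
    Each adjacent pair (stages[k+1] -> stages[k]) becomes one training
    example teaching the model which tokens to insert at that stage."""
    stages = [list(sentence_tokens)]
    current = list(sentence_tokens)
    while len(current) > min_len:
        # Rank positions by importance and drop the bottom half, so the
        # sequence roughly halves at each stage (log-many stages total).
        ranked = sorted(range(len(current)),
                        key=lambda i: importance(current[i], idf))
        keep = set(ranked[len(current) // 2:])
        current = [t for i, t in enumerate(current) if i in keep]
        stages.append(current)
    return stages  # stages[-1] plays the role of the lexical constraints
```

Reversing the stages yields training pairs whose targets grow progressively, matching the coarse-to-fine order used at inference time.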
Quantitative Results
On both datasets, the Pointer model outperforms contemporary methods such as CGMH and NMSTG on metrics including BLEU, NIST, and METEOR, as well as perplexity. For example, on the News dataset the pre-trained model reaches a BLEU-4 score of 3.04, versus 1.58 for baseline methods.
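For readers who want to reproduce such comparisons, the sketch below shows how BLEU-4 and constraint coverage might be computed with NLTK. It is an assumed setup, not the paper's evaluation script, which additionally reports NIST, METEOR, and perplexity.

```python
# Sketch of a hard-constrained generation evaluation using NLTK's BLEU.
# This is an assumed setup, not the paper's evaluation code.
from nltk.translate.bleu_score import corpus_bleu

def evaluate(hypotheses, references, constraints):
    """hypotheses/references: lists of token lists; constraints: lists of
    constraint-token lists, aligned with the hypotheses."""
    # Corpus-level BLEU-4 (uniform weights over 1- to 4-grams).
    bleu4 = corpus_bleu([[ref] for ref in references], hypotheses,
                        weights=(0.25, 0.25, 0.25, 0.25))
    # Constraint coverage: fraction of constraint tokens appearing in the
    # output. Insertion-based decoding makes this 1.0 by construction.
    total = covered = 0
    for hyp, cons in zip(hypotheses, constraints):
        hyp_set = set(hyp)
        for c in cons:
            total += 1
            covered += c in hyp_set
    return bleu4, covered / max(total, 1)
```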
Theoretical Implications
The model's design draws on ideas from natural language processing and formal language theory to emphasize non-autoregressive generation. Treating constraints as pivot points around which to organize generation may open a new avenue for research into more hierarchically structured neural generation models.
Practical Implications
From a practical standpoint, this methodological shift holds promise for applications requiring strict lexical inclusion, such as generating meeting summaries that must contain predefined keywords, or expanding terse search queries into well-formed sentences.
Future Developments
Further research could extend the Pointer framework to more dynamic constraint management, such as handling morphological variations and inflections of constraint words without sacrificing constraint satisfaction. Additionally, integrating more sophisticated parsing mechanisms could improve interpretability and control during generation, unlocking the potential of these techniques in broader domains, including dialogue systems and interactive AI applications.
In conclusion, this paper provides compelling evidence in favor of hierarchical and context-adaptive generative models, setting a new benchmark for hard-constrained text synthesis tasks.