Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation (2304.11791v1)

Published 24 Apr 2023 in cs.CL

Abstract: Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 scores on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.

Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation

The research presented in this paper introduces a novel approach to optimizing Non-Autoregressive (NAR) text generation through the development of the Pre-trained Directed Acyclic Transformer (PreDAT). The primary aim is to address the deficiencies in pre-training for NAR models, which have historically lagged behind pre-trained autoregressive (AR) models in more diverse text generation tasks. The paper introduces a new pre-training task and demonstrates significant performance improvements over existing models.

The approach builds on the Directed Acyclic Transformer (DAT), whose decoder arranges its predictions on a directed acyclic graph so that different paths through the graph correspond to different candidate outputs. Because no token is conditioned on previously generated tokens, the prediction order is unbiased, which helps avoid the error accumulation of left-to-right generation and yields more consistent outputs. PreDAT is pre-trained with a Double-Source Text Infilling (DSTI) task designed to improve prediction consistency and mitigate the multi-modality problem common to NAR models.
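
As a rough illustration of how decoding over such a graph can work, the sketch below (not the authors' implementation; all names, shapes, and the greedy strategy are illustrative assumptions) follows the most probable transition from vertex to vertex and emits the most probable token at each visited vertex:

```python
import numpy as np

def greedy_dag_decode(token_logits: np.ndarray, trans_logits: np.ndarray) -> list[int]:
    """Greedy path decoding over a DAG of decoder vertices.

    token_logits: (L, V) scores for the token emitted at each vertex.
    trans_logits: (L, L) scores for moving from one vertex to another.
    Only transitions to later vertices are considered, so the path is acyclic.
    """
    L = token_logits.shape[0]
    path_tokens = []
    v = 0                               # decoding starts at the first vertex
    while True:
        path_tokens.append(int(token_logits[v].argmax()))
        if v == L - 1:                  # reaching the last vertex ends the path
            break
        successors = trans_logits[v, v + 1:]
        v = v + 1 + int(successors.argmax())
    return path_tokens

# Toy example: 6 decoder vertices, vocabulary of 10 token ids.
rng = np.random.default_rng(0)
print(greedy_dag_decode(rng.normal(size=(6, 10)), rng.normal(size=(6, 6))))
```

The sketch only conveys the core idea that all vertex predictions are produced in parallel and the graph selects which of them form the output; the model itself supports richer decoding strategies.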

Quantitative results from experiments on five text generation tasks are compelling. PreDAT outperforms existing pre-trained NAR models by 4.2 points on average on n-gram-based metrics and even surpasses pre-trained AR baselines by an average of 0.7 points, while delivering a 17x increase in decoding throughput. These findings indicate that PreDAT not only accelerates generation but also improves overall text quality, effectively sidestepping the error accumulation seen in autoregressive decoding.

From a methodological standpoint, the DSTI task requires PreDAT to predict multiple sentence fragments in parallel, encouraging the model to capture bidirectional dependencies among simultaneously generated tokens. In practical terms, this makes PreDAT well suited to applications that demand both high-speed and high-quality text generation.
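
To ground the infilling setup, the snippet below sketches a generic span-masking data construction in the spirit of text infilling. It is not the paper's DSTI procedure: the masking rate, span length, and helper names are assumptions, and the "double-source" construction described in the paper is omitted.

```python
import random

MASK = "<mask>"

def make_infilling_example(tokens, span_len=3, mask_prob=0.3, seed=0):
    """Mask contiguous spans of `tokens`; return (encoder_input, targets).

    encoder_input: the corrupted sequence the encoder would see.
    targets: the masked fragments the decoder must fill in, in parallel.
    """
    rng = random.Random(seed)
    encoder_input, targets = [], []
    i = 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            span = tokens[i:i + span_len]
            targets.append(span)
            encoder_input.append(MASK)   # one placeholder per masked span
            i += len(span)
        else:
            encoder_input.append(tokens[i])
            i += 1
    return encoder_input, targets

sentence = "non autoregressive models decode all target tokens in parallel".split()
enc, tgt = make_infilling_example(sentence)
print(enc)   # corrupted input for the encoder
print(tgt)   # fragments to be predicted simultaneously
```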

Theoretical implications of this research suggest a shift in how pre-training for NAR models can be approached, with potential applications extending to real-time text generation where latency and throughput are critical. Additionally, PreDAT’s capacity to lessen error accumulation and enhance input relevance offers new avenues for further exploration in both machine translation and other text-intensive AI applications.

Future research directions inspired by this work could explore further enhancements to DSTI and the integration of more complex alignment-based objectives. There is also potential in expanding the framework to incorporate more nuanced understanding and prediction tasks that could benefit domains like dialogue systems and content creation where textual coherence and accuracy are paramount.

In summary, this paper represents an important step in advancing NAR text generation, introducing a robust pre-training framework that reconciles speed with quality and provides both a theoretical and practical foundation for subsequent advances in AI text generation methodologies.

Authors (3)
  1. Fei Huang (409 papers)
  2. Pei Ke (38 papers)
  3. Minlie Huang (226 papers)
Citations (6)