Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation
This paper introduces the Pre-trained Directed Acyclic Transformer (PreDAT), a new approach to non-autoregressive (NAR) text generation. Its primary aim is to address the lack of effective pre-training for NAR models, which have historically lagged behind pre-trained autoregressive (AR) models on diverse text generation tasks. To this end, the paper proposes a new pre-training task and demonstrates significant performance improvements over existing models.
The model builds on the Directed Acyclic Transformer (DAT) architecture, which organizes decoder predictions along a directed acyclic graph so that a single decoding pass can represent multiple plausible outputs. This design reduces token prediction errors by avoiding a fixed, biased prediction order, allowing the model to generate text more consistently. PreDAT pre-trains this architecture with a Double-Source Text Infilling (DSTI) task, which improves prediction consistency and mitigates the multi-modality problem common in NAR models.
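To give a sense of how the directed acyclic design works, the DAT likelihood from the prior DAT literature (which PreDAT builds on) marginalizes over decoding paths through the graph. The following is a sketch in standard notation, which may differ from the paper's exact formulation:

$$
P_\theta(Y \mid X) \;=\; \sum_{A \in \Gamma} P_\theta(A \mid X)\, P_\theta(Y \mid A, X)
\;=\; \sum_{A \in \Gamma}\; \prod_{i=2}^{|Y|} P_\theta(a_i \mid a_{i-1}, X)\; \prod_{i=1}^{|Y|} P_\theta(y_i \mid a_i, X),
$$

where $A = (a_1, \dots, a_{|Y|})$ is a strictly increasing sequence of decoder vertex indices, $\Gamma$ is the set of such paths, each vertex emits a token distribution, and each edge carries a transition probability. Because different paths can realize different valid outputs, a single decoder pass can represent several plausible generations, which helps with the multi-modality problem mentioned above.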
Quantitative results from experiments on five text generation tasks are compelling: PreDAT outperforms existing pre-trained NAR models by an average of 4.2 points on standard n-gram-based metrics and even surpasses pre-trained AR baselines by an average of 0.7 points, while delivering a 17x increase in throughput. These findings indicate that PreDAT not only accelerates generation but also improves overall text quality, alleviating the error accumulation issues seen in autoregressive models.
From a methodological standpoint, the DSTI task trains PreDAT to predict multiple sentence fragments in parallel while conditioning on context from both directions, which promotes bidirectional dependencies in text generation. In practical terms, this suggests that PreDAT is well suited to applications demanding both high-speed and high-quality text generation.
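To make the double-source idea concrete, the sketch below constructs a toy infilling example in which some target fragments are visible to the encoder, some are given as decoder-side hints, and the rest must be predicted. The fragment count, the <mask> symbol, and the random role assignment are illustrative assumptions rather than the paper's exact DSTI recipe, and the sketch omits DAT-specific details such as the up-sampled decoder length.

```python
import random

MASK = "<mask>"

def make_infilling_example(tokens, n_fragments=4, seed=0):
    """Cut `tokens` into contiguous fragments and assign each fragment one of
    three roles: visible on the encoder side, visible on the decoder side,
    or hidden (to be predicted). Purely illustrative of a two-source setup."""
    rng = random.Random(seed)
    # Choose n_fragments - 1 cut points inside the sequence.
    cuts = sorted(rng.sample(range(1, len(tokens)), n_fragments - 1))
    bounds = [0] + cuts + [len(tokens)]
    fragments = [tokens[bounds[i]:bounds[i + 1]] for i in range(n_fragments)]

    encoder_input, decoder_hint, targets = [], [], []
    for frag in fragments:
        role = rng.choice(["encoder", "decoder", "predict"])
        if role == "encoder":
            # Fragment is visible to the encoder but masked on the decoder side.
            encoder_input.extend(frag)
            decoder_hint.extend([MASK] * len(frag))
        elif role == "decoder":
            # Fragment is hidden from the encoder but given as a decoder hint.
            encoder_input.append(MASK)
            decoder_hint.extend(frag)
        else:
            # Fragment is hidden from both sources and must be infilled.
            encoder_input.append(MASK)
            decoder_hint.extend([MASK] * len(frag))
            targets.append(frag)
    return encoder_input, decoder_hint, targets


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    enc, dec, tgt = make_infilling_example(sentence)
    print("encoder input :", enc)
    print("decoder hints :", dec)
    print("to be infilled:", tgt)
```

Each hidden fragment must then be predicted from visible context on both sides, which is the mechanism behind the bidirectional dependencies described above.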
Theoretically, this research suggests a shift in how pre-training for NAR models can be approached, with potential applications extending to real-time text generation where latency and throughput are critical. In addition, PreDAT’s capacity to reduce error accumulation and improve relevance to the input opens new avenues for exploration in machine translation and other text-intensive AI applications.
Future research inspired by this work could explore further enhancements to DSTI and the integration of more sophisticated alignment-based objectives. There is also potential to extend the framework to more nuanced understanding and prediction tasks, benefiting domains such as dialogue systems and content creation, where textual coherence and accuracy are paramount.
In summary, this paper marks an important step in advancing NAR text generation: it introduces a robust pre-training framework that bridges the gap between speed and accuracy, providing both a theoretical and practical foundation for subsequent advances in AI text generation.