- The paper introduces FlowSeq, a non-autoregressive model integrating generative flows with Transformer architectures to efficiently generate high-quality sequences.
- It captures dependencies among output tokens in a latent variable built from invertible transformations, allowing all tokens to be decoded in parallel and yielding significant speedups over traditional autoregressive methods.
- Experimental results on NMT benchmarks show marked BLEU improvements over non-autoregressive baselines and performance comparable to state-of-the-art non-autoregressive models.
Overview of FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow
The paper "FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow" by Xuezhe Ma et al. presents a novel approach to improve non-autoregressive sequence generation models. Traditional sequence-to-sequence (seq2seq) models often employ an autoregressive approach, which generates each token by conditioning on previously generated tokens. While this strategy typically results in high-quality sequence generation, it is inefficient compared to non-autoregressive models, which can generate all tokens simultaneously, exploiting parallel processing capabilities of modern hardware.
Contribution and Methodology
The authors introduce FlowSeq, a non-autoregressive model for sequence generation that uses generative flow to tackle the difficulty of directly modeling the joint distribution of all output tokens. Generative flow, a member of the normalizing-flow family, models complex distributions by applying a chain of invertible transformations to a simple Gaussian base distribution, turning it into an expressive prior over latent variables while keeping exact likelihoods tractable via the change-of-variables formula.
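To illustrate the core mechanism (a minimal sketch in PyTorch, not the paper's actual architecture), the snippet below implements a single affine coupling layer: one half of the variables passes through unchanged and parameterizes a scale-and-shift of the other half, which keeps the step exactly invertible and makes the log-determinant of the Jacobian a simple sum of log-scales. Stacking many such layers, with permutations in between, is what turns the Gaussian base into a flexible prior.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible affine coupling step (in the spirit of RealNVP/Glow).

    Splits an even-dimensional input in two halves; the first half passes
    through unchanged and parameterizes an elementwise affine transform
    of the second half.
    """

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),  # outputs [log_scale, shift]
        )

    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        log_s, t = self.net(z1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)          # keep scales bounded for stability
        y2 = z2 * log_s.exp() + t          # affine transform of second half
        log_det = log_s.sum(dim=-1)        # log|det J| is just the sum of log-scales
        return torch.cat([z1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(y1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        z2 = (y2 - t) * (-log_s).exp()     # exact inverse of the affine transform
        return torch.cat([y1, z2], dim=-1)
```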
FlowSeq uses this flow-based structure to capture rich dependencies between sequence elements through expressive latent variables. By integrating generative flow with the Transformer's encoder-decoder architecture, the authors obtain a model that decodes efficiently in parallel while remaining competitive with autoregressive models. The flow layers themselves are built from neural components that couple interactions across both temporal and feature dimensions, further enhancing the model's expressiveness.
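At a high level, decoding then proceeds as in the hedged sketch below; the module names (`encoder`, `length_predictor`, `prior_flow`, `decoder`) are hypothetical placeholders, not the paper's API. The essential property is that, once a latent sequence is sampled from the flow prior, every target position is predicted in a single parallel pass.

```python
import torch

@torch.no_grad()
def nonautoregressive_decode(src_tokens, encoder, length_predictor,
                             prior_flow, decoder):
    """Illustrative decoding path (module names are hypothetical).

    Assumes batch size 1 for clarity: 1) encode the source once; 2) predict
    the target length, since nothing is generated left-to-right; 3) sample
    Gaussian noise and push it through the flow to obtain a latent sequence
    from the learned prior p(z | x); 4) emit all tokens in one parallel pass.
    """
    src_enc = encoder(src_tokens)                     # (1, src_len, d_model)
    tgt_len = int(length_predictor(src_enc))          # predicted target length
    eps = torch.randn(1, tgt_len, src_enc.size(-1))   # Gaussian base sample
    z = prior_flow.inverse(eps, context=src_enc)      # noise -> latent prior sample
    logits = decoder(z, src_enc)                      # (1, tgt_len, vocab_size)
    return logits.argmax(dim=-1)                      # all tokens emitted at once
```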
Evaluation and Results
The paper evaluates FlowSeq on three neural machine translation (NMT) benchmarks: WMT2014 (DE-EN), WMT2016 (RO-EN), and IWSLT2014 (DE-EN). The experiments show that FlowSeq performs comparably to state-of-the-art non-autoregressive models, with decoding time that grows far more slowly with sequence length than that of autoregressive baselines. Despite the difficulty of modeling conditional dependencies without autoregression, FlowSeq achieves a marked increase in BLEU scores over baseline non-autoregressive models.
Implications and Future Directions
From a practical perspective, FlowSeq provides an efficient alternative for sequence generation tasks, enabling significant reductions in decoding time for long sequences, an attractive property under the latency constraints of real-world applications. Its success in bypassing the autoregressive constraint opens pathways for fast, scalable sequence generation beyond neural machine translation, in domains such as speech and image synthesis.
Theoretically, FlowSeq's introduction of generative flows into sequence generation highlights the potential of normalizing-flow methods for sequence modeling. Future work could explore making the model more robust to the multi-modality problem, as well as combining FlowSeq with iterative refinement strategies such as masked language models to further improve output quality. Investigating the interpretability of FlowSeq's latent space could also yield deeper insight into its role and enable applications such as controllable sequence generation.
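As one concrete shape such refinement could take (a sketch of the mask-and-repredict idea behind Mask-Predict, not something FlowSeq itself implements; `repredict` is a hypothetical model handle): repeatedly re-mask the lowest-confidence tokens of the current hypothesis and predict them again conditioned on the rest.

```python
import torch

def iterative_refine(tokens, probs, repredict, mask_id, n_iters=4):
    """Sketch of mask-predict style refinement (`repredict` is hypothetical).

    tokens: (batch, tgt_len) initial non-autoregressive output.
    probs:  (batch, tgt_len) model confidence for each token.
    repredict(masked_tokens) returns (new_tokens, new_probs) for all positions.
    """
    tgt_len = tokens.size(1)
    for it in range(1, n_iters):
        # Linearly decay how many tokens get re-masked each iteration.
        n_mask = int(tgt_len * (n_iters - it) / n_iters)
        if n_mask == 0:
            break
        worst = probs.topk(n_mask, dim=1, largest=False).indices
        masked = tokens.scatter(1, worst, mask_id)      # hide low-confidence tokens
        new_tokens, new_probs = repredict(masked)
        # Only the re-masked positions are updated; the rest keep their tokens.
        tokens = tokens.scatter(1, worst, new_tokens.gather(1, worst))
        probs = probs.scatter(1, worst, new_probs.gather(1, worst))
    return tokens
```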
In conclusion, while FlowSeq does not claim to surpass state-of-the-art autoregressive models in every metric, its balance of performance and efficiency solidifies its place as a valuable advancement in the field of non-autoregressive sequence generation.