Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation (2210.05193v2)
Abstract: Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency. Directed Acyclic Transformer (DA-Transformer) was recently proposed to model sequential dependency with a directed acyclic graph. Consequently, it has to apply a sequential decision process at inference time, which harms global translation accuracy. In this paper, we present a Viterbi decoding framework for DA-Transformer that is guaranteed to find the jointly optimal translation and decoding path under any length constraint. Experimental results demonstrate that our approach consistently improves the performance of DA-Transformer while maintaining a similar decoding speedup.
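The core algorithmic idea named in the abstract is a Viterbi-style dynamic program over the DAG's vertices that jointly scores decoding paths and the tokens emitted along them under a fixed output length. The sketch below illustrates that style of search under stated assumptions; the function name `viterbi_dag_decode`, the `transition`/`emission` score arrays, and the collapse of token choice to a per-vertex maximum are illustrative, not the authors' implementation.

```python
import numpy as np

def viterbi_dag_decode(transition, emission, target_len):
    """Joint Viterbi search over a DA-Transformer-style graph (illustrative sketch).

    transition: (L, L) log-probabilities of moving from vertex i to vertex j;
                only forward edges i < j are valid (others are -inf).
    emission:   (L,) best per-vertex token log-probability, i.e. the max over
                the vocabulary of each vertex's token prediction.
    target_len: exact number of vertices (tokens) the decoded path must visit.

    Returns the highest-scoring path of `target_len` vertices from vertex 0
    to vertex L-1, together with its log-score.
    """
    L = transition.shape[0]
    # score[t, j]: best log-prob of a partial path that has emitted t+1 tokens
    # and currently sits at vertex j.
    score = np.full((target_len, L), -np.inf)
    back = np.zeros((target_len, L), dtype=np.int64)
    score[0, 0] = emission[0]  # all paths start at vertex 0

    for t in range(1, target_len):
        for j in range(t, L):  # after t steps the path must sit at vertex >= t
            prev = score[t - 1, :j] + transition[:j, j]
            k = int(np.argmax(prev))
            score[t, j] = prev[k] + emission[j]
            back[t, j] = k

    # Backtrace from the final vertex L-1 at the last step.
    path = [L - 1]
    for t in range(target_len - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(score[target_len - 1, L - 1])
```

Collapsing `emission` to a per-vertex maximum preserves optimality here because, in DA-Transformer, token predictions at different vertices are conditionally independent given the path; running the DP once per candidate target length then yields the optimum for each length, matching the "under any length constraint" claim in the abstract.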