Imputer: Sequence Modelling via Imputation and Dynamic Programming
The paper introduces a sequence modeling approach called the "Imputer," which generates sequences iteratively via imputation and is trained with a dynamic programming objective. The Imputer combines characteristics of autoregressive and non-autoregressive models, balancing generation speed against the modeling of conditional dependencies among output tokens.
Key Concepts and Methodology
The Imputer is characterized by several distinctive features:
- Iterative Generative Model: Unlike autoregressive models that generate tokens strictly left to right, the Imputer produces the full output in a constant number of refinement iterations, independent of sequence length. This keeps decoding cost fixed while still capturing dependencies among output tokens, which fully parallel non-autoregressive models give up (a decoding sketch follows this list).
- Dynamic Programming Training Algorithm: The paper proposes a dynamic programming training algorithm that approximately marginalizes over all possible alignments and generation orders. The resulting objective is a tractable lower bound on the log marginal likelihood (a dynamic-programming sketch follows this list).
- Monotonic Latent Alignment: The Imputer exploits the monotonic latent alignment between input and output that exists in tasks such as speech recognition. This lets it dispense with the encoder-decoder framework and its cross-attention mechanism, which are unnecessary when the alignment is naturally monotonic.
- Simplified Architecture: The model stacks convolutional layers over the input features with self-attention layers on top. Because there is no separate decoder or cross-attention, the architecture stays simple while remaining effective across the sequence lengths encountered in practice (an architecture sketch follows this list).
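To make the constant-step decoding concrete, here is a minimal sketch of block-parallel imputation: the alignment starts fully masked and, over B iterations, each block of B positions commits its most confident prediction, so every slot is filled after exactly B steps. The `log_probs_fn` stand-in for the trained network, the `MASK` sentinel, and the confidence-based selection rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

MASK = -1          # sentinel id for a slot that has not been imputed yet
BLOCK_SIZE = 8     # B: number of decoding iterations, independent of length


def block_decode(log_probs_fn, num_frames, block_size=BLOCK_SIZE):
    """Constant-step decoding sketch (not the authors' code).

    log_probs_fn(alignment) -> [num_frames, vocab_size] array of per-slot
    token log-probabilities, conditioned on the input and on the current
    partial alignment; it stands in for the trained Imputer network.
    """
    alignment = np.full(num_frames, MASK, dtype=np.int64)
    for _ in range(block_size):                        # exactly B iterations
        log_probs = log_probs_fn(alignment)            # re-score every slot
        for start in range(0, num_frames, block_size):
            stop = min(start + block_size, num_frames)
            masked = np.nonzero(alignment[start:stop] == MASK)[0]
            if masked.size == 0:
                continue
            # commit the single most confident masked slot in this block
            confidences = log_probs[start:stop][masked].max(axis=-1)
            pick = masked[int(np.argmax(confidences))]
            alignment[start + pick] = int(np.argmax(log_probs[start + pick]))
    return alignment            # every slot is filled after B iterations
```

The key property is that the number of forward passes through the network is B, not the sequence length, while each pass still conditions on everything committed so far.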
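The dynamic program used for training is closely related to the CTC forward algorithm, which sums over all monotonic alignments of a target to the input frames. The sketch below implements that forward recursion as an illustration of the kind of marginalization involved; it is not the paper's exact objective, which additionally conditions on a partially observed alignment.

```python
import numpy as np


def logsumexp(*xs):
    xs = np.array(xs)
    m = xs.max()
    return m + np.log(np.sum(np.exp(xs - m))) if np.isfinite(m) else -np.inf


def ctc_log_marginal(log_probs, target, blank=0):
    """Log-probability of `target` summed over all monotonic alignments.

    log_probs: [T, vocab_size] per-frame token log-probabilities.
    """
    ext = [blank]                        # target with blanks interleaved
    for y in target:
        ext.extend([y, blank])
    T, S = log_probs.shape[0], len(ext)
    alpha = np.full((T, S), -np.inf)     # forward variables
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            paths = [alpha[t - 1, s]]                    # stay
            if s >= 1:
                paths.append(alpha[t - 1, s - 1])        # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                paths.append(alpha[t - 1, s - 2])        # skip a blank
            alpha[t, s] = logsumexp(*paths) + log_probs[t, ext[s]]
    return logsumexp(alpha[T - 1, S - 1],
                     alpha[T - 1, S - 2] if S > 1 else -np.inf)
```

The recursion runs in O(T·S) time, which is what makes summing over the exponentially many alignments tractable and gives the lower-bound objective its efficiency.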
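Finally, a schematic of the convolution-plus-self-attention stack, written in PyTorch. The layer sizes, the MASK-token embedding for the partially observed alignment, and the additive combination of the alignment embedding with the convolutional features are illustrative assumptions; the paper's exact configuration may differ.

```python
import torch
import torch.nn as nn


class ImputerSketch(nn.Module):
    """Schematic conv + self-attention stack, not the paper's exact model."""

    def __init__(self, feat_dim=80, vocab_size=32, d_model=256,
                 num_layers=4, num_heads=4):
        super().__init__()
        # convolutional front-end over the acoustic feature frames
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # embedding of the partial alignment; index vocab_size acts as MASK
        self.align_emb = nn.Embedding(vocab_size + 1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)
        self.self_attn = nn.TransformerEncoder(encoder_layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, alignment):
        # feats: [batch, frames, feat_dim]; alignment: [batch, frames] ids
        x = self.conv(feats.transpose(1, 2)).transpose(1, 2)
        x = x + self.align_emb(alignment)
        x = self.self_attn(x)
        return self.out(x)               # per-frame token logits
```

Because there is no decoder or cross-attention, the same stack serves both training (predicting masked alignment slots) and the iterative decoding loop above.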
The empirical analysis shows that the Imputer is competitive on end-to-end speech recognition, outperforming prior non-autoregressive models and matching or exceeding autoregressive baselines:
- On the LibriSpeech test set, the Imputer achieved a Word Error Rate (WER) of 11.1, improving on the non-autoregressive CTC baseline (13.0 WER) and the autoregressive seq2seq baseline (12.5 WER).
- Wall Street Journal benchmarks further demonstrated the Imputer's effectiveness, with the dynamic programming training objective outperforming the imitation-learning variant, pointing to an optimization advantage from marginalizing over alignments.
Implications and Future Directions
The Imputer offers a practical approach to sequence modeling that balances decoding efficiency with dependency modeling, particularly for tasks with an inherent monotonic alignment such as speech recognition. Its constant-step inference makes it well suited to latency- or resource-constrained settings that require rapid decoding without sacrificing output quality.
In practice, the Imputer could be extended to machine translation and image captioning by adapting its alignment-based generation to domains that lack a natural monotonic structure.
Theoretically, models that interpolate between autoregressive and non-autoregressive generation remain a promising direction, and improving the Imputer's adaptability could drive further gains in the efficiency and quality of sequence generation.
Overall, the Imputer represents a significant step in sequence modeling, striking a balance between decoding cost and dependency modeling. Future work may extend its methodology to a broader range of sequence generation tasks.