Jointly Learning to Align and Convert Graphemes to Phonemes with Neural Attention Models (1610.06540v1)
Published 20 Oct 2016 in cs.CL and cs.AI
Abstract: We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including global and local attention, and our best models achieve state-of-the-art results on three standard data sets (CMUDict, Pronlex, and NetTalk).
- Shubham Toshniwal
- Karen Livescu
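To make the abstract's idea concrete, here is a minimal PyTorch sketch of an attention-enabled encoder-decoder for grapheme-to-phoneme conversion. This is not the authors' implementation: the GRU cells, hidden size, and Luong-style dot-product scoring for the global attention are illustrative assumptions. The key point it shows is that the soft attention weights act as a learned character-phoneme alignment, so no explicit alignment is needed for training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttention(nn.Module):
    """Global attention: score every encoder state against the current
    decoder state and take a softmax-weighted sum (a soft alignment)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, hidden); enc_states: (batch, src_len, hidden)
        scores = torch.bmm(enc_states, self.score(dec_state).unsqueeze(2))
        weights = F.softmax(scores.squeeze(2), dim=1)  # soft char-phoneme alignment
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights

class G2PModel(nn.Module):
    """Encoder-decoder mapping a character sequence to a phoneme sequence,
    jointly learning the alignment through the attention weights."""
    def __init__(self, n_chars, n_phones, hidden_size=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, hidden_size)
        self.phone_emb = nn.Embedding(n_phones, hidden_size)
        self.encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.GRUCell(hidden_size, hidden_size)
        self.attention = GlobalAttention(hidden_size)
        self.out = nn.Linear(2 * hidden_size, n_phones)

    def forward(self, chars, phones):
        # chars: (batch, src_len); phones: (batch, tgt_len), teacher-forced
        enc_states, h = self.encoder(self.char_emb(chars))
        dec_h = h.squeeze(0)
        logits = []
        for t in range(phones.size(1)):
            dec_h = self.decoder(self.phone_emb(phones[:, t]), dec_h)
            context, _ = self.attention(dec_h, enc_states)
            logits.append(self.out(torch.cat([dec_h, context], dim=1)))
        return torch.stack(logits, dim=1)  # (batch, tgt_len, n_phones)
```

Training would minimize cross-entropy over the phoneme logits; the local-attention variant mentioned in the abstract would instead restrict `weights` to a window around a predicted source position rather than attending over the whole character sequence.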