Transformer based Grapheme-to-Phoneme Conversion (2004.06338v2)

Published 14 Apr 2020 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: The attention mechanism is one of the most successful techniques in deep learning-based NLP. The transformer network architecture is based entirely on attention mechanisms and outperforms sequence-to-sequence models in neural machine translation without recurrent or convolutional layers. Grapheme-to-phoneme (G2P) conversion is the task of converting letters (a grapheme sequence) to their pronunciations (a phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of the transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and on the NetTalk dataset. The results show that transformer-based G2P outperforms the convolutional-based approach in terms of word error rate and significantly exceeds previous recurrent approaches (without attention) in both word and phoneme error rates on both datasets. Furthermore, the proposed model is much smaller than previous approaches.
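
The abstract frames G2P as character-level sequence-to-sequence translation with a transformer encoder-decoder. The sketch below illustrates that setup with PyTorch's stock `nn.Transformer`; the vocabulary sizes, hyperparameters, and token ids are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of a character-level transformer for G2P.
# Hyperparameters and token ids are hypothetical, not the paper's settings.
import torch
import torch.nn as nn

class G2PTransformer(nn.Module):
    def __init__(self, n_graphemes, n_phonemes, d_model=128, nhead=4,
                 num_layers=3, dim_feedforward=256, max_len=64):
        super().__init__()
        self.src_embed = nn.Embedding(n_graphemes, d_model)   # letter ids
        self.tgt_embed = nn.Embedding(n_phonemes, d_model)    # phoneme ids
        self.pos_embed = nn.Embedding(max_len, d_model)       # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, batch_first=True)
        self.out = nn.Linear(d_model, n_phonemes)

    def add_pos(self, x):
        pos = torch.arange(x.size(1), device=x.device)
        return x + self.pos_embed(pos)[None, :, :]

    def forward(self, src, tgt):
        # src: (batch, src_len) grapheme ids; tgt: (batch, tgt_len) phoneme ids
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        enc_in = self.add_pos(self.src_embed(src))
        dec_in = self.add_pos(self.tgt_embed(tgt))
        hidden = self.transformer(enc_in, dec_in, tgt_mask=tgt_mask)
        return self.out(hidden)   # (batch, tgt_len, n_phonemes) logits

# Toy usage: the word "cat" mapped toward ARPAbet-style phonemes (ids are made up).
model = G2PTransformer(n_graphemes=30, n_phonemes=45)
src = torch.tensor([[3, 1, 20]])   # hypothetical ids for c, a, t
tgt = torch.tensor([[0, 12, 7]])   # hypothetical ids for <sos>, K, AE
logits = model(src, tgt)
print(logits.shape)                # torch.Size([1, 3, 45])
```

At inference time such a model would decode phonemes autoregressively from a start token, and phoneme/word error rates would be computed by comparing the decoded sequence against the dictionary pronunciation.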

Authors (3)
  1. Sevinj Yolchuyeva (2 papers)
  2. Géza Németh (8 papers)
  3. Bálint Gyires-Tóth (15 papers)
Citations (57)
