Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling (1810.03459v1)

Published 4 Oct 2018 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. It benefits from not requiring a lexicon or alignments for model training. However, this comes at the cost of needing more data than conventional DNN-HMM systems. In this work, we attempt to use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with the seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in %WER, achieving recognition performance comparable to models trained with twice as much training data.
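The RNNLM integration during decoding mentioned in the abstract is commonly realized as shallow fusion: at each beam-search step, the RNNLM's log-probability for a candidate token is added, scaled by an interpolation weight, to the seq2seq decoder's score. Below is a minimal sketch of that scoring rule, assuming shallow fusion; the helper name, the 0.3 weight, and the example probabilities are illustrative, not taken from the paper.

```python
import math

def fused_log_prob(seq2seq_logp: float, rnnlm_logp: float, lm_weight: float = 0.3) -> float:
    """Shallow-fusion score: seq2seq decoder log-prob plus a weighted
    RNNLM log-prob for the same candidate token."""
    return seq2seq_logp + lm_weight * rnnlm_logp

# Toy example: the decoder alone slightly prefers token A, but the RNNLM
# strongly prefers token B, so the fused score ranks B above A.
score_a = fused_log_prob(math.log(0.40), math.log(0.05))  # ~ -1.82
score_b = fused_log_prob(math.log(0.35), math.log(0.30))  # ~ -1.41
print(score_b > score_a)  # True
```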

Authors (9)
  1. Jaejin Cho (24 papers)
  2. Murali Karthick Baskar (15 papers)
  3. Ruizhi Li (9 papers)
  4. Matthew Wiesner (32 papers)
  5. Sri Harish Mallidi (7 papers)
  6. Nelson Yalta (5 papers)
  7. Shinji Watanabe (416 papers)
  8. Takaaki Hori (41 papers)
  9. Martin Karafiát (2 papers)
Citations (116)
