Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models (1912.12384v1)

Published 28 Dec 2019 in eess.AS, cs.LG, cs.SD, eess.SP, and stat.ML

Abstract: In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity, namely character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two levels of linguistic granularity, namely character and BPE, is used. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show 35% and 10% relative improvement over their baselines for the smaller and bigger models, respectively. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models, respectively, after fusion with a long short-term memory (LSTM) based external language model (LM).
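
As a rough illustration of the two-level multi-task objective described in the abstract, the sketch below mixes a character-level loss with a BPE-level loss under a single interpolation weight, using PyTorch. The class name CharBpeMultiTaskLoss and the char_weight hyperparameter are illustrative assumptions, not identifiers from the paper, and the snippet does not reproduce the authors' three-stage schedule or encoder architectures.

    import torch
    import torch.nn as nn

    class CharBpeMultiTaskLoss(nn.Module):
        """Hypothetical two-level multi-task loss: a character-level term
        (e.g. from an intermediate character encoder) plus a BPE-level term
        (from the attention decoder), mixed by char_weight."""

        def __init__(self, char_weight: float = 0.3):
            super().__init__()
            self.char_weight = char_weight
            self.char_ce = nn.CrossEntropyLoss()
            self.bpe_ce = nn.CrossEntropyLoss()

        def forward(self, char_logits, char_targets, bpe_logits, bpe_targets):
            # Logits arrive as (batch, time, vocab); CrossEntropyLoss expects
            # (batch, vocab, time), hence the transpose.
            l_char = self.char_ce(char_logits.transpose(1, 2), char_targets)
            l_bpe = self.bpe_ce(bpe_logits.transpose(1, 2), bpe_targets)
            return self.char_weight * l_char + (1.0 - self.char_weight) * l_bpe

    # Usage with illustrative shapes (batch 8, char vocab 30, BPE vocab 1000).
    loss_fn = CharBpeMultiTaskLoss(char_weight=0.3)
    char_logits = torch.randn(8, 120, 30, requires_grad=True)
    bpe_logits = torch.randn(8, 40, 1000, requires_grad=True)
    char_targets = torch.randint(0, 30, (8, 120))
    bpe_targets = torch.randint(0, 1000, (8, 40))
    loss = loss_fn(char_logits, char_targets, bpe_logits, bpe_targets)
    loss.backward()

The weighted sum is only one plausible way to combine the two granularities; the paper's exact loss weighting and when each term is active across the three training stages are not specified here.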

Authors (6)
  1. Abhinav Garg (11 papers)
  2. Dhananjaya Gowda (16 papers)
  3. Ankur Kumar (16 papers)
  4. Kwangyoun Kim (18 papers)
  5. Mehul Kumar (7 papers)
  6. Chanwoo Kim (68 papers)
Citations (15)
