Transformer with Bidirectional Decoder for Speech Recognition (2008.04481v1)

Published 11 Aug 2020 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: Attention-based models have recently made tremendous progress on end-to-end automatic speech recognition (ASR). However, conventional transformer-based approaches usually generate the output sequence token by token from left to right, leaving the right-to-left context unexploited. In this work, we introduce a bidirectional speech transformer that utilizes both directional contexts simultaneously. Specifically, the outputs of our proposed transformer include a left-to-right target and a right-to-left target. At inference, the introduced bidirectional beam search method generates both left-to-right and right-to-left candidates and determines the best hypothesis by score. To evaluate our proposed speech transformer with a bidirectional decoder (STBD), we conduct extensive experiments on the AISHELL-1 dataset. The results show that STBD achieves a 3.6% relative CER reduction (CERR) over the unidirectional speech transformer baseline. Moreover, the strongest model in this paper, called STBD-Big, achieves 6.64% CER on the test set without language model rescoring or any extra data augmentation strategies.
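The bidirectional beam search described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the decoder wrappers, their signatures, and the toy scores are all assumptions made for clarity.

```python
# Hypothetical sketch of bidirectional beam search: decode with both the
# left-to-right and right-to-left decoder heads, then keep the hypothesis
# with the best score. Model interfaces here are assumptions.

from typing import Callable, List, Tuple

Hypothesis = Tuple[List[int], float]  # (token ids, log-probability score)

def bidirectional_beam_search(
    decode_l2r: Callable[[int], List[Hypothesis]],
    decode_r2l: Callable[[int], List[Hypothesis]],
    beam_size: int = 10,
) -> Hypothesis:
    """Run beam search in both directions and return the best hypothesis.

    decode_l2r / decode_r2l are assumed wrappers around the shared encoder
    plus the left-to-right / right-to-left decoder heads, each returning
    up to beam_size scored candidates.
    """
    l2r_hyps = decode_l2r(beam_size)  # left-to-right candidates
    r2l_hyps = decode_r2l(beam_size)  # right-to-left candidates
    # Reverse R2L token sequences so all hypotheses read left-to-right.
    r2l_hyps = [(tokens[::-1], score) for tokens, score in r2l_hyps]
    # Pick the single best hypothesis across both directions by score.
    return max(l2r_hyps + r2l_hyps, key=lambda h: h[1])

# Toy usage with stand-in decoders (real ones would call the model):
best = bidirectional_beam_search(
    decode_l2r=lambda k: [([1, 2, 3], -1.2)],
    decode_r2l=lambda k: [([3, 2, 1], -0.9)],
)
print(best)  # ([1, 2, 3], -0.9) -- the reversed R2L candidate wins
```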

Authors (5)
  1. Xi Chen (1036 papers)
  2. Songyang Zhang (116 papers)
  3. Dandan Song (12 papers)
  4. Peng Ouyang (3 papers)
  5. Shouyi Yin (15 papers)
Citations (13)
