Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications (2010.14665v2)

Published 27 Oct 2020 in cs.CL and cs.SD

Abstract: In this paper, we summarize the application of the transformer and its streamable variant, the Emformer-based acoustic model, to large-scale speech recognition applications. We compare transformer-based acoustic models with their LSTM counterparts on industrial-scale tasks: specifically, Emformer against latency-controlled BLSTM (LCBLSTM) on medium-latency tasks, and against LSTM on low-latency tasks. On a low-latency voice assistant task, Emformer achieves 24% to 26% relative word error rate reductions (WERRs). In medium-latency scenarios, compared with an LCBLSTM of similar model size and latency, Emformer achieves significant WERRs across four languages on video captioning datasets, with a 2-3x reduction in inference real-time factor.
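Emformer makes the transformer streamable by restricting self-attention to fixed-size chunks plus a bounded left context and a short right-context lookahead, which is what bounds its latency. The sketch below is a minimal illustration of that block-restricted attention pattern, expressed as a PyTorch attention mask; it is not the paper's actual implementation (Emformer additionally caches left-context key/value blocks and hard-copies right-context frames for efficiency), and all sizes (`chunk_size`, `left_chunks`, `right_frames`) are hypothetical.

```python
# Illustrative sketch only: a block-restricted attention mask of the kind used
# by streaming transformer acoustic models such as Emformer. Frames are grouped
# into fixed-size chunks; each frame may attend to its own chunk, a bounded
# number of preceding (left-context) chunks, and a few lookahead frames past
# its chunk boundary. Sizes here are hypothetical, not the paper's settings.
import torch

def streaming_attention_mask(num_frames: int,
                             chunk_size: int,
                             left_chunks: int,
                             right_frames: int) -> torch.Tensor:
    """Boolean mask of shape (num_frames, num_frames); True = may attend."""
    idx = torch.arange(num_frames)
    chunk = idx // chunk_size                # chunk id of each frame
    q = chunk.unsqueeze(1)                   # query chunk ids, shape (T, 1)
    k = chunk.unsqueeze(0)                   # key chunk ids,   shape (1, T)
    # Left context: the query's own chunk plus `left_chunks` chunks before it.
    left_ok = (k <= q) & (k >= q - left_chunks)
    # Right context: `right_frames` raw frames past the query chunk's end.
    chunk_end = (q + 1) * chunk_size - 1     # last frame index of query chunk
    key_idx = idx.unsqueeze(0)               # key frame indices, shape (1, T)
    right_ok = (key_idx > chunk_end) & (key_idx <= chunk_end + right_frames)
    return left_ok | right_ok

# Usage: pass the inverted mask to MultiheadAttention, where True = blocked.
mask = streaming_attention_mask(num_frames=12, chunk_size=4,
                                left_chunks=1, right_frames=2)
attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
x = torch.randn(1, 12, 16)                   # (batch, frames, features)
out, _ = attn(x, x, x, attn_mask=~mask)
```

The latency trade-off the abstract refers to falls out of `chunk_size` and `right_frames`: a frame cannot be emitted until its chunk and lookahead frames have arrived, so smaller values mean lower latency at some cost in accuracy.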

Authors (7)
  1. Yongqiang Wang (92 papers)
  2. Yangyang Shi (53 papers)
  3. Frank Zhang (22 papers)
  4. Chunyang Wu (24 papers)
  5. Julian Chan (11 papers)
  6. Ching-Feng Yeh (22 papers)
  7. Alex Xiao (10 papers)
Citations (18)