
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks (2305.11073v1)

Published 18 May 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU). Recently, a new encoder called E-Branchformer has outperformed Conformer in the LibriSpeech ASR benchmark, making it promising for more general speech applications. This work compares E-Branchformer and Conformer through extensive experiments using different types of end-to-end sequence-to-sequence models. Results demonstrate that E-Branchformer achieves comparable or better performance than Conformer in almost all evaluation sets across 15 ASR, 2 ST, and 3 SLU benchmarks, while being more stable during training. We will release our training configurations and pre-trained models for reproducibility, which can benefit the speech community.
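
For orientation, here is a minimal PyTorch sketch contrasting the two encoder blocks the paper compares. This is an illustration, not the authors' ESPnet implementation: the Conformer block applies feed-forward, self-attention, and convolution modules sequentially (macaron-style), while the E-Branchformer block runs a global self-attention branch and a local convolutional branch in parallel and merges them with a depthwise convolution. Module sizes, kernel widths, and the use of a plain Conformer-style convolution module as a stand-in for E-Branchformer's cgMLP branch are simplifying assumptions.

```python
# Minimal sketch contrasting a Conformer block with an E-Branchformer
# block. Illustrative only: the paper's models are the ESPnet
# implementations; sizes, kernels, and the cgMLP stand-in are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    def __init__(self, d_model, expansion=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, expansion * d_model),
            nn.SiLU(),
            nn.Linear(expansion * d_model, d_model),
        )

    def forward(self, x):
        return self.net(x)


class ConvModule(nn.Module):
    """Pointwise GLU -> depthwise conv -> pointwise, as in Conformer."""

    def __init__(self, d_model, kernel_size=31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pw1 = nn.Conv1d(d_model, 2 * d_model, 1)
        self.dw = nn.Conv1d(d_model, d_model, kernel_size,
                            padding=kernel_size // 2, groups=d_model)
        self.pw2 = nn.Conv1d(d_model, d_model, 1)

    def forward(self, x):                      # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)       # -> (batch, d_model, time)
        y = F.glu(self.pw1(y), dim=1)
        y = F.silu(self.dw(y))
        return self.pw2(y).transpose(1, 2)


class ConformerBlock(nn.Module):
    """Sequential: half-FFN -> self-attention -> conv -> half-FFN."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.ffn1 = FeedForward(d_model)
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = ConvModule(d_model)
        self.ffn2 = FeedForward(d_model)
        self.out_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + 0.5 * self.ffn1(x)
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = x + self.conv(x)
        x = x + 0.5 * self.ffn2(x)
        return self.out_norm(x)


class EBranchformerBlock(nn.Module):
    """Parallel global (attention) and local (conv) branches, merged by a
    depthwise conv over their concatenation, wrapped in macaron FFNs."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.ffn1 = FeedForward(d_model)
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.local = ConvModule(d_model)       # stand-in for the cgMLP branch
        self.merge_dw = nn.Conv1d(2 * d_model, 2 * d_model, 31,
                                  padding=15, groups=2 * d_model)
        self.merge_proj = nn.Linear(2 * d_model, d_model)
        self.ffn2 = FeedForward(d_model)
        self.out_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + 0.5 * self.ffn1(x)
        a = self.attn_norm(x)
        g = self.attn(a, a, a, need_weights=False)[0]  # global branch
        l = self.local(x)                              # local branch
        m = torch.cat([g, l], dim=-1)                  # (batch, time, 2D)
        m = m + self.merge_dw(m.transpose(1, 2)).transpose(1, 2)
        x = x + self.merge_proj(m)
        x = x + 0.5 * self.ffn2(x)
        return self.out_norm(x)


if __name__ == "__main__":
    feats = torch.randn(2, 100, 256)               # (batch, frames, dims)
    print(ConformerBlock()(feats).shape)           # torch.Size([2, 100, 256])
    print(EBranchformerBlock()(feats).shape)       # torch.Size([2, 100, 256])
```

The parallel-branch structure with a convolutional merge, versus Conformer's strictly sequential stacking of the same ingredients, is the main architectural difference the paper's experiments probe.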

Authors (10)
  1. Yifan Peng (147 papers)
  2. Kwangyoun Kim (18 papers)
  3. Felix Wu (30 papers)
  4. Brian Yan (40 papers)
  5. Siddhant Arora (50 papers)
  6. William Chen (49 papers)
  7. Jiyang Tang (4 papers)
  8. Suwon Shon (31 papers)
  9. Prashant Sridhar (10 papers)
  10. Shinji Watanabe (416 papers)
Citations (16)
