Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition (2209.08326v1)

Published 17 Sep 2022 in eess.AS and cs.CL

Abstract: While transformers and their conformer variants show promising performance in speech recognition, their heavy parameterization leads to high memory cost during training and inference. Some works use cross-layer weight sharing to reduce the number of model parameters, but the inevitable loss of capacity harms performance. To address this issue, this paper proposes a parameter-efficient conformer that shares sparsely-gated experts. Specifically, we use a sparsely-gated mixture-of-experts (MoE) to extend the capacity of a conformer block without increasing computation. Then, the parameters of grouped conformer blocks are shared so that the total number of parameters is reduced. Next, to give the shared blocks the flexibility to adapt representations at different levels, we design the MoE routers and normalization layers individually for each block. Moreover, we use knowledge distillation to further improve performance. Experimental results show that the proposed model achieves performance competitive with the full-parameter model while using 1/3 of the encoder parameters.
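A minimal PyTorch-style sketch of the core idea described in the abstract: expert feed-forward weights shared across a group of conformer-like blocks, with each block keeping its own sparse MoE router and normalization. This is an illustrative assumption, not the authors' implementation; the module names (SharedExperts, SharedMoEBlock), dimensions, top-1 gating, and the simplified block structure are all hypothetical, and the knowledge-distillation step from the abstract is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedExperts(nn.Module):
    """Expert feed-forward networks whose weights are shared across a group of blocks."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor, expert_idx: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); expert_idx: (tokens,) chosen by a block-specific router.
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


class SharedMoEBlock(nn.Module):
    """One block in a shared group: private router and norm, shared expert weights."""

    def __init__(self, d_model: int, num_experts: int, shared_experts: SharedExperts):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)              # not shared: per-block normalization
        self.router = nn.Linear(d_model, num_experts)  # not shared: per-block MoE router
        self.experts = shared_experts                  # shared across the group

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        h = self.norm(x).reshape(b * t, d)
        gate = F.softmax(self.router(h), dim=-1)   # sparse gating: keep only the top-1 expert
        weight, expert_idx = gate.max(dim=-1)
        y = weight.unsqueeze(-1) * self.experts(h, expert_idx)
        return x + y.reshape(b, t, d)              # residual connection


# Usage: three blocks reuse one set of expert parameters, so the parameter count
# grows with the number of experts rather than with the number of blocks.
shared = SharedExperts(d_model=256, d_ff=1024, num_experts=4)
group = nn.ModuleList(SharedMoEBlock(256, 4, shared) for _ in range(3))
x = torch.randn(2, 50, 256)
for block in group:
    x = block(x)
```

Because each token activates only one expert, the per-token computation stays roughly that of a single feed-forward layer even though the shared group exposes several experts' worth of capacity, which is the trade-off the abstract highlights.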

Authors (8)
  1. Ye Bai (28 papers)
  2. Jie Li (553 papers)
  3. Wenjing Han (3 papers)
  4. Hao Ni (43 papers)
  5. Kaituo Xu (1 paper)
  6. Zhuo Zhang (42 papers)
  7. Cheng Yi (5 papers)
  8. Xiaorui Wang (30 papers)
Citations (1)