
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer (2308.16415v1)

Published 31 Aug 2023 in cs.CL and eess.AS

Abstract: Streaming automatic speech recognition (ASR) models are restricted from accessing future context, which results in worse performance compared to non-streaming models. To improve the performance of streaming ASR, knowledge distillation (KD) from the non-streaming to the streaming model has been studied, mainly focusing on aligning the output token probabilities. In this paper, we propose a layer-to-layer KD from the teacher encoder to the student encoder. To ensure that features are extracted using the same context, we insert auxiliary non-streaming branches into the student and perform KD from the non-streaming teacher layer to the non-streaming auxiliary layer. We design a special KD loss that leverages the autoregressive predictive coding (APC) mechanism to encourage the streaming model to predict unseen future context. Experimental results show that the proposed method significantly reduces the word error rate compared to previous token probability distillation methods.
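To make the encoder-level distillation idea more concrete, the sketch below shows one plausible way to attach an auxiliary non-streaming branch to a streaming student layer and match its features against a non-streaming teacher layer. This is a minimal illustration under assumed interfaces, not the paper's implementation: the module name `AuxiliaryBranchKD`, the choice of a Transformer layer for the auxiliary branch, the linear projection, the feature dimensions, and the plain MSE feature-matching loss (in place of the paper's APC-based loss) are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryBranchKD(nn.Module):
    """Hypothetical sketch: layer-to-layer KD with an auxiliary
    non-streaming branch attached to a streaming student layer.
    Names, dimensions, and the loss are illustrative only."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Auxiliary non-streaming branch: attends over the full utterance,
        # so its features are extracted with the same (unrestricted) context
        # as the non-streaming teacher's features.
        self.aux_branch = nn.TransformerEncoderLayer(
            d_model=student_dim, nhead=4, batch_first=True
        )
        # Project auxiliary features into the teacher's feature space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden: torch.Tensor,
                teacher_hidden: torch.Tensor) -> torch.Tensor:
        # student_hidden: (B, T, student_dim) from a streaming student layer
        # teacher_hidden: (B, T, teacher_dim) from the matching teacher layer
        aux = self.aux_branch(student_hidden)   # full-context auxiliary features
        aux = self.proj(aux)
        # Simple feature-matching loss; the paper's APC-style loss would
        # additionally shift targets in time so the streaming model learns
        # to predict future context it cannot observe.
        return F.mse_loss(aux, teacher_hidden.detach())


if __name__ == "__main__":
    kd = AuxiliaryBranchKD(student_dim=256, teacher_dim=512)
    s = torch.randn(2, 50, 256)   # streaming student layer output
    t = torch.randn(2, 50, 512)   # non-streaming teacher layer output
    print(kd(s, t).item())
```

In such a setup the auxiliary branch and its KD loss would typically be used only during training and discarded at inference, so the deployed streaming encoder incurs no extra cost.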

Authors (4)
  1. Kyuhong Shim (26 papers)
  2. Jinkyu Lee (14 papers)
  3. Simyung Chang (29 papers)
  4. Kyuwoong Hwang (9 papers)
Citations (2)