
Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach (2110.10429v1)

Published 20 Oct 2021 in cs.LG, cs.CL, cs.SD, and eess.AS

Abstract: The remarkable performance of pre-trained language models (LMs) trained with self-supervised learning has led to a major paradigm shift in natural language processing. In line with these changes, improving speech recognition systems by leveraging massive deep learning-based LMs is a major topic of speech recognition research. Among the various ways of applying LMs to speech recognition systems, this paper focuses on a cross-modal knowledge distillation method that transfers knowledge between two deep neural networks of different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained on different units (senones, monophones, and subwords) and show the effectiveness of this hierarchical distillation through an ablation study.
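
The abstract describes an acoustic model with multiple auxiliary output layers, each of which can be distilled against an LM trained on a different unit (senones, monophones, or subwords). Below is a minimal, hypothetical sketch of such a multi-head acoustic model together with a generic distillation loss that mixes hard-label cross-entropy with a KL term against LM soft labels; the encoder architecture, layer and vocabulary sizes, interpolation weight `alpha`, and temperature `T` are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a multi-task acoustic model with per-unit auxiliary heads
# and a generic soft-label distillation loss. All hyperparameters and
# class/function names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalMTLAcousticModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=512,
                 n_senones=4000, n_monophones=40, n_subwords=1000):
        super().__init__()
        # Shared acoustic encoder over input features (B, T, feat_dim).
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4, batch_first=True)
        # One auxiliary output layer per distillation unit.
        self.senone_head = nn.Linear(hidden, n_senones)
        self.monophone_head = nn.Linear(hidden, n_monophones)
        self.subword_head = nn.Linear(hidden, n_subwords)

    def forward(self, feats):
        h, _ = self.encoder(feats)  # (B, T, hidden)
        return (self.senone_head(h),
                self.monophone_head(h),
                self.subword_head(h))

def distill_loss(student_logits, teacher_probs, hard_targets, alpha=0.5, T=2.0):
    """Mix hard-label CE with KL distillation against LM soft labels.

    student_logits: (B, T, C) logits from one auxiliary head
    teacher_probs:  (B, T, C) soft labels from the LM teacher
    hard_targets:   (B, T)    integer labels for the same unit
    """
    ce = F.cross_entropy(student_logits.transpose(1, 2), hard_targets)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * (T * T)
    return (1.0 - alpha) * ce + alpha * kl
```

In a hierarchical setup along these lines, one such loss would be computed per head (senone, monophone, subword) and the terms summed or weighted; the weighting scheme is not specified in the abstract and is left open here.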

Authors (2)
  1. Mun-Hak Lee (1 paper)
  2. Joon-Hyuk Chang (11 papers)
Citations (2)
