Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models (2207.06867v2)

Published 14 Jul 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Self-supervised learning (SSL) is seen as a very promising approach with high performance for several speech downstream tasks. Since the parameters of SSL models are generally so large that training and inference require a lot of memory and computational cost, it is desirable to produce compact SSL models without a significant performance degradation by applying compression methods such as knowledge distillation (KD). Although the KD approach is able to shrink the depth and/or width of SSL model structures, there has been little research on how varying the depth and width impacts the internal representation of the small-footprint model. This paper provides an empirical study that addresses the question. We investigate the performance on SUPERB while varying the structure and KD methods so as to keep the number of parameters constant; this allows us to analyze the contribution of the representation introduced by varying the model architecture. Experiments demonstrate that a certain depth is essential for solving content-oriented tasks (e.g. automatic speech recognition) accurately, whereas a certain width is necessary for achieving high performance on several speaker-oriented tasks (e.g. speaker identification). Based on these observations, we identify, for SUPERB, a more compressed model with better performance than previous studies.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Takanori Ashihara (28 papers)
  2. Takafumi Moriya (30 papers)
  3. Kohei Matsuura (26 papers)
  4. Tomohiro Tanaka (37 papers)
Citations (26)

Summary

We haven't generated a summary for this paper yet.