SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning (2402.16830v1)
Abstract: Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To enhance its efficiency, previous works often leverage compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain by introducing SKILL, a novel method that conducts distillation across groups of layers instead of distilling individual, arbitrarily selected layers within the teacher network. The layers to distill are identified through a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M-parameter model class across several SUPERB tasks.
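The layer-grouping step described in the abstract can be pictured as clustering teacher layers by the similarity of their hidden representations. The sketch below is a minimal illustration under stated assumptions rather than the paper's exact recipe: it assumes linear CKA as the layer-similarity measure, SciPy average-linkage agglomerative clustering, and a hypothetical fixed number of groups `n_groups`; the helper names and the calibration-set setup are illustrative only.

```python
# Minimal sketch: group teacher layers by representation similarity.
# Assumptions (not taken from the paper text): linear CKA as the similarity
# measure, average-linkage hierarchical clustering, and a fixed `n_groups`.
# Each layer's activations are stored as an [n_frames, dim] matrix collected
# on a small calibration set.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape [n, d1] and [n, d2]."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(x.T @ y, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return hsic / (norm_x * norm_y)


def group_layers(layer_acts: list[np.ndarray], n_groups: int) -> list[int]:
    """Cluster teacher layers into `n_groups` groups of similar layers."""
    n = len(layer_acts)
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = linear_cka(layer_acts[i], layer_acts[j])
    dist = 1.0 - sim                      # convert similarity into a distance
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, t=n_groups, criterion="maxclust").tolist()


# Usage example: 13 hidden states (CNN output + 12 transformer layers)
# from a Base-sized teacher, grouped into 4 clusters of similar layers.
acts = [np.random.randn(2000, 768) for _ in range(13)]
print(group_layers(acts, n_groups=4))     # e.g. [1, 1, 2, 2, 2, 3, ...]
```

Distillation targets would then be defined per group (for example, by aggregating the layers within each cluster) instead of per hand-picked layer; the exact aggregation used by SKILL is not specified in this excerpt.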
- A. Babu et al., “XLS-R: Self-supervised cross-lingual speech representation learning at scale,” in Proc. of Interspeech, 2021.
- “vq-wav2vec: Self-supervised learning of discrete speech representations,” in Proc. of ICLR, 2020.
- “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Proc. of NeurIPS, 2020.
- S. Chen et al., “WavLM: Large-scale self-supervised pre-training for full stack speech processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022.
- “HuBERT: Self-supervised speech representation learning by masked prediction of hidden units,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451–3460, 2021.
- “Multi-task self-supervised learning for robust speech recognition,” in Proc. of ICASSP, 2020.
- “MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech,” in Proc. of ICASSP, 2022.
- S. Sadhu et al., “Wav2vec-C: A self-supervised model for speech representation learning,” in Proc. of Interspeech, 2021.
- Y.-A. Chung et al., “W2v-BERT: Combining contrastive learning and masked language modeling for self-supervised speech pre-training,” in Proc. of ASRU, 2021.
- “Speech self-supervised representation benchmarking: Are we doing it right?,” in Proc. of Interspeech, 2023.
- S. Evain et al., “LeBenchmark: A reproducible framework for assessing self-supervised representation learning from speech,” in Proc. of Interspeech, 2021.
- S. W. Yang et al., “SUPERB: Speech processing Universal PERformance Benchmark,” in Proc. of Interspeech, 2021.
- M. Ravanelli et al., “SpeechBrain: A general-purpose speech toolkit,” arXiv preprint arXiv:2106.04624, 2021.
- M. Ravanelli and Y. Bengio, “Learning speaker representations with mutual information,” in Proc. of Interspeech, 2019.
- “Exploring wav2vec 2.0 on speaker verification and language identification,” in Proc. of Interspeech, 2021.
- “Multi-task voice activated framework using self-supervised learning,” in Proc. of ICASSP, 2022.
- “Speech emotion diarization: Which emotion appears when?,” in Proc. of ASRU, 2023.
- “Optimal brain damage,” in Proc. of NIPS, 1990.
- “DNN Quantization with Attention,” arXiv preprint arXiv:2103.13322, 2021.
- “BinaryConnect: Training deep neural networks with binary weights during propagations,” in Proc. of NIPS, 2015.
- “QLoRA: Efficient Finetuning of Quantized LLMs,” arXiv preprint arXiv:2305.14314, 2023.
- “Fine-tuning strategies for faster inference using speech self-supervised models: a comparative study,” in Proc. of ICASSP, 2023.
- “DistilHuBERT: Speech representation learning by layer-wise distillation of hidden-unit BERT,” in Proc. of ICASSP, 2022.
- “FitHuBERT: Going thinner and deeper for knowledge distillation of speech self-supervised learning,” in Proc. of Interspeech, 2022.
- “Deep versus wide: An analysis of student architectures for task-agnostic knowledge distillation of self-supervised speech models,” in Proc. of Interspeech, 2022.
- “DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models,” in Proc. of Interspeech, 2023.
- “LibriSpeech: An ASR corpus based on public domain audio books,” in Proc. of ICASSP, 2015.
- “Distilling knowledge via knowledge review,” in Proc. of CVPR, 2021.
- “Structured pruning of large language models,” in Proc. of EMNLP, 2020.
- “Structured pruning learns compact and accurate models,” in Proc. of ACL, 2022.
- “Structured pruning of self-supervised pre-trained models for speech recognition and understanding,” in Proc. of ICASSP, 2023.
- “Learning sparse neural networks through L0 regularization,” in Proc. of ICLR, 2018.
- “Similarity of neural network representations revisited,” in Proc. of ICML, 2019.
- “A kernel statistical test of independence,” in Proc. of NIPS, 2007.
- A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. of NeurIPS, 2019.
- J. Hwang et al., “TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch,” in Proc. of ASRU, 2023.
- M. Ott et al., “fairseq: A fast, extensible toolkit for sequence modeling,” in Proc. of NAACL (Demonstrations), 2019.
- T. Wolf et al., “HuggingFace’s Transformers: State-of-the-art natural language processing,” in Proc. of EMNLP, 2020.
- “Recycle-and-Distill: Universal compression strategy for Transformer-based speech SSL models with attention map reusing and masking distillation,” in Proc. of Interspeech, 2023.