
Embedding Compression for Teacher-to-Student Knowledge Transfer (2402.06761v1)

Published 9 Feb 2024 in cs.LG

Abstract: Common knowledge distillation methods require the teacher model and the student model to be trained on the same task. However, using embeddings as teachers has also been proposed for settings in which the source task and the target task differ. Prior work that uses embeddings as teachers ignores the fact that the teacher embeddings are likely to contain knowledge irrelevant to the target task. To address this problem, we propose an embedding compression module with a trainable teacher transformation that produces a compact teacher embedding. Results show that adding the embedding compression module improves classification performance, especially for unsupervised teacher embeddings. Moreover, student models trained with the guidance of embeddings show stronger generalizability.
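To make the idea concrete, below is a minimal sketch of how a trainable teacher transformation might compress a pretrained teacher embedding before it is used as a distillation target for the student. This is not the authors' code: the module names, dimensions, and the MSE distillation distance are illustrative assumptions, not details confirmed by the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingCompression(nn.Module):
    """Trainable transformation mapping the teacher embedding to a compact space (hypothetical)."""

    def __init__(self, teacher_dim: int = 512, compressed_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(teacher_dim, compressed_dim)

    def forward(self, teacher_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(teacher_emb)


class Student(nn.Module):
    """Toy student: an encoder producing an embedding plus a classification head."""

    def __init__(self, in_dim: int = 128, emb_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
        self.head = nn.Linear(emb_dim, num_classes)

    def forward(self, x: torch.Tensor):
        emb = self.encoder(x)
        return emb, self.head(emb)


def training_step(x, y, teacher_emb, student, compressor, alpha: float = 0.5):
    """Combine the target-task loss with a distillation loss against the compressed teacher embedding."""
    student_emb, logits = student(x)
    compressed = compressor(teacher_emb)                 # compact teacher embedding
    task_loss = F.cross_entropy(logits, y)               # target-task objective
    distill_loss = F.mse_loss(student_emb, compressed)   # one common choice of distance; assumed here
    return task_loss + alpha * distill_loss


# Usage with random tensors standing in for real data
student, compressor = Student(), EmbeddingCompression()
x = torch.randn(8, 128)               # target-task inputs
y = torch.randint(0, 10, (8,))        # target-task labels
teacher_emb = torch.randn(8, 512)     # embeddings from a teacher trained on a different source task
loss = training_step(x, y, teacher_emb, student, compressor)
loss.backward()
```

Because the compression module is trained jointly with the student, it can in principle learn to discard the parts of the teacher embedding that are irrelevant to the target task, which is the motivation stated in the abstract.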

Authors (2)
  1. Yiwei Ding (13 papers)
  2. Alexander Lerch (43 papers)
Citations (1)

