Knowledge Distillation of Black-Box Large Language Models (2401.07013v2)

Published 13 Jan 2024 in cs.CL

Abstract: Given the exceptional performance of proprietary LLMs like GPT-4, recent research has increasingly focused on boosting the capabilities of smaller models through knowledge distillation (KD) from these powerful yet black-box teachers. While leveraging the high-quality outputs of these teachers is advantageous, the inaccessibility of their internal states often limits effective knowledge transfer. To overcome this limitation, we introduce Proxy-KD, a novel method that uses a proxy model to facilitate the efficient transfer of knowledge from black-box LLMs to smaller models. Our experiments show that Proxy-KD not only enhances the performance of KD from black-box teacher models but also surpasses traditional white-box KD techniques. This approach presents a compelling new avenue for distilling knowledge from advanced LLMs.
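
The abstract only outlines the idea at a high level: a white-box proxy model is first aligned with the black-box teacher's outputs, and the student is then distilled from the proxy, whose internal distributions are accessible. The sketch below illustrates that two-stage flow under stated assumptions; the toy models, alignment loss, and KL-based distillation loss are illustrative choices, not the paper's exact training procedure.

```python
# Minimal sketch of proxy-based distillation (assumptions, not the paper's exact method):
# Stage 1 aligns a white-box proxy with the black-box teacher using only the
# teacher's sampled tokens; Stage 2 runs ordinary white-box KD from the proxy
# to a smaller student via KL divergence on token-level distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 32

def tiny_lm(dim):
    # Stand-in "language model": embedding + linear head over a toy vocabulary.
    return nn.Sequential(nn.Embedding(VOCAB, dim), nn.Linear(dim, VOCAB))

proxy, student = tiny_lm(DIM), tiny_lm(DIM // 2)
tokens = torch.randint(0, VOCAB, (8, 16))          # toy input batch
teacher_tokens = torch.randint(0, VOCAB, (8, 16))  # tokens sampled from the black-box teacher

# Stage 1 (assumed form): fit the proxy to the teacher's generated tokens,
# since the black-box teacher exposes no logits or hidden states.
opt_p = torch.optim.Adam(proxy.parameters(), lr=1e-3)
align_loss = F.cross_entropy(proxy(tokens).reshape(-1, VOCAB),
                             teacher_tokens.reshape(-1))
align_loss.backward()
opt_p.step()

# Stage 2 (assumed form): white-box KD from the aligned proxy to the student.
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
with torch.no_grad():
    proxy_logp = F.log_softmax(proxy(tokens), dim=-1)
student_logp = F.log_softmax(student(tokens), dim=-1)
kd_loss = F.kl_div(student_logp, proxy_logp, log_target=True, reduction="batchmean")
kd_loss.backward()
opt_s.step()
```

In a real pipeline, both stages would iterate over prompts and teacher generations rather than a single random batch; the key point is only that the proxy gives the student access to full output distributions that the black-box teacher cannot provide.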

Authors (7)
  1. Hongzhan Chen (6 papers)
  2. Xiaojun Quan (52 papers)
  3. Ming Yan (190 papers)
  4. Ji Zhang (176 papers)
  5. Ruijun Chen (12 papers)
  6. Yuqi Yi (2 papers)
  7. Chenliang Li (92 papers)
Citations (2)