In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models (2212.10670v1)
Abstract: Given the success of large pre-trained language models at in-context learning, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models. We propose combining in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge into the smaller models. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm: in-context learning objectives achieve the best performance when combined with language modeling objectives.
- Yukun Huang
- Yanda Chen
- Zhou Yu
- Kathleen McKeown
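
The abstract describes combining an in-context learning distillation objective with a language modeling objective. The sketch below illustrates one plausible way such a combined loss could look, assuming a causal-LM teacher/student pair from Hugging Face Transformers; the model names, the `alpha` weight, the temperature, and the prompt format are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: combined in-context distillation + language modeling loss.
# Assumes teacher and student share a vocabulary (e.g., GPT-2 family).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name, student_name = "gpt2-large", "gpt2"  # hypothetical teacher/student pair
tokenizer = AutoTokenizer.from_pretrained(student_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()
student = AutoModelForCausalLM.from_pretrained(student_name)

def icl_distillation_loss(prompt_with_examples, alpha=0.5, temperature=2.0):
    """Combine (1) KL distillation of the teacher's token distributions on a
    few-shot prompt with (2) the student's own language modeling loss."""
    inputs = tokenizer(prompt_with_examples, return_tensors="pt")
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits
    student_out = student(**inputs, labels=inputs["input_ids"])
    # (1) in-context learning distillation: match the teacher's distributions
    kd = F.kl_div(
        F.log_softmax(student_out.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # (2) language modeling objective on the same in-context sequence
    lm = student_out.loss
    return alpha * kd + (1 - alpha) * lm

# Usage: a few-shot prompt with in-context examples followed by a query.
prompt = (
    "Paris is the capital of France.\n"
    "Berlin is the capital of Germany.\n"
    "Madrid is the capital of"
)
loss = icl_distillation_loss(prompt)
loss.backward()  # step the student with any optimizer
```

The `alpha` mixing weight reflects the paper's finding that the two objectives are complementary; how the paper actually weights and applies them under Meta-ICT versus Multitask-ICT is specified in the full text, not here.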