
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective (2310.11451v2)

Published 17 Oct 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs inherently encode a wealth of knowledge within their parameters through pre-training on extensive corpora. While prior research has delved into operations on these parameters to manipulate the underlying implicit knowledge (encompassing detection, editing, and merging), there remains an ambiguous understanding regarding their transferability across models with varying scales. In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective. To achieve this, we employ sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs. Moreover, the LoRA module is used as the intermediary mechanism for injecting the extracted knowledge into smaller models. Evaluations across four benchmarks validate the efficacy of our proposed method. Our findings highlight the critical factors contributing to the process of parametric knowledge transfer, underscoring the transferability of model parameters across LLMs of different scales. Project website: https://maszhongming.github.io/ParaKnowTransfer.
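
The abstract outlines two mechanisms: sensitivity-based extraction and alignment of knowledge-specific parameters from the larger model, and LoRA-based injection of that knowledge into the smaller model. The sketch below illustrates one plausible reading of that pipeline in PyTorch; it is not the paper's exact procedure. The Hugging Face-style `model(**batch).loss` interface, the truncation-based alignment of mismatched shapes, and the SVD initialization of the LoRA factors `A`/`B` are all illustrative assumptions.

```python
# Hedged sketch: sensitivity-based extraction from a large "teacher" LLM,
# followed by low-rank (LoRA-style) injection into a smaller "student" model.
# Assumes both models expose HF-style outputs with a .loss attribute and share
# parameter names (e.g., same architecture family at different scales).
import torch


def parameter_sensitivity(model, calib_batch):
    """First-order sensitivity |grad * weight| for each 2-D weight matrix,
    computed from the loss on a small task-specific calibration batch."""
    for p in model.parameters():
        p.grad = None
    loss = model(**calib_batch).loss
    loss.backward()
    return {
        name: (p.grad * p).abs()
        for name, p in model.named_parameters()
        if p.grad is not None and p.dim() == 2
    }


def extract_and_inject(teacher, student, calib_batch, rank=8):
    """Select the most sensitive teacher weights, align them to the student's
    smaller shapes by truncation, and fold the result into LoRA factors."""
    scores = parameter_sensitivity(teacher, calib_batch)
    teacher_params = dict(teacher.named_parameters())
    lora_factors = {}
    for name, s_param in student.named_parameters():
        if name not in scores or s_param.dim() != 2:
            continue
        t_param = teacher_params[name].detach()
        rows, cols = s_param.shape
        # Alignment (assumed): keep the teacher rows/columns with the highest
        # aggregate sensitivity so the slice matches the student's shape.
        sens = scores[name]
        top_rows = sens.sum(dim=1).topk(rows).indices
        top_cols = sens.sum(dim=0).topk(cols).indices
        extracted = t_param[top_rows][:, top_cols]
        # Injection (assumed): low-rank SVD of the knowledge delta initializes
        # LoRA matrices B (rows x rank) and A (rank x cols) for this layer.
        delta = extracted - s_param.detach()
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        B = U[:, :rank] * S[:rank].sqrt()
        A = S[:rank].sqrt().unsqueeze(1) * Vh[:rank]
        lora_factors[name] = (A, B)
    # At inference the student layer would act as W + B @ A (standard LoRA form).
    return lora_factors
```
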

Authors (5)
  1. Ming Zhong (88 papers)
  2. Chenxin An (17 papers)
  3. Weizhu Chen (128 papers)
  4. Jiawei Han (263 papers)
  5. Pengcheng He (60 papers)
Citations (7)