Towards Infinite-Long Prefix in Transformer (2406.14036v2)

Published 20 Jun 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of LLMs on various downstream tasks. They are empirically efficient and effective, matching the performance of full-parameter fine-tuning, but theoretical understanding of them is limited. In this paper, we aim to address this limitation by studying their ability from the perspective of prefix length. In particular, we provide a convergence guarantee for training an ultra-long prefix in a stylized setting using the Neural Tangent Kernel (NTK) framework. Based on this strong theoretical guarantee, we design and implement an algorithm that introduces and fine-tunes only a few extra trainable parameters, instead of an infinite-long prefix, in each layer of a transformer, and that approximates the prefix attention to a guaranteed polynomially small error. Preliminary experimental results on vision, natural language, and math data show that our method achieves superior or competitive performance compared to existing methods such as full-parameter fine-tuning, P-Tuning V2, and LoRA. This demonstrates that our method is promising for parameter-efficient fine-tuning. Our code can be found at https://github.com/ChristianYang37/chiwun/tree/main/src/NTK-Attention.
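To make the core idea concrete, below is a minimal PyTorch sketch of a single-head attention layer in which the effect of a long trainable prefix is approximated by a small set of extra parameters entering the softmax numerator and denominator through a feature map. This is an illustrative reading of the abstract, not the authors' reference implementation: the class and parameter names (NTKAttentionSketch, Z, k_corr, phi_proj, rank r) and the ELU-based feature map are assumptions made for the sake of a runnable example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NTKAttentionSketch(nn.Module):
    """Single-head attention with a learned prefix-approximation term (sketch).

    Rather than prepending an arbitrarily long trainable prefix to the keys and
    values, the prefix's contribution is folded into a few extra trainable
    parameters (Z, k_corr) that are added to the attention numerator and
    denominator via a feature map phi. Hedged illustration of the idea in the
    abstract; hyperparameters and the form of phi are assumptions.
    """

    def __init__(self, d_model: int, r: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        # Extra trainable parameters standing in for the (long) prefix.
        self.Z = nn.Parameter(torch.zeros(r, d_model))      # numerator correction
        self.k_corr = nn.Parameter(torch.zeros(r))          # denominator correction
        self.phi_proj = nn.Linear(d_model, r, bias=False)   # feature map phi (assumed form)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Unnormalized softmax weights over the actual tokens.
        scores = torch.exp(q @ k.transpose(-2, -1) * self.scale)   # (b, n, n)
        # Positive feature map applied to the queries (ELU + 1 is an assumption).
        phi_q = F.elu(self.phi_proj(q)) + 1.0                      # (b, n, r)
        # Prefix contribution approximated by the learned Z and k_corr terms.
        numer = scores @ v + phi_q @ self.Z                        # (b, n, d_model)
        denom = scores.sum(dim=-1, keepdim=True) + phi_q @ self.k_corr.unsqueeze(-1)
        return numer / denom
```

In this sketch only Z, k_corr, and phi_proj would be trained during parameter-efficient fine-tuning, so the number of new parameters per layer is O(r * d_model) regardless of how long the implicit prefix is.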

Authors (4)
  1. Yingyu Liang (107 papers)
  2. Zhenmei Shi (60 papers)
  3. Zhao Song (253 papers)
  4. Chiwun Yang (14 papers)
Citations (6)