Selecting Large Language Model to Fine-tune via Rectified Scaling Law (2402.02314v3)

Published 4 Feb 2024 in cs.LG, cs.AI, and cs.CL

Abstract: The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options. Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task into predicting fine-tuning performance and illustrate its natural connection with Scaling Law. Unlike pre-training, we find that the fine-tuning scaling curve includes not just the well-known "power phase" but also the previously unobserved "pre-power phase". We also explain why existing Scaling Law fails to capture this phase transition phenomenon both theoretically and empirically. To address this, we introduce the concept of "pre-learned data size" into our Rectified Scaling Law, which overcomes theoretical limitations and fits experimental results much better. By leveraging our law, we propose a novel LLM selection algorithm that selects the near-optimal model with hundreds of times less resource consumption, while other methods may provide negatively correlated selection. The project page is available at rectified-scaling-law.github.io.
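The "Rectified Scaling Law" described in the abstract shifts the usual fine-tuning power law by a pre-learned data size term, so the curve exhibits a pre-power phase at small fine-tuning data sizes before entering the familiar power phase. A minimal sketch of that functional form, assuming the rectified law takes the shape L(D) = B / (D_l + D)^beta + E (with D_l the pre-learned data size); the parameter values below are illustrative only, not taken from the paper:

```python
import numpy as np

def rectified_loss(D, B, E, beta, Dl):
    """Rectified scaling law with 'pre-learned data size' Dl:
        L(D) = B / (Dl + D)**beta + E
    Setting Dl = 0 recovers the classic power-law form."""
    return B / (Dl + D) ** beta + E

# Hypothetical parameters for illustration only.
B, E, beta, Dl = 100.0, 1.5, 0.5, 1e3
D = np.logspace(1, 6, 6)  # fine-tuning data sizes from 10 to 1e6

losses = rectified_loss(D, B, E, beta, Dl)
# Loss decreases monotonically toward the irreducible term E;
# for D << Dl the curve is nearly flat (the "pre-power phase"),
# and for D >> Dl it behaves like the standard power law.
```

Fitting such a curve on a few cheap fine-tuning runs is what would let a selection algorithm extrapolate full-fine-tuning performance per model, which is the resource saving the abstract claims.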

Authors (10)
  1. Haowei Lin (21 papers)
  2. Baizhou Huang (8 papers)
  3. Haotian Ye (39 papers)
  4. Qinyu Chen (21 papers)
  5. Zihao Wang (216 papers)
  6. Sujian Li (83 papers)
  7. Jianzhu Ma (48 papers)
  8. Xiaojun Wan (99 papers)
  9. James Zou (232 papers)
  10. Yitao Liang (53 papers)
Citations (12)
