Analyzing Principal Weights in Sparse Fine-Tuning for LLMs
The paper "LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning" proposes an innovative approach to fine-tuning LLMs by identifying and leveraging 'Principal Weights'. This method, termed Low-rank Informed Sparse Fine-Tuning (LIFT), strategically utilizes low-rank approximations to enhance reasoning capabilities while maintaining computational efficiency.
Context and Problem Definition
Recent work has shown that LLMs can acquire strong reasoning capabilities through supervised fine-tuning (SFT). However, traditional Full Fine-Tuning (Full FT) is computationally expensive and, especially on limited datasets, prone to overfitting and catastrophic forgetting. Sparse Fine-Tuning (Sparse FT) addresses this by updating only a fraction of the parameters, but existing selection criteria struggle to pinpoint the parameters that actually matter for reasoning, leaving Sparse FT less effective and less efficient than low-rank adaptation methods such as LoRA in the LLM setting.
Key Proposition: Principal Weights
The paper's central insight for Sparse FT is that the weights with the largest magnitude after low-rank approximation, designated Principal Weights, are the ones that matter most for fine-tuning. Plain magnitude-based Sparse FT on the raw weight matrix underperforms, but applying the same magnitude criterion after low-rank approximation boosts effectiveness markedly. LIFT operationalizes this insight (a minimal sketch follows the list) by:
- Performing Singular Value Decomposition (SVD) on each weight matrix to obtain a low-rank approximation.
- Selecting the top 5% of entries by magnitude in this approximation as the Principal Weights to fine-tune.
- Updating only these parameters, which surpasses Full FT on reasoning tasks while saving substantial memory: optimizer-state memory drops from about 27 GB to 1.3 GB on LLaMA-2-7B.
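A minimal sketch of this selection step in PyTorch. The 5% density matches the paper's reported choice, while the `rank` value and the helper name `principal_weight_mask` are illustrative assumptions, not details from the paper:

```python
import torch

def principal_weight_mask(W: torch.Tensor, rank: int = 128,
                          density: float = 0.05) -> torch.Tensor:
    """Mark the 'Principal Weights' of W: the entries that are largest
    in magnitude after a rank-`rank` approximation of W."""
    # Truncated SVD gives the best rank-`rank` approximation of W.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    W_lowrank = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

    # Keep the top `density` fraction of entries by magnitude.
    k = max(1, int(density * W.numel()))
    threshold = W_lowrank.abs().flatten().kthvalue(W.numel() - k + 1).values
    return W_lowrank.abs() >= threshold

# During fine-tuning, gradients outside the mask are zeroed so that only
# the selected ~5% of entries (and their optimizer states) ever change:
W = torch.nn.Parameter(torch.randn(1024, 1024))
mask = principal_weight_mask(W.detach())
# ... after loss.backward():  W.grad.mul_(mask)
```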
Empirical Validation and Results
The paper presents extensive empirical evaluations to validate LIFT's efficacy. Key findings include:
- Task Performance: LIFT consistently outperforms state-of-the-art parameter-efficient fine-tuning (PEFT) methods and Full FT across tasks like arithmetic reasoning and commonsense reasoning. Notably, it retains up to 20% more source-domain knowledge than both Full FT and low-rank adaptation methods like LoRA.
- Memory Efficiency: Because gradients and optimizer states are kept only for the selected ~5% of weights, LIFT sharply reduces the memory needed to adapt modern LLMs (a back-of-the-envelope estimate follows this list).
- Generalization: A critical advantage of LIFT is its ability to balance learning and forgetting. By focusing on Principal Weights, it achieves strong performance in target domains while retaining substantial pre-training knowledge.
- Weight Update Dynamics: LIFT produces larger and more impactful weight updates than competing methods, measurably shifting the model's principal eigenspace and thereby adapting the model more effectively to the fine-tuning task.
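To see where the 27 GB versus 1.3 GB figure plausibly comes from, here is a back-of-the-envelope estimate. It assumes Adam with 16-bit moment tensors, which is the accounting that reproduces the paper's numbers; the authors' exact setup may differ:

```python
# Optimizer-state memory for Adam on LLaMA-2-7B (rough estimate).
n_params = 6.74e9          # LLaMA-2-7B parameter count
states_per_param = 2       # Adam keeps first and second moments
bytes_per_state = 2        # assuming bf16/fp16 moment tensors

full_ft = n_params * states_per_param * bytes_per_state / 1e9
lift = full_ft * 0.05      # LIFT tracks moments for only ~5% of weights

print(f"Full FT optimizer states: {full_ft:.1f} GB")  # ~27.0 GB
print(f"LIFT optimizer states:    {lift:.1f} GB")     # ~1.3 GB
```

A sparse optimizer also needs to store the indices of the selected entries, but that overhead is small relative to the moment tensors themselves.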
Theoretical Implications and Future Directions
LIFT's insights resonate with recent findings that base models already harbor latent reasoning capabilities. Its focus on Principal Weights aligns with the observation that much of this reasoning capacity is concentrated in a small set of influential parameters. These insights open several avenues for further exploration:
- Adaptive Learning Algorithms: Future work could explore using LIFT within reinforcement learning-based fine-tuning, potentially enhancing reasoning capacity under memory constraints.
- Further Eigenspace Exploration: Investigating how LIFT reshapes the principal eigenspace and the full singular-value spectrum could deepen our understanding of LLM adaptation and generalization (a small diagnostic sketch follows this list).
- Structured Sparse Fine-Tuning: The potential for structured sparsity within LIFT—e.g., block sparsity—offers prospects for more refined and domain-adaptive fine-tuning strategies.
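As one concrete way to probe the eigenspace question, the sketch below (my illustration, not a metric from the paper) measures the overlap between the top-k left singular subspaces of a weight matrix before and after an update; an overlap near 1 means the principal eigenspace barely moved, while larger updates to Principal Weights should drive it lower:

```python
import torch

def top_subspace_overlap(W_before: torch.Tensor, W_after: torch.Tensor,
                         k: int = 16) -> float:
    """Overlap in [0, 1] between the top-k left singular subspaces of two
    weight matrices; 1.0 means the principal subspace is unchanged."""
    U1, _, _ = torch.linalg.svd(W_before, full_matrices=False)
    U2, _, _ = torch.linalg.svd(W_after, full_matrices=False)
    # Average squared cosine of the principal angles between the subspaces.
    return (torch.linalg.norm(U1[:, :k].T @ U2[:, :k]) ** 2 / k).item()

# A fine-tuning run that rotates the principal eigenspace more will
# report a lower overlap between the initial and final weights.
W = torch.randn(512, 512)
dW = 0.1 * torch.randn(512, 512)
print(top_subspace_overlap(W, W + dW))
```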
In conclusion, LIFT presents a compelling framework for improving LLM fine-tuning. By identifying and leveraging Principal Weights, it successfully combines efficiency with efficacy, offering a robust alternative to traditional and contemporary fine-tuning methodologies. The paper's contributions suggest promising directions for both practical applications and theoretical explorations in the domain of LLM fine-tuning.