- The paper analyzes LLM finetuning through a gradient decomposition that reveals the step-wise influence of each training example on the model's predictions for other responses.
- It shows that both supervised and preference-based finetuning reshape the output distribution in characteristic ways, including a "squeezing effect" from negative gradients; experiments confirm these dynamics and show that an SFT warm-up before DPO gives more stable updates and better alignment.
- The findings offer actionable insights for mitigating issues such as hallucination and confidence decay, paving the way for more refined finetuning algorithms.
Learning Dynamics of LLM Finetuning
The paper "Learning Dynamics of LLM Finetuning" by Yi Ren and Danica J. Sutherland offers an in-depth exploration of the learning dynamics during the finetuning of LLMs. The authors utilize a gradient-based framework to analyze step-wise decompositions and accumulated influences among responses, providing a uniform interpretation across various finetuning algorithms like supervised finetuning (SFT), self-play finetuning (SPIN), and direct preference optimization (DPO).
Overview of Learning Dynamics
Learning dynamics describe how training on specific examples influences a model's predictions on other examples. The authors focus on how the change in model parameters, Δθ, produced by a gradient-descent step translates into a change in the model's predictions, Δf_θ. For LLMs, this lens helps explain phenomena observed during finetuning and motivates new algorithms built on these insights.
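As a minimal sketch of this first-order picture (notation lightly adapted here): one gradient-descent step on an updating example (x_u, y_u) changes the parameters, and a Taylor expansion gives the induced change in the prediction on any observed example x_o.

```latex
% One gradient-descent step on the updating example (x_u, y_u):
\[
\Delta\theta^{t} \;=\; -\,\eta\,\nabla_\theta\,
  \mathcal{L}\!\left(f_{\theta^{t}}(x_u),\, y_u\right)
\]

% First-order (Taylor-expansion) effect on the prediction at an observed example x_o:
\[
\Delta f^{t}(x_o) \;\triangleq\; f_{\theta^{t+1}}(x_o) - f_{\theta^{t}}(x_o)
  \;\approx\; \nabla_\theta f_{\theta^{t}}(x_o)\,\Delta\theta^{t}
\]
```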
Application to Supervised Finetuning (SFT)
In SFT, the typical loss function is the negative log-likelihood (NLL) of a given completion conditioned on the prompt. The analysis represents the model's learning dynamics through a gradient decomposition built from K_t, the empirical neural tangent kernel, which captures the model-specific similarity between different input samples, and G_t, which describes the direction and energy of the model's adaptation on the updating example.
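Schematically (omitting per-token indexing and higher-order terms, with notation lightly adapted), the per-step decomposition looks like the following; A_t(x_o) is a per-example term arising from the softmax that the paper carries through and that reappears in the DPO analysis below, and for the NLL loss G_t reduces to the familiar "predicted distribution minus one-hot target" residual.

```latex
% Per-step change in the log-probability of a probe response y at prompt x_o,
% after a gradient update on the training pair (x_u, y_u):
\[
\Delta \log \pi^{t}(y \mid x_o)
  \;\approx\; -\,\eta\;
  \mathcal{A}^{t}(x_o)\,
  \mathcal{K}^{t}(x_o, x_u)\,
  \mathcal{G}^{t}(x_u, y_u)
  \;+\; \mathcal{O}(\eta^{2})
\]

% For SFT with the NLL loss, the adaptation term is the usual residual:
\[
\mathcal{G}^{t}_{\mathrm{SFT}}(x_u, y_u)
  \;=\; \pi^{t}(\cdot \mid x_u) - \mathbf{e}_{y_u}
\]
```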
Semantic Analysis through Probing Dataset
To tackle the vast response space, the authors divide it into three subspaces:
- Y_IF: reasonable responses that follow the instruction.
- Y_non-IF: fluent human-language responses that do not follow the instruction.
- Y_non-hum: sequences that are not human language.
By tracking seven typical probe responses, the paper shows that responses in Y_IF generally gain predicted probability during the early stages of finetuning, while responses in Y_non-hum consistently lose it.
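As an illustration of how such probing can be carried out (not the paper's exact setup; the model name, prompt, probe texts, and checkpoint paths below are placeholders), one can track the log-probability each checkpoint assigns to every probe response:

```python
# Sketch: track log pi_theta(y | x) for a few probe responses across
# finetuning checkpoints. Model name, prompt, probes, and checkpoint
# paths are placeholders, not the paper's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def completion_logprob(model, tokenizer, prompt: str, response: str) -> float:
    """Sum of log-probabilities of the response tokens, conditioned on the prompt.

    Assumes the prompt tokenizes to the same token prefix inside prompt + response.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits                       # [1, T, vocab]
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)      # predicts tokens 1..T-1
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # [1, T-1]
    n_prompt = prompt_ids.shape[1]
    return token_lp[:, n_prompt - 1:].sum().item()            # response positions only

# One hypothetical probe from each subspace.
probes = {
    "Y_IF":      "Photosynthesis converts sunlight, water, and CO2 into glucose ...",
    "Y_non-IF":  "The weather in Vancouver is mild and rainy in winter.",
    "Y_non-hum": "qpw zxv 0x3f ka ka ka",
}

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")
for ckpt in ["./ckpt-step0", "./ckpt-step500", "./ckpt-step1000"]:  # placeholder paths
    model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
    for name, y in probes.items():
        lp = completion_logprob(model, tokenizer, "Explain photosynthesis.\n", y)
        print(f"{ckpt}  {name:10s}  log-prob = {lp:.2f}")
```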
Learning Dynamics of Preference Finetuning
Preference finetuning aims to align LLM outputs with human preferences. This paper specifically examines RL-free methods like DPO, where the loss function compares a preferred response (y_u^+) against a less preferred one (y_u^−).
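For reference, the standard DPO objective contrasts the policy's log-likelihood ratios on the two responses against a frozen reference model π_ref, with β the usual temperature-like hyperparameter:

```latex
\[
\mathcal{L}_{\mathrm{DPO}}(\theta)
  \;=\; -\,\mathbb{E}_{(x,\, y_u^{+},\, y_u^{-})}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_u^{+} \mid x)}{\pi_{\mathrm{ref}}(y_u^{+} \mid x)}
        \;-\;
        \beta \log \frac{\pi_\theta(y_u^{-} \mid x)}{\pi_{\mathrm{ref}}(y_u^{-} \mid x)}
      \right)
    \right]
\]
```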
Stepwise Decomposition for DPO
The authors adapt the gradient decomposition framework to DPO, using A_t, K_t, and G_t^DPO to represent the model updates. This decomposition:
- Highlights the consistent positive gradient term pulling up y_u^+ and the negative gradient term pushing down y_u^−.
- Demonstrates how the negative gradient leads to a "squeezing effect": the probability mass removed from y_u^− is reallocated primarily to the token the model is already most confident about (a toy illustration follows below).
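As a toy numerical sketch of this squeezing effect on a single categorical distribution (the vocabulary size and numbers are made up; this is not the paper's experiment): a gradient-descent step that lowers log p(y^−) raises every other logit in proportion to its current probability, so the already most-likely token absorbs most of the freed mass.

```python
# Toy illustration of the "squeezing effect": push down the probability of
# one token (y_minus) with a gradient step and watch where the mass goes.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=10)
logits[3] += 3.0                 # token 3 is the model's confident favorite
y_minus = 7                      # token whose probability we push down

p = softmax(logits)
# Gradient of +log p[y_minus] w.r.t. the logits is (e_{y_minus} - p);
# one descent step lowers the y_minus logit and raises every other logit
# by eta * p_j, i.e. in proportion to its current probability.
eta = 1.0
grad = -p.copy()
grad[y_minus] += 1.0
p_new = softmax(logits - eta * grad)

print("token  before   after    change")
for i, (a, b) in enumerate(zip(p, p_new)):
    print(f"{i:>5}  {a:.4f}  {b:.4f}  {b - a:+.4f}")
# Most of the probability removed from token 7 lands on token 3, the already
# most confident token, rather than being spread uniformly over the rest.
```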
Experimental Validation
The paper validates the framework through experiments on Pythia and Qwen models across different configurations. It consistently observes that:
- π_θ^t(y_u^−) decays faster than π_θ^t(y_u^+).
- The negative gradient in off-policy DPO indeed squeezes probability mass toward already-likely tokens.
- Conducting SFT on both [x; y_u^+] and [x; y_u^−] before DPO results in more stable training and improved alignment.
Implications and Future Directions
Practical Implications
Understanding the learning dynamics provides practical insights into aligning LLM responses with human preferences more effectively. The proposed analytical framework can guide the development of improved finetuning algorithms, mitigating issues like hallucinations and confidence decay.
Theoretical Implications
The paper enriches the theoretical understanding of gradient-based learning in high-dimensional output spaces, in particular how negative gradients, i.e., gradient ascent on an example's loss, reshape the output probability distribution. The findings on the squeezing effect emphasize the importance of balanced updates in preference optimization tasks.
Conclusion
The paper makes significant strides in elucidating the learning dynamics of LLM finetuning. By systematically exploring the interplay between gradients and model predictions, the authors offer both explanatory and predictive insights into the behavior of LLMs under various finetuning strategies. This foundation paves the way for more refined and effective alignment techniques, shaping future advancements in AI model training.