- The paper analyzes LLM finetuning through a gradient decomposition that reveals the step-wise influence of each training example on the model's predictions for other responses.
- It shows that both supervised and preference-based finetuning reshape the output distribution in characteristic ways, including a "squeezing effect" from negative gradients; experiments confirm these dynamics and show that an SFT warm-up before DPO gives more stable updates and better alignment.
- The findings offer actionable insights for mitigating issues such as hallucination and confidence decay, paving the way for more refined finetuning algorithms.
Learning Dynamics of LLM Finetuning
The paper "Learning Dynamics of LLM Finetuning" by Yi Ren and Danica J. Sutherland offers an in-depth exploration of the learning dynamics during the finetuning of LLMs. The authors utilize a gradient-based framework to analyze step-wise decompositions and accumulated influences among responses, providing a uniform interpretation across various finetuning algorithms like supervised finetuning (SFT), self-play finetuning (SPIN), and direct preference optimization (DPO).
Overview of Learning Dynamics
Learning dynamics describe how training on specific examples influences a model's predictions on other examples. The authors focus on how the change in model parameters, Δθ, produced by a gradient-descent step translates into a change in the model's predictions, Δf_θ. For LLMs, this lens helps explain phenomena observed during finetuning and motivates new algorithms built on these insights.
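As a minimal sketch of this first-order picture (notation lightly adapted here): one gradient-descent step on an updating example (x_u, y_u) changes the parameters, and a Taylor expansion gives the induced change in the prediction on any observed example x_o.

```latex
% One gradient-descent step on the updating example (x_u, y_u):
\[
\Delta\theta^{t} \;=\; -\,\eta\,\nabla_\theta\,
  \mathcal{L}\!\left(f_{\theta^{t}}(x_u),\, y_u\right)
\]

% First-order (Taylor-expansion) effect on the prediction at an observed example x_o:
\[
\Delta f^{t}(x_o) \;\triangleq\; f_{\theta^{t+1}}(x_o) - f_{\theta^{t}}(x_o)
  \;\approx\; \nabla_\theta f_{\theta^{t}}(x_o)\,\Delta\theta^{t}
\]
```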
Application to Supervised Finetuning (SFT)
In SFT, the typical loss function is the negative log-likelihood (NLL) of a given completion conditioned on the prompt. The analysis represents the model's learning dynamics through a gradient decomposition built from K_t, the empirical neural tangent kernel, which captures the model-specific similarity between different input samples, and G_t, which describes the direction and energy of the model's adaptation on the updating example.
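Schematically (omitting per-token indexing and higher-order terms, with notation lightly adapted), the per-step decomposition looks like the following; A_t(x_o) is a per-example term arising from the softmax that the paper carries through and that reappears in the DPO analysis below, and for the NLL loss G_t reduces to the familiar "predicted distribution minus one-hot target" residual.

```latex
% Per-step change in the log-probability of a probe response y at prompt x_o,
% after a gradient update on the training pair (x_u, y_u):
\[
\Delta \log \pi^{t}(y \mid x_o)
  \;\approx\; -\,\eta\;
  \mathcal{A}^{t}(x_o)\,
  \mathcal{K}^{t}(x_o, x_u)\,
  \mathcal{G}^{t}(x_u, y_u)
  \;+\; \mathcal{O}(\eta^{2})
\]

% For SFT with the NLL loss, the adaptation term is the usual residual:
\[
\mathcal{G}^{t}_{\mathrm{SFT}}(x_u, y_u)
  \;=\; \pi^{t}(\cdot \mid x_u) - \mathbf{e}_{y_u}
\]
```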
Semantic Analysis through Probing Dataset
To tackle the vast response space, the authors divide it into three subspaces:
- Y_IF: reasonable responses that follow the instruction.
- Y_non-IF: fluent human-language responses that do not follow the instruction.
- Y_non-hum: sequences that are not human language.
By tracking seven typical probe responses, the paper shows that responses in Y_IF generally gain predicted probability during the early stages of finetuning, while responses in Y_non-hum consistently lose it.
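As an illustration of how such probing can be carried out (not the paper's exact setup; the model name, prompt, probe texts, and checkpoint paths below are placeholders), one can track the log-probability each checkpoint assigns to every probe response:

```python
# Sketch: track log pi_theta(y | x) for a few probe responses across
# finetuning checkpoints. Model name, prompt, probes, and checkpoint
# paths are placeholders, not the paper's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def completion_logprob(model, tokenizer, prompt: str, response: str) -> float:
    """Sum of log-probabilities of the response tokens, conditioned on the prompt.

    Assumes the prompt tokenizes to the same token prefix inside prompt + response.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits                       # [1, T, vocab]
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)      # predicts tokens 1..T-1
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # [1, T-1]
    n_prompt = prompt_ids.shape[1]
    return token_lp[:, n_prompt - 1:].sum().item()            # response positions only

# One hypothetical probe from each subspace.
probes = {
    "Y_IF":      "Photosynthesis converts sunlight, water, and CO2 into glucose ...",
    "Y_non-IF":  "The weather in Vancouver is mild and rainy in winter.",
    "Y_non-hum": "qpw zxv 0x3f ka ka ka",
}

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")
for ckpt in ["./ckpt-step0", "./ckpt-step500", "./ckpt-step1000"]:  # placeholder paths
    model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
    for name, y in probes.items():
        lp = completion_logprob(model, tokenizer, "Explain photosynthesis.\n", y)
        print(f"{ckpt}  {name:10s}  log-prob = {lp:.2f}")
```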
Learning Dynamics of Preference Finetuning
Preference finetuning aims to align LLM outputs with human preferences. This paper specifically examines RL-free methods like DPO, where the loss function compares a preferred response (y_u^+) against a less preferred one (y_u^−).
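For reference, the standard DPO objective contrasts the policy's log-likelihood ratios on the two responses against a frozen reference model π_ref, with β the usual temperature-like hyperparameter:

```latex
\[
\mathcal{L}_{\mathrm{DPO}}(\theta)
  \;=\; -\,\mathbb{E}_{(x,\, y_u^{+},\, y_u^{-})}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_u^{+} \mid x)}{\pi_{\mathrm{ref}}(y_u^{+} \mid x)}
        \;-\;
        \beta \log \frac{\pi_\theta(y_u^{-} \mid x)}{\pi_{\mathrm{ref}}(y_u^{-} \mid x)}
      \right)
    \right]
\]
```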
Stepwise Decomposition for DPO
The authors adapt the gradient decomposition framework to DPO, using A_t, K_t, and G_t^DPO to represent the model updates. This decomposition:
- Highlights the consistent positive gradient term pulling up y_u^+ and the negative gradient term pushing down y_u^−.
- Demonstrates how the negative gradient leads to a "squeezing effect": the probability mass removed from y_u^− is reallocated primarily to the token the model is already most confident about (a toy illustration follows below).
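As a toy numerical sketch of this squeezing effect on a single categorical distribution (the vocabulary size and numbers are made up; this is not the paper's experiment): a gradient-descent step that lowers log p(y^−) raises every other logit in proportion to its current probability, so the already most-likely token absorbs most of the freed mass.

```python
# Toy illustration of the "squeezing effect": push down the probability of
# one token (y_minus) with a gradient step and watch where the mass goes.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=10)
logits[3] += 3.0                 # token 3 is the model's confident favorite
y_minus = 7                      # token whose probability we push down

p = softmax(logits)
# Gradient of +log p[y_minus] w.r.t. the logits is (e_{y_minus} - p);
# one descent step lowers the y_minus logit and raises every other logit
# by eta * p_j, i.e. in proportion to its current probability.
eta = 1.0
grad = -p.copy()
grad[y_minus] += 1.0
p_new = softmax(logits - eta * grad)

print("token  before   after    change")
for i, (a, b) in enumerate(zip(p, p_new)):
    print(f"{i:>5}  {a:.4f}  {b:.4f}  {b - a:+.4f}")
# Most of the probability removed from token 7 lands on token 3, the already
# most confident token, rather than being spread uniformly over the rest.
```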
Experimental Validation
The paper validates the framework through experiments on Pythia and Qwen models across different configurations. It consistently observes that:
- π_θ^t(y_u^−) decays faster than π_θ^t(y_u^+).
- The negative gradient in off-policy DPO indeed squeezes probability mass toward already-likely tokens.
- Conducting SFT on both [x; y_u^+] and [x; y_u^−] before DPO results in more stable training and improved alignment.
Implications and Future Directions
Practical Implications
Understanding the learning dynamics provides practical insights into aligning LLM responses with human preferences more effectively. The proposed analytical framework can guide the development of improved finetuning algorithms, mitigating issues like hallucinations and confidence decay.
Theoretical Implications
The paper enriches the theoretical understanding of gradient-based learning in high-dimensional output spaces, in particular how negative gradients, i.e., gradient ascent on an example's loss, reshape the output probability distribution. The findings on the squeezing effect emphasize the importance of balanced updates in preference optimization tasks.
Conclusion
The paper makes significant strides in elucidating the learning dynamics of LLM finetuning. By systematically exploring the interplay between gradients and model predictions, the authors offer both explanatory and predictive insights into the behavior of LLMs under various finetuning strategies. This foundation paves the way for more refined and effective alignment techniques, shaping future advancements in AI model training.