Context-Parametric Inversion: Why Instruction Finetuning Can Worsen Context Reliance (2410.10796v3)

Published 14 Oct 2024 in cs.LG and cs.CL

Abstract: A standard practice when using LLMs is for users to supplement their instruction with an input context containing new information for the model to process. However, models struggle to reliably follow the input context, especially when it conflicts with their parametric knowledge from pretraining. In principle, one would expect models to adapt to the user context better after instruction finetuning, particularly when handling knowledge conflicts. However, we observe a surprising failure mode: during instruction tuning, context reliance under knowledge conflicts initially increases as expected, but then gradually decreases as instruction finetuning progresses. This happens while performance on standard benchmarks keeps increasing well after this drop. We call this phenomenon context-parametric inversion and observe it across multiple general-purpose instruction tuning datasets such as TULU, Alpaca, and UltraChat, and across different model families like Llama, Mistral, and Pythia. We perform various controlled studies and theoretical analysis to show that context-parametric inversion occurs due to examples in the instruction finetuning data where the input context provides information that aligns with the model's parametric knowledge. Our analysis suggests some natural mitigation strategies with limited but insightful gains, and serves as a useful starting point in addressing this deficiency in instruction finetuning.

Summary

  • The paper demonstrates "context-parametric inversion": context reliance rises early in instruction finetuning and then declines as training continues.
  • It systematically evaluates multiple datasets and model families, showing how non-context-critical datapoints drive the decline in context reliance.
  • It proposes mitigation strategies, including data curation and counterfactual augmentation, to maintain robust context integration.

Context-Parametric Inversion in Instruction Finetuning

The paper "Context-Parametric Inversion: Why Instruction Finetuning Can Worsen Context Reliance" explores a counterintuitive phenomenon observed during instruction finetuning (IFT) of LLMs. Instruction finetuning is commonly employed to enhance models' ability to process user-provided contexts alongside their parametric knowledge. The authors demonstrate, however, that the expected improvement in context reliance from IFT is not consistently realized, particularly in scenarios involving knowledge conflicts.

Key Contributions and Observations

  1. Context-Parametric Inversion Phenomenon: The paper introduces "context-parametric inversion": models initially exhibit increased reliance on user-provided context during finetuning, but this reliance declines as finetuning progresses. The decline occurs even as performance on standard benchmarks continues to improve.
  2. Evaluation Across Models and Datasets: The phenomenon is observed across multiple instruction finetuning datasets such as TULU, Alpaca, and UltraChat, and across model families including Llama, Mistral, and Pythia. The authors systematically track context reliance using knowledge-conflict datasets whose contexts are counterfactual to the model's parametric knowledge (see the measurement sketch after this list).
  3. Detailed Examination and Theoretical Insights: Through empirical analysis, the authors categorize finetuning data into "context-critical" and "non-context-critical" points. They show that non-context-critical datapoints, where the context aligns with the model's pretraining knowledge, drive the decrease in context reliance in later stages of finetuning.
  4. Mitigation Strategies: The paper explores mitigations including data curation to filter out non-context-critical points, counterfactual data augmentation, and limiting finetuning updates to the attention query and key matrices (both sketched below). While some gains are reported, challenges and trade-offs are discussed.
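To make the evaluation in point 2 concrete, here is a minimal sketch of how context reliance under knowledge conflict can be scored. This is not the authors' released evaluation code; the model_generate callable and the dataset field names (context, question, context_answer) are assumptions for illustration.

```python
def context_reliance(model_generate, conflict_data):
    """Fraction of answers that follow the counterfactual context
    rather than the model's parametric (pretraining) answer.

    model_generate: callable mapping a prompt string to a completion string.
    conflict_data: examples whose context contradicts a well-known fact,
    e.g. context "The Eiffel Tower is located in Rome." with question
    "Where is the Eiffel Tower located?" (context_answer: "Rome").
    """
    follows_context = 0
    for ex in conflict_data:
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context: {ex['context']}\n"
            f"Question: {ex['question']}\n"
            "Answer:"
        )
        answer = model_generate(prompt)
        # Simple substring match; real evaluations typically use
        # normalized exact match or span-level scoring instead.
        if ex["context_answer"].lower() in answer.lower():
            follows_context += 1
    return follows_context / len(conflict_data)
```

Tracking this score across finetuning checkpoints alongside standard-benchmark accuracy is what reveals the inversion: the benchmark curve keeps rising while this score peaks early and then falls.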

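For the mitigations in point 4, a hedged sketch of counterfactual data augmentation follows: rewrite the context so its answer disagrees with the model's parametric knowledge, forcing the model to rely on the context. The field names and the entity_pool argument are hypothetical conveniences, not the paper's exact recipe.

```python
import random

def augment_counterfactual(example, entity_pool):
    """Swap the answer entity in the context for a plausible alternative
    and relabel the target, turning a non-context-critical point into a
    context-critical one."""
    alternatives = [e for e in entity_pool if e != example["answer"]]
    new_answer = random.choice(alternatives)
    return {
        "question": example["question"],
        "context": example["context"].replace(example["answer"], new_answer),
        "answer": new_answer,  # supervision now conflicts with parametric memory
    }
```

The other mitigation, limiting updates to the attention query and key matrices, amounts to freezing every other parameter before finetuning. A sketch assuming a Llama-style Hugging Face model (module names q_proj and k_proj are that implementation's convention):

```python
# Only query/key projection weights receive gradient updates;
# `model` is assumed to be an already-loaded transformers model.
for name, param in model.named_parameters():
    param.requires_grad = any(k in name for k in ("q_proj", "k_proj"))
```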
Theoretical Framework

The authors present a theoretical analysis using a simplified one-layer transformer model. They show that:

  • During early finetuning, "context-critical" points dominate gradients, leading to increased attention to context.
  • As finetuning progresses, "non-context-critical" points start to dominate, shifting model reliance back to parametric knowledge.
  • This dynamic is attributed to the optimizer preferentially reducing loss on points the model can already answer from its pretraining knowledge, independent of the context; a schematic formalization follows below.
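The flavor of this argument can be written schematically; the notation below is a simplified rendering for intuition, not the paper's exact formulation. Let CC and NCC denote the context-critical and non-context-critical finetuning points:

$$
\mathcal{L}(\theta) = \underbrace{\sum_{i \in \mathrm{CC}} \ell\!\left(f_\theta(x_i, c_i),\, y_i^{\mathrm{ctx}}\right)}_{\text{answer recoverable only from context } c_i} + \underbrace{\sum_{j \in \mathrm{NCC}} \ell\!\left(f_\theta(x_j, c_j),\, y_j\right)}_{\text{answer also in parametric memory}}
$$

Early in finetuning, the CC term contributes the larger gradient and can be reduced only by attending to the context. Once it is small, the residual NCC gradient can be reduced through either pathway, and strengthening the parametric route also lowers it, which gradually pulls attention away from the context.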

Implications and Future Work

The findings challenge assumptions about the efficacy of IFT in enhancing context reliance, raising important questions about the design of instruction finetuning datasets and approaches. Practically, this could impact the deployment of LLMs in retrieval-augmented generation (RAG) systems where context processing is critical.

Theoretical implications extend to understanding model optimization dynamics and the broader interplay between training data composition and model behavior. The observed inversion offers insights into potential deficiencies in instruction tuning, motivating refined methodologies that address both improvement on benchmarks and robustness to knowledge conflicts.

Future work may focus on developing more sophisticated dataset curation and augmentation techniques, as well as exploring alternative finetuning strategies that better incorporate user context without sacrificing factual consistency. Additionally, investigating the implications of context-parametric inversion in diverse AI applications could yield richer understanding and solutions.

In summary, this paper provides a comprehensive analysis of a critical shortcoming in instruction finetuning, posing significant considerations for both AI researchers and practitioners aiming to optimize LLM context usability.
