- The paper demonstrates 'context-parametric inversion': a model's reliance on context rises early in instruction finetuning but then declines as finetuning continues.
- It systematically evaluates multiple datasets and model families, showing how non-context-critical datapoints drive the decline in context reliance.
- It proposes mitigation strategies, including data curation and counterfactual augmentation, to maintain robust context integration.
Context-Parametric Inversion in Instruction Finetuning
The paper "Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance" explores a counterintuitive phenomenon observed during the instruction finetuning (IFT) of LLMs. Instruction finetuning is commonly employed to enhance models' ability to process user contexts alongside existing parametric knowledge. However, the authors demonstrate that the expected improvement in context reliance due to IFT is not consistently realized, particularly in scenarios involving knowledge conflicts.
Key Contributions and Observations
- Context-Parametric Inversion Phenomenon: The paper introduces the concept of "context-parametric inversion," where models initially exhibit increased reliance on user-provided context during finetuning, but this reliance declines as finetuning progresses further. This decline occurs despite ongoing improvements in performance on standard benchmarks.
- Evaluation Across Models and Datasets: The phenomenon is observed across multiple instruction finetuning datasets such as TULU, Alpaca, and UltraChat, and model families including Llama, Mistral, and Pythia. The authors systematically track context reliance using knowledge-conflict datasets whose contexts contradict facts the model acquired during pretraining.
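As a concrete illustration, context reliance on such knowledge-conflict probes can be estimated by checking whether the model's answer follows the counterfactual context or its memorized fact. A minimal sketch, assuming a `generate(prompt) -> str` callable and a hypothetical probe format (these names are illustrative, not the paper's API):

```python
def context_reliance(generate, probes):
    """Fraction of probes where the answer follows the counterfactual
    context rather than the model's parametric (pretraining) knowledge."""
    follows_context = 0
    for probe in probes:
        # Each probe pairs a question with a context that contradicts
        # the fact the model is assumed to have memorized in pretraining.
        prompt = (
            f"Context: {probe['counterfactual_context']}\n"
            f"Question: {probe['question']}\nAnswer:"
        )
        answer = generate(prompt)
        if probe['context_answer'].lower() in answer.lower():
            follows_context += 1
    return follows_context / len(probes)
```

Tracked at successive finetuning checkpoints, a metric of this shape is what would trace the rise-then-fall curve the paper reports.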
- Detailed Examination and Theoretical Insights: Through empirical analysis, the authors categorize finetuning data into "context-critical" and "non-context-critical" points. They demonstrate that non-context-critical datapoints, where the context aligns with the model's pretraining knowledge, drive the observed decrease in context reliance in later stages of finetuning.
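This split can be sketched as a simple filter: a datapoint is non-context-critical if the model already produces the gold answer from parametric knowledge alone, with no context in the prompt. The helper names and prompt format below are assumptions for illustration, not the paper's implementation:

```python
def is_context_critical(generate, example):
    """An example is context-critical if the model fails to produce the
    gold answer without seeing the context (i.e. the context is needed)."""
    answer = generate(f"Question: {example['question']}\nAnswer:")
    return example['answer'].lower() not in answer.lower()

def partition_dataset(generate, dataset):
    """Split finetuning data into context-critical and non-context-critical."""
    critical, non_critical = [], []
    for ex in dataset:
        (critical if is_context_critical(generate, ex) else non_critical).append(ex)
    return critical, non_critical
```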
- Mitigation Strategies: The paper explores potential mitigation strategies including data curation to filter out non-context-critical points, counterfactual data augmentation, and limiting updates to query and key matrices during finetuning. While some gains are reported, challenges and trade-offs are discussed.
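One of the listed mitigations, limiting updates to the attention query and key matrices, amounts to freezing every other parameter before finetuning. A minimal sketch, where the substring matching (`q_proj`/`k_proj`) assumes Llama-style parameter naming and is our guess at a concrete recipe, not the paper's exact procedure:

```python
def restrict_to_query_key(named_parameters, trainable_keys=("q_proj", "k_proj")):
    """Freeze all parameters except attention query/key projections.

    Works with any framework exposing (name, param) pairs where each param
    carries a `requires_grad` flag (e.g. PyTorch's model.named_parameters()).
    Returns the names left trainable, for logging and sanity checks.
    """
    trainable = []
    for name, param in named_parameters:
        param.requires_grad = any(key in name for key in trainable_keys)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

The intuition behind this choice is that query/key matrices shape *where* the model attends (e.g. toward context tokens), while value and MLP weights store *what* it outputs, so confining updates to attention routing can raise context reliance with less overwriting of parametric knowledge.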
Theoretical Framework
The authors present a theoretical analysis using a simplified one-layer transformer model. They show that:
- During early finetuning, "context-critical" points dominate gradients, leading to increased attention to context.
- As finetuning progresses, "non-context-critical" points start to dominate, shifting model reliance back to parametric knowledge.
- This dynamic is attributed to optimization favoring loss reduction on points that the model can answer from pretraining knowledge alone, without drawing on the context.
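A caricature of this dynamic (our notation, not the paper's formal setup): let the one-layer model mix a context pathway and a parametric-memory pathway through an attention weight $\alpha$,

```latex
f(x) \;=\; \alpha\, v_{\mathrm{ctx}}(x) \;+\; (1-\alpha)\, v_{\mathrm{mem}}(x),
\qquad
\alpha \;=\; \sigma\!\left( q(x)^{\top} k_{\mathrm{ctx}} \right).
```

On context-critical points the label agrees only with $v_{\mathrm{ctx}}$, so reducing their loss pushes $\alpha$ upward; on non-context-critical points the label also agrees with $v_{\mathrm{mem}}$, so once the context-critical loss is small, the remaining gradient signal no longer rewards a high $\alpha$ and, per the paper's analysis, pulls reliance back toward the parametric pathway.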
Implications and Future Work
The findings challenge assumptions about the efficacy of IFT in enhancing context reliance, raising important questions about the design of instruction finetuning datasets and approaches. Practically, this could impact the deployment of LLMs in retrieval-augmented generation (RAG) systems where context processing is critical.
Theoretical implications extend to understanding model optimization dynamics and the broader interplay between training data composition and model behavior. The observed inversion offers insights into potential deficiencies in instruction tuning, motivating refined methodologies that address both improvement on benchmarks and robustness to knowledge conflicts.
Future work may focus on developing more sophisticated dataset curation and augmentation techniques, as well as exploring alternative finetuning strategies that better incorporate user context without sacrificing factual consistency. Additionally, investigating the implications of context-parametric inversion in diverse AI applications could yield richer understanding and solutions.
In summary, this paper provides a comprehensive analysis of a critical shortcoming in instruction finetuning, posing significant considerations for both AI researchers and practitioners aiming to improve how reliably LLMs use user-provided context.