Implicit Instruction Tuning in LLMs
Introduction
The paper investigates instruction-following behavior in LLMs, focusing on adaptation approaches that induce this behavior implicitly, without explicit training on instruction-response pairs. Classic instruction tuning finetunes models on datasets of instructions paired with desired responses. The paper introduces two alternative adaptation methods that also yield instruction following: response tuning and single-task finetuning. Both are evaluated against conventional instruction tuning to gauge how effectively they elicit instruction-following behavior.
Key Findings
The authors present empirical evidence that both response tuning and single-task finetuning can yield general instruction-following behavior. The methods are assessed with multiple metrics, chiefly a custom evaluation set and AlpacaEval, which scores model performance in head-to-head comparisons against instruction-tuned models.
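For concreteness, the sketch below shows how a head-to-head win rate might be tallied from pairwise judgments. The function name, the "win"/"tie"/"loss" label format, and the tie-handling convention are illustrative assumptions, not AlpacaEval's exact pipeline.

```python
def head_to_head_win_rate(judgments):
    """Tally a win rate from pairwise judgments.

    `judgments` is assumed to be a list of "win" / "tie" / "loss" labels,
    one per prompt, indicating whether an LLM judge preferred the candidate
    model's response over the instruction-tuned baseline's response.
    Counting ties as half a win is an illustrative convention, not
    necessarily AlpacaEval's.
    """
    wins = sum(j == "win" for j in judgments)
    ties = sum(j == "tie" for j in judgments)
    return (wins + 0.5 * ties) / len(judgments)


# Example: 43 wins, 2 ties, 55 losses over 100 prompts -> 0.44 win rate.
print(head_to_head_win_rate(["win"] * 43 + ["tie"] * 2 + ["loss"] * 55))
```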
Response Tuning
Method: Response tuning trains models only on desired responses, with no instructions included in the input; the objective maximizes the probability of responses alone rather than the probability of responses conditioned on instructions.
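To illustrate the difference in objectives, here is a minimal sketch that builds training targets for both settings, assuming a Hugging Face-style tokenizer. The helper name, the absence of a chat template, and the single trailing EOS token are simplifying assumptions, not the paper's exact setup.

```python
import torch

IGNORE_INDEX = -100  # Hugging Face convention: labels with this value are excluded from the loss

def build_example(tokenizer, instruction, response, response_tuning=False):
    """Build (input_ids, labels) for one training example.

    Instruction tuning: condition on the instruction, compute loss only on
    the response tokens. Response tuning: drop the instruction entirely and
    model the response alone.
    """
    resp_ids = tokenizer(response, add_special_tokens=False).input_ids
    resp_ids = resp_ids + [tokenizer.eos_token_id]
    if response_tuning:
        input_ids = resp_ids
        labels = list(resp_ids)
    else:
        inst_ids = tokenizer(instruction, add_special_tokens=False).input_ids
        input_ids = inst_ids + resp_ids
        labels = [IGNORE_INDEX] * len(inst_ids) + list(resp_ids)
    return torch.tensor(input_ids), torch.tensor(labels)
```

In both cases the loss is computed only on response tokens; the response-tuned variant simply never sees the instruction.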
Results:
- The paper reports that response-tuned models win approximately 43% of head-to-head evaluations against instruction-tuned counterparts.
- Importantly, pretrained models exhibit an inherent capacity for mapping instructions to appropriate responses, suggesting this mapping is at least partly learned during pretraining.
- Table 1 compares win rates, showing that response-tuned models far outperform base models, which win only 2.4% of the time.
These results imply that the crucial contribution of instruction tuning is teaching the model the distribution of desirable responses, something pretrained models already grasp to some extent.
Single-Task Finetuning
Method: Single-task finetuning trains models on data from a single narrow domain, such as poetry or code generation, which might be expected to constrain the model to producing outputs in that domain alone.
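As a rough illustration, a narrow finetuning set might look like the following; the math problems and the prompt template are made up for this sketch and are not the paper's data or format.

```python
# A hypothetical, tiny stand-in for a narrow finetuning set (GSM-style math).
# The key point: every example shares one task, yet the training format and
# next-token objective are identical to ordinary instruction tuning.
math_examples = [
    {"instruction": "Ann has 3 apples and buys 5 more. How many does she have?",
     "response": "Ann has 3 + 5 = 8 apples."},
    {"instruction": "A train travels 60 km in 1.5 hours. What is its speed?",
     "response": "Speed = 60 / 1.5 = 40 km per hour."},
]

def format_example(ex):
    """Render one (problem, solution) pair as a single training string.
    The template here is illustrative, not the paper's exact prompt format."""
    return f"Question: {ex['instruction']}\nAnswer: {ex['response']}"

train_texts = [format_example(ex) for ex in math_examples]
```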
Results:
- Surprisingly, single-task finetuned models exhibit broad instruction-following behavior, suggesting that instruction-following does not necessitate task diversity during finetuning.
- Models finetuned on datasets such as GSM (math problems) or poetry exhibit substantial instruction-following, winning between 14% and 30% of evaluations against instruction-tuned models.
- Figure 2 exemplifies outputs from single-task finetuned models, illustrating that despite training on narrow-domain data, these models generate coherent responses to diverse instructions.
These results indicate that LLMs retain a general instruction-following capability even when finetuned on task-specific data. This observation raises questions about how models generalize and whether finetuning can surface unexpected behavior rooted in capabilities learned implicitly during pretraining.
Explanations via Rule-Based Modeling
To better understand why such diverse tuning methods induce instruction following, the authors hand-write a rule-based model and combine it with a pretrained model's output probabilities as a product of experts. The rules make only three simple changes to the next-token distribution (a minimal sketch follows the list below):
- Gradually increase the probability of ending the sequence.
- Penalize repetition of tokens.
- Uniformly adjust the probabilities of a select few tokens.
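The sketch below shows, under simplifying assumptions, how such rules could be applied as additive adjustments to a pretrained model's next-token logits (a product of experts in log space). The weights, the boosted token set, and the function signature are illustrative, not the paper's exact values.

```python
import torch

def apply_rules(base_logits, generated_ids, step, eos_id, boosted_ids,
                eos_rate=0.05, repeat_penalty=2.0, boost=1.0):
    """Adjust one step of next-token logits with three hand-written rules.

    Adding to logits multiplies the underlying probabilities, i.e. a
    product of experts in log space. All rule weights here are illustrative.
    """
    logits = base_logits.clone()
    # Rule 1: gradually raise the probability of ending the sequence.
    logits[eos_id] += eos_rate * step
    # Rule 2: penalize tokens that have already been generated.
    if generated_ids:
        logits[list(set(generated_ids))] -= repeat_penalty
    # Rule 3: uniformly shift the logits of a small, fixed set of tokens.
    logits[boosted_ids] += boost
    return torch.softmax(logits, dim=-1)
```

At decode time, one would sample each next token from these adjusted probabilities rather than from the base model's unmodified distribution.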
Results:
- The rule-based product-of-experts model achieves a 24.4% win rate against instruction-tuned models, underscoring that a few simple changes to the output distribution can elicit instruction-following behavior (Table 2).
- Ablations confirm that each rule contributes to the overall efficacy, supporting the hypothesis that small, targeted changes can elicit instruction-following.
Implications and Future Directions
Practical Implications:
- The findings suggest that LLMs adapted for a specific task may inadvertently become general instruction followers. Hence, practitioners should rigorously test models under diverse prompts to ensure task-specific behavior is preserved where necessary.
- Safety and reliability protocols must consider that models might not be constrained strictly by the finetuning objectives, potentially leading to unexpected behaviors.
Theoretical Implications:
- These observations challenge our understanding of how LLM capabilities are developed and encoded during pretraining and finetuning. The persistence of implicit capabilities suggests a deeply ingrained flexibility in these models that resists narrow, task-specific adaptation.
- Future work should investigate the pretraining data and methodologies that contribute to these inherent capabilities, aiming to better control and predict model behavior post-adaptation.
Future Directions:
- Developing more nuanced metrics for instruction-following quality can refine our understanding of implicit tuning impacts.
- Exploring other adaptation techniques and their implications for instruction-following can broaden our toolkit for model optimization without unintended behavioral generalization.
- Further analysis of the pretraining phase and its effects on downstream applications can provide insights into crafting more robust pretraining and finetuning pipelines.
In conclusion, the paper provides intriguing evidence that LLMs exhibit broad instruction-following behavior even when finetuned in ways not explicitly designed to yield it. This highlights the sophisticated understanding encoded within these models during pretraining and raises important considerations for their deployment and further development.