
ReFT: Representation Finetuning for Language Models (2404.03592v3)

Published 4 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency. Both are drop-in replacements for existing PEFTs and learn interventions that are 15x--65x more parameter-efficient than LoRA. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, instruction-tuning, and GLUE. In all these evaluations, our ReFTs deliver the best balance of efficiency and performance, and almost always outperform state-of-the-art PEFTs. We release a generic ReFT training library publicly at https://github.com/stanfordnlp/pyreft.


Summary

  • The paper demonstrates that shifting focus from weight updates to representation interventions via ReFT can improve task performance.
  • LoReFT, employing low-rank approximations, significantly reduces parameters while boosting benchmarks in commonsense and arithmetic reasoning.
  • The method offers practical benefits for deploying adaptable LLMs in resource-constrained settings and advances our understanding of hidden representations.

Representation Finetuning (ReFT): An Efficient and Powerful Alternative to PEFT

Introduction to ReFT and LoReFT

The pursuit of making pretrained LLMs more adaptable to downstream tasks without the need to finetune all the parameters has led to the development of Parameter-efficient Finetuning (PEFT) methods. PEFT methods, such as Adapters and LoRA, have demonstrated considerable success in adapting LLMs to new tasks by updating only a fraction of the model's weights. However, these methods primarily focus on weight updates rather than representation modifications.

Our paper introduces Representation Finetuning (ReFT), a family of methods that shifts the focus from weight adaptation to representation intervention. Unlike traditional PEFT, ReFT operates on the hidden representations of a frozen model, learning task-specific interventions that steer the model's output toward the desired behavior. A particularly compelling instance of the ReFT family, which we call Low-rank Linear Subspace ReFT (LoReFT), exploits low-rank approximations to perform these interventions efficiently. LoReFT demonstrates significant improvements over state-of-the-art PEFT methods, both in parameter efficiency and in task performance.
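The core edit can be illustrated concretely. LoReFT replaces a hidden vector h with h + Rᵀ(Wh + b − Rh), where R is a low-rank projection with orthonormal rows and W, b are learned. The NumPy sketch below is ours, not the released implementation; variable names and the toy dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and subspace rank (toy values)

# R projects onto an r-dimensional subspace; its rows are kept
# orthonormal, which we obtain here via a QR decomposition.
R = np.linalg.qr(rng.normal(size=(d, r)))[0].T   # shape (r, d)
W = rng.normal(size=(r, d))                      # learned projection
b = rng.normal(size=r)                           # learned bias

def loreft(h):
    # Edit h only within the subspace spanned by R's rows:
    # h + R^T (W h + b - R h)
    return h + R.T @ (W @ h + b - R @ h)

h = rng.normal(size=d)
h_new = loreft(h)

# Outside the subspace, h is untouched: projecting both vectors onto
# the orthogonal complement of rowspace(R) gives the same result.
P = np.eye(d) - R.T @ R
assert np.allclose(P @ h, P @ h_new)
```

The final assertion makes the key property visible: because R has orthonormal rows, the intervention changes the representation only inside the chosen r-dimensional subspace, leaving the rest of the hidden state intact.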

Investigation into LoReFT

LoReFT intervenes on model representations with precise control while using a minimal number of parameters. By performing interventions in a low-dimensional linear subspace, the method achieves notable efficiency. In our experiments, LoReFT outperforms existing PEFT methods across several benchmarks, including commonsense reasoning, arithmetic reasoning, instruction following, and natural language understanding tasks, with significantly fewer parameters.

The evaluation across different model sizes and tasks reveals that LoReFT retains or improves performance with a dramatic reduction in additional parameter count. This observation underscores the power of representation-level interventions, suggesting that much of the information necessary for adapting to new tasks is already encoded within the hidden representations of pretrained models.
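A back-of-the-envelope comparison shows where the parameter savings come from. The configuration choices below (which layers and matrices are adapted, and the ranks used) are illustrative assumptions, not the paper's exact settings, so the resulting ratio will not match the reported 15x--65x figures exactly.

```python
# Hypothetical parameter counts for a 7B-scale model.
d = 4096        # hidden size (assumed)
layers = 32     # number of transformer layers (assumed)

# LoRA with rank 16 on the query and value projections of every layer:
# each adapted matrix adds two factors of size d x r.
lora_rank = 16
lora_params = layers * 2 * (2 * d * lora_rank)

# LoReFT with rank 4 intervening at every layer: R and W are r x d,
# plus a bias b of size r.
reft_rank = 4
loreft_params = layers * (2 * reft_rank * d + reft_rank)

print(f"LoRA:   {lora_params:,}")    # 8,388,608
print(f"LoReFT: {loreft_params:,}")  # 1,048,704
print(f"ratio:  {lora_params / loreft_params:.1f}x")
```

Lowering the intervention rank or restricting interventions to a subset of layers and token positions, as the paper's configurations do, widens this gap further.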

Theoretical and Practical Implications

The success of LoReFT has multiple implications for future research in LLM adaptation. Theoretically, it suggests that the space of potential behavior modifications within LLMs may be more effectively navigated through representation interventions rather than weight modifications. This aligns with findings from interpretability research indicating that meaningful concepts are often encoded in linear subspaces of model representations.

Practically, the efficiency of ReFT opens up new avenues for deploying adapted LLMs in constrained environments, where the overhead of existing PEFT methods may be prohibitive. ReFT's performance suggests that it could serve as a foundational method for future work seeking to create more adaptable, efficient, and interpretable LLMs.

Future Directions

While our current investigations into ReFT and LoReFT show promising results, there is much to explore. For instance, understanding the limits of representation interventions, both in terms of the complexity of tasks they can adapt to and the granularity of behavior they can control, remains an open question. Moreover, exploring the integration of ReFT methods with interpretability techniques could yield new insights into how LLMs encode and manipulate knowledge.

Additionally, adapting ReFT for other types of models, beyond the transformer architectures explored here, could broaden its applicability. Investigating how these interventions interact with different forms of pretraining tasks, data distributions, and model sizes will be crucial in understanding the full potential of representation finetuning.

Conclusion

Representation Finetuning emerges as an effective strategy for adapting pretrained LLMs to new tasks with minimal parameter updates. LoReFT, as a practical embodiment of the ReFT concept, underscores the latent power within the representation space of these models, challenging the prevalent paradigm of weight-based finetuning methods. Through further exploration and experimentation, ReFT could significantly advance our ability to leverage and understand the inner workings of LLMs, paving the way for more versatile and efficient AI systems.
