- The paper demonstrates that steering vectors extracted from activation differences can elicit chain-of-thought reasoning in large language models without explicit prompting.
- The methodology combines contrasting prompts with a grid search over injection layer and coefficient, applied to models such as Llama3 8b Instruct and Mistral 7b v0.2 Instruct.
- Experimental results show comparable or improved performance on benchmarks such as GSM8k, MMLU, and AGI Eval, suggesting a scalable alternative to traditional fine-tuning.
Uncovering Latent Chain of Thought Vectors in LLMs
Overview
The paper "Uncovering Latent Chain of Thought Vectors in LLMs" by Jason Zhang and Scott Viteri addresses the problem of guiding LLMs (LMs) to perform Chain of Thought (CoT) reasoning through the use of steering vectors. CoT reasoning allows LMs to decompose complex problems into more manageable sub-tasks, potentially enhancing their ability to produce accurate and understandable outputs without the need for explicit natural language prompts. This paper is particularly relevant, given the increasing integration of LMs into various societal applications where reliability in their output is crucial.
Methodology
The authors focus on the extraction and utilization of steering vectors in LMs to facilitate CoT reasoning. They employ a method derived from existing work on activation engineering, specifically building on the strategies laid out by Panickssery (2024). The methodology involves:
- Data Corpus: Collecting a diverse set of reasoning questions from benchmarks such as GSM8k, MMLU, and Big Bench Lite. This diversity helps mitigate noise during steering vector extraction.
- Activation Extraction: Applying contrasting prompts to these questions and recording the resulting activations at pre-specified LM layers. These activations are then average-pooled across token positions and questions.
- Steering Vector Formation: The final steering vector is the difference between the pooled activations from the step-by-step (CoT) prompts and those from the immediate-answer prompts.
- Steering Application: During inference, the steering vector is added to the model's hidden state at a chosen layer via a PyTorch forward hook; the injection layer and scaling coefficient are selected by grid search to optimize performance (a minimal sketch of the extraction and injection steps follows this list).
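The pipeline described above can be expressed compactly with Hugging Face `transformers` and a PyTorch forward hook. The following is a minimal sketch, not the authors' implementation: the prompt templates, `LAYER` index, and coefficient are illustrative assumptions.

```python
# Minimal sketch of contrastive steering-vector extraction and injection.
# Prompt templates, LAYER, and the coefficient are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
LAYER = 15  # hypothetical extraction/injection layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def mean_hidden_state(prompt: str) -> torch.Tensor:
    """Average-pool the hidden state at LAYER across all token positions."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)  # shape: (hidden_dim,)

def extract_steering_vector(questions: list[str]) -> torch.Tensor:
    """CoT-minus-direct activation difference, averaged over the question corpus."""
    diffs = []
    for q in questions:
        cot = mean_hidden_state(f"{q}\nLet's think step by step.")
        direct = mean_hidden_state(f"{q}\nAnswer immediately with just the final answer.")
        diffs.append(cot - direct)
    return torch.stack(diffs).mean(dim=0)

def add_steering_hook(vector: torch.Tensor, coeff: float):
    """Add coeff * vector to the hidden state at LAYER on every forward pass."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + coeff * vector.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return model.model.layers[LAYER].register_forward_hook(hook)

# Usage: extract once, steer during generation, then remove the hook.
# vec = extract_steering_vector(corpus_questions)
# handle = add_steering_hook(vec, coeff=4.0)
# ...model.generate(...)...
# handle.remove()
```

Average-pooling over token positions and questions follows the paper's description; adding the vector to a single layer's output via a forward hook mirrors the activation-addition setup the authors build on.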
Experimental Setup and Evaluation
The experiments focus on two LMs, Llama3 8b Instruct and Mistral 7b v0.2 Instruct, evaluated on several reasoning benchmarks: GSM8k, MMLU, ARC AI2-C, and AGI Eval (SAT Math section). Key performance metrics are compared between steered models and those using traditional CoT prompting; the layer/coefficient tuning loop is sketched below.
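Because the injection layer and coefficient are chosen by how well the steered model scores on held-out questions, the tuning loop is simple to express. The sketch below is illustrative only: the exact-match scoring heuristic, the grid ranges, and the `make_generate` factory (which would wrap the hook from the earlier sketch) are assumptions, not the authors' exact evaluation harness.

```python
# Illustrative grid search over injection layer and coefficient.
# Scoring heuristic and grid ranges are assumptions, not the paper's exact setup.
from itertools import product
from typing import Callable, Sequence, Tuple

def exact_match_accuracy(generate: Callable[[str], str],
                         eval_set: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (question, gold_answer) pairs whose gold answer appears in the output."""
    hits = sum(gold in generate(question) for question, gold in eval_set)
    return hits / len(eval_set)

def grid_search(make_generate: Callable[[int, float], Callable[[str], str]],
                eval_set: Sequence[Tuple[str, str]],
                layers: Sequence[int],
                coeffs: Sequence[float]) -> Tuple[int, float, float]:
    """make_generate(layer, coeff) should return a generation function with the
    steering vector injected at `layer`, scaled by `coeff` (e.g. via the hook above)."""
    best = (layers[0], coeffs[0], -1.0)
    for layer, coeff in product(layers, coeffs):
        acc = exact_match_accuracy(make_generate(layer, coeff), eval_set)
        if acc > best[2]:
            best = (layer, coeff, acc)
    return best  # (best_layer, best_coeff, best_accuracy)

# Example call (grid values are hypothetical):
# best_layer, best_coeff, best_acc = grid_search(
#     make_generate, dev_set, layers=range(10, 20), coeffs=[1.0, 2.0, 4.0, 8.0])
```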
Results
As summarized in Table 1 from the paper, the findings demonstrate that the models steered by activation vectors perform comparably to those prompted through traditional CoT techniques.
| Model | GSM8K | MMLU | ARC AI2-C | AGI Eval (SAT Math) |
|---|---|---|---|---|
| Llama3 8b Instruct: CoT Prompted | 73.90 | 65.60 | 80.46 | 59.09 |
| Llama3 8b Instruct: w/ Steering | 79.15 | 64.20 | 81.23 | 61.40 |
| Mistral 7b v0.2 Instruct: CoT Prompted | 50.72 | 48.95 | 60.75 | 40.00 |
| Mistral 7b v0.2 Instruct: w/ Steering | 48.20 | 52.30 | 62.70 | 42.30 |
These results underscore the potential effectiveness of steering vectors in guiding LMs toward CoT reasoning with reduced computational demands compared to traditional fine-tuning or human feedback methods.
Implications and Future Work
The implications of this research are twofold. Practically, steering vectors offer an efficient way to achieve desired LM behavior without extensive retraining. Theoretically, the work supports the hypothesis that LMs' internal activations encode meaningful structure that can be harnessed for behavior modification.
Future research could explore:
- Scalability: Assessing the approach's effectiveness on larger models and a broader range of tasks.
- Optimization: Fine-tuning the data corpus and injection strategies to further refine steering vector effectiveness.
- Robustness: Ensuring that the approach generalizes well across diverse and previously unseen reasoning tasks.
In summary, this paper provides a rigorous examination of using activation space manipulation to steer LMs towards CoT reasoning, showcasing promising results and paving the way for more efficient and effective use of LLMs in complex problem-solving scenarios.