- The paper demonstrates that steering vectors extracted from activation differences can elicit chain-of-thought reasoning in large language models without explicit prompting.
- The methodology combines contrasting prompts with a grid search over injection layer and coefficient, applied to models such as Llama3 8b Instruct and Mistral 7b v0.2 Instruct.
- Experimental results show comparable or improved performance on benchmarks such as GSM8k, MMLU, and AGI Eval, suggesting a scalable alternative to traditional fine-tuning.
Uncovering Latent Chain of Thought Vectors in LLMs
Overview
The paper "Uncovering Latent Chain of Thought Vectors in LLMs" by Jason Zhang and Scott Viteri addresses the problem of guiding LLMs (LMs) to perform Chain of Thought (CoT) reasoning through the use of steering vectors. CoT reasoning allows LMs to decompose complex problems into more manageable sub-tasks, potentially enhancing their ability to produce accurate and understandable outputs without the need for explicit natural language prompts. This paper is particularly relevant, given the increasing integration of LMs into various societal applications where reliability in their output is crucial.
Methodology
The authors focus on the extraction and utilization of steering vectors in LMs to facilitate CoT reasoning. They employ a method derived from existing work on activation engineering, specifically building on the strategies laid out by Panickssery (2024). The methodology involves:
- Data Corpus: Collecting a diverse set of reasoning questions from benchmarks such as GSM8k, MMLU, and Big Bench Lite. This diversity helps mitigate noise during steering vector extraction.
- Activation Extraction: Applying contrasting prompts to these questions and recording the resulting activations at pre-specified LM layers. These activations are then average-pooled across token positions and questions.
- Steering Vector Formation: The final steering vector is the difference between the pooled activations from the step-by-step (CoT) prompts and those from the immediate-answer prompts.
- Steering Application: During inference, the steering vector is added to the model's hidden state at a chosen layer via a PyTorch forward hook; the injection layer and scaling coefficient are selected by grid search to optimize performance (a minimal sketch of the extraction and injection steps follows this list).
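The pipeline described above can be expressed compactly with Hugging Face `transformers` and a PyTorch forward hook. The following is a minimal sketch, not the authors' implementation: the prompt templates, `LAYER` index, and coefficient are illustrative assumptions.

```python
# Minimal sketch of contrastive steering-vector extraction and injection.
# Prompt templates, LAYER, and the coefficient are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
LAYER = 15  # hypothetical extraction/injection layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def mean_hidden_state(prompt: str) -> torch.Tensor:
    """Average-pool the hidden state at LAYER across all token positions."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)  # shape: (hidden_dim,)

def extract_steering_vector(questions: list[str]) -> torch.Tensor:
    """CoT-minus-direct activation difference, averaged over the question corpus."""
    diffs = []
    for q in questions:
        cot = mean_hidden_state(f"{q}\nLet's think step by step.")
        direct = mean_hidden_state(f"{q}\nAnswer immediately with just the final answer.")
        diffs.append(cot - direct)
    return torch.stack(diffs).mean(dim=0)

def add_steering_hook(vector: torch.Tensor, coeff: float):
    """Add coeff * vector to the hidden state at LAYER on every forward pass."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + coeff * vector.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return model.model.layers[LAYER].register_forward_hook(hook)

# Usage: extract once, steer during generation, then remove the hook.
# vec = extract_steering_vector(corpus_questions)
# handle = add_steering_hook(vec, coeff=4.0)
# ...model.generate(...)...
# handle.remove()
```

Average-pooling over token positions and questions follows the paper's description; adding the vector to a single layer's output via a forward hook mirrors the activation-addition setup the authors build on.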
Experimental Setup and Evaluation
The experiments focus on two LMs, Llama3 8b Instruct and Mistral 7b v0.2 Instruct, evaluated on several reasoning benchmarks: GSM8k, MMLU, ARC AI2-C, and AGI Eval (SAT Math section). Key performance metrics are compared between steered models and those using traditional CoT prompting; the layer/coefficient tuning loop is sketched below.
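Because the injection layer and coefficient are chosen by how well the steered model scores on held-out questions, the tuning loop is simple to express. The sketch below is illustrative only: the exact-match scoring heuristic, the grid ranges, and the `make_generate` factory (which would wrap the hook from the earlier sketch) are assumptions, not the authors' exact evaluation harness.

```python
# Illustrative grid search over injection layer and coefficient.
# Scoring heuristic and grid ranges are assumptions, not the paper's exact setup.
from itertools import product
from typing import Callable, Sequence, Tuple

def exact_match_accuracy(generate: Callable[[str], str],
                         eval_set: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (question, gold_answer) pairs whose gold answer appears in the output."""
    hits = sum(gold in generate(question) for question, gold in eval_set)
    return hits / len(eval_set)

def grid_search(make_generate: Callable[[int, float], Callable[[str], str]],
                eval_set: Sequence[Tuple[str, str]],
                layers: Sequence[int],
                coeffs: Sequence[float]) -> Tuple[int, float, float]:
    """make_generate(layer, coeff) should return a generation function with the
    steering vector injected at `layer`, scaled by `coeff` (e.g. via the hook above)."""
    best = (layers[0], coeffs[0], -1.0)
    for layer, coeff in product(layers, coeffs):
        acc = exact_match_accuracy(make_generate(layer, coeff), eval_set)
        if acc > best[2]:
            best = (layer, coeff, acc)
    return best  # (best_layer, best_coeff, best_accuracy)

# Example call (grid values are hypothetical):
# best_layer, best_coeff, best_acc = grid_search(
#     make_generate, dev_set, layers=range(10, 20), coeffs=[1.0, 2.0, 4.0, 8.0])
```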
Results
As summarized in Table 1 from the paper, the findings demonstrate that the models steered by activation vectors perform comparably to those prompted through traditional CoT techniques.
| Model | GSM8K | MMLU | ARC AI2-C | AGI Eval (SAT Math) |
|---|---|---|---|---|
| Llama3 8b Instruct: CoT Prompted | 73.90 | 65.60 | 80.46 | 59.09 |
| Llama3 8b Instruct: w/ Steering | 79.15 | 64.20 | 81.23 | 61.40 |
| Mistral 7b v0.2 Instruct: CoT Prompted | 50.72 | 48.95 | 60.75 | 40.00 |
| Mistral 7b v0.2 Instruct: w/ Steering | 48.20 | 52.30 | 62.70 | 42.30 |
These results underscore the potential effectiveness of steering vectors in guiding LMs toward CoT reasoning with reduced computational demands compared to traditional fine-tuning or human feedback methods.
Implications and Future Work
The implications of this research are twofold. Practically, steering vectors offer an efficient way to achieve desired LM behavior without extensive retraining. Theoretically, the work supports the hypothesis that LMs' internal activations encode meaningful structure that can be harnessed for behavior modification.
Future research could explore:
- Scalability: Assessing the approach's effectiveness on larger models and a broader range of tasks.
- Optimization: Fine-tuning the data corpus and injection strategies to further refine steering vector effectiveness.
- Robustness: Ensuring that the approach generalizes well across diverse and previously unseen reasoning tasks.
In summary, this paper provides a rigorous examination of using activation space manipulation to steer LMs towards CoT reasoning, showcasing promising results and paving the way for more efficient and effective use of LLMs in complex problem-solving scenarios.