Parameter-Space Reasoning Vectors

Updated 16 April 2026

Parameter-space reasoning vectors are explicit shifts in neural parameters that encode multi-step reasoning, enabling efficient knowledge transfer across tasks.
Techniques such as parameter delta vectors, latent representation shifts, and low-rank injections facilitate structured reasoning in language, vision, and action domains.
Empirical results show that these vectors yield significant accuracy improvements with high parameter efficiency, exemplified by methods like TinyLoRA.

Parameter-space reasoning vectors are compact, structured modifications of neural network parameters or internal representations designed to induce, transfer, or probe reasoning capabilities in neural architectures. Recent research formulates these reasoning vectors across several paradigms, including direct parameter deltas between differently trained models, additive latent-space shifts for stepwise reasoning, and low-rank structural updates informed by task-specific knowledge. These methods provide a mechanism for encoding multi-step reasoning within high-dimensional parameter or activation spaces, supporting efficient knowledge transfer and mechanistic interpretability across language, vision, and action models.

1. Formalism and Taxonomy of Parameter-Space Reasoning Vectors

Parameter-space reasoning vectors are typically defined as explicit offsets in model weights, subspaces of latent representations, or updates injected in the parameter manifold to encode or transfer reasoning ability. Several instantiations are prominent:

Parameter Delta Vectors: Reasoning vectors are computed as $v = \theta_{\text{reason}} - \theta_{\text{base}}$, where $\theta_{\text{reason}}$ results from training on a reasoning task and $\theta_{\text{base}}$ is a reference model, often trained with supervised fine-tuning but lacking advanced reasoning skills. This construction, termed "reasoning vectors" or "task vectors," enables transfer of reasoning via addition, masking, or scaling in the parameter space (Zbeeb et al., 1 Sep 2025, Horoi et al., 13 Nov 2025).
Latent Representation Shifts (Reasoning Vectors in Hidden Space): Additive vectors in hidden activation space (e.g., CoT vectors) are computed from differences in representations with and without explicit reasoning traces. These vectors act as generic reasoning directions that can be injected at specific layers to steer generation toward multi-step reasoning behaviors (Li et al., 1 Oct 2025).
Low-rank Parameter-space Injections: Parameter updates of the form $\Delta W_t = f(G_t; \phi)$ are constructed via mapping from intermediate semantic structures (e.g., dynamic scene graphs) to modulate VLA model parameters in support of structured task decomposition (Hou et al., 7 Feb 2026).
Ultra-Low-Dimensional Reasoning Parametrizations: TinyLoRA parameterization reduces the reasoning update to as few as one or several globally tied parameters, demonstrating that the essential directions for reasoning adaptation can be encoded in extremely compact subspaces (Morris et al., 4 Feb 2026).
Continuous Latent Thought Vectors: Latent vectors $z\in\mathbb{R}^d$ parameterize the sequence-level reasoning plan in disentangled reasoning-decoding systems, supporting gradient-based optimization of the "reasoning plan" at inference (Kong et al., 6 Feb 2026).

Taken together, these approaches frame reasoning not as an emergent property of large, opaque models, but as a structured, manipulable transformation in parameter or latent space.
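
The parameter delta construction can be illustrated with a minimal NumPy sketch. It assumes each model is a dictionary of equally shaped arrays; the tensor names, the toy shapes, and the helpers `extract_vector` and `inject_vector` are hypothetical and not taken from the cited papers.

```python
import numpy as np

def extract_vector(theta_reason, theta_base):
    """v = theta_reason - theta_base, computed per named tensor."""
    return {k: theta_reason[k] - theta_base[k] for k in theta_base}

def inject_vector(theta_target, v, alpha=1.0, mask=None):
    """Add alpha * v to the target; `mask` optionally restricts which tensors change."""
    out = {}
    for name, w in theta_target.items():
        use = name in v and (mask is None or name in mask)
        out[name] = w + alpha * v[name] if use else w.copy()
    return out

# Toy models (assumption): three checkpoints that share one initialization.
rng = np.random.default_rng(0)
shapes = {"attn.w": (4, 4), "mlp.w": (4, 8)}
theta_base = {k: rng.normal(size=s) for k, s in shapes.items()}
theta_reason = {k: w + 0.1 * rng.normal(size=w.shape) for k, w in theta_base.items()}
theta_target = {k: w + 0.05 * rng.normal(size=w.shape) for k, w in theta_base.items()}

v = extract_vector(theta_reason, theta_base)
# Scale the vector by 0.5 and apply it to the MLP weights only.
theta_new = inject_vector(theta_target, v, alpha=0.5, mask={"mlp.w"})
```

Passing a negative `alpha` subtracts the vector, which is the operation used in ablation-style checks of whether the direction matters.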

2. Extraction, Alignment, and Injection Procedures

The formation and application of parameter-space reasoning vectors follow principled procedures:

Extraction by Model Differencing: Given two compatible models, typically sharing initialization, architecture, and tokenizer, the difference $\theta_{\text{reason}} - \theta_{\text{base}}$ serves as a "reasoning skill" vector. Injection is via $\theta' = \theta_{\text{target}} + \alpha v$, with $\alpha$ a scaling parameter. Masking can localize transfer to submodules (Zbeeb et al., 1 Sep 2025). Alignment is essential if models have diverged due to permutation, rotation, or scaling symmetries; group-theoretic alignment algorithms (Hungarian matching, Procrustes rotation, and scaling) are applied to bring parameters into correspondence before vector arithmetic (Horoi et al., 13 Nov 2025).
Latent-space Extraction and Injection: For activation-space vectors, representations of models under reasoning (e.g., CoT) and non-reasoning traces are averaged, and their difference defines the task-general vector $\vec v_{\mathrm{CoT}}$ to be injected at a chosen layer. Learnable variants are optimized by freezing the underlying model and training $\vec v_L$ to mimic the distributional effects of explicit reasoning teacher models via student-teacher KL and cross-entropy losses (Li et al., 1 Oct 2025).
Low-rank Parameter Injection with Structural Projections: Dynamic graphs or semantic plans extracted from the model’s intermediate reasoning state are projected into low-rank parameter updates $\Delta W_t = f(G_t; \phi)$ by architectures such as MLPs or LoRA bases. The updates are inserted into the relevant parts of the network, thereby modulating decision policies in accord with scene structure or subtask decomposition (Hou et al., 7 Feb 2026).
Extreme Parameter Compression: TinyLoRA constructs adapters that tie all updated modules across the model to a single shared vector, further projected by random or structured matrices, reducing reasoning adaptation to as few as 13 parameters for large LLMs with minimal loss of performance (Morris et al., 4 Feb 2026).
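
The latent-space procedure above (average the hidden states with and without reasoning traces, take the difference, add it at one layer) might look as follows. This is a toy sketch: the three-layer `tanh` network stands in for a real transformer, and `extract_cot_vector`, `forward_with_injection`, and the synthetic hidden states are illustrative assumptions.

```python
import numpy as np

def extract_cot_vector(h_cot, h_plain):
    """Difference of mean hidden states (rows are samples, columns are features)."""
    return h_cot.mean(axis=0) - h_plain.mean(axis=0)

def forward_with_injection(x, layers, v, layer_idx, scale=1.0):
    """Toy forward pass that adds scale * v to the output of one chosen layer."""
    h = x
    for i, w in enumerate(layers):
        h = np.tanh(h @ w)
        if i == layer_idx:
            h = h + scale * v
    return h

rng = np.random.default_rng(0)
d = 8
layers = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]

# Synthetic hidden states (assumption): CoT runs are shifted along one direction.
shift = rng.normal(size=d)
h_plain = rng.normal(size=(32, d))
h_cot = h_plain + shift

v_cot = extract_cot_vector(h_cot, h_plain)
x = rng.normal(size=(5, d))
steered = forward_with_injection(x, layers, v_cot, layer_idx=2)
plain = forward_with_injection(x, layers, v_cot, layer_idx=2, scale=0.0)
```

A learnable variant would treat the injected vector as the only trainable parameter and fit it against a teacher's output distribution; that training loop is omitted here.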

3. Empirical Performance and Analysis

Empirical studies consistently show that reasoning vectors—either in weight space or representation space—enable substantial and highly parameter-efficient gains in multi-step reasoning across diverse tasks:

Method	#Params	GSM8K Acc.	MATH Acc.	Notes
Baseline (no reasoning vec)	0	74.6	69.9	Typical LLM zero-/few-shot
Extracted CoT Vector	0	78.2	72.0	Additive shift in hidden space
Learnable CoT Vector	~3.6K	83.5	71.9	Teacher-student KL, 2000× smaller than LoRA
LoRA (rank-16)	10M	79.0	70.4	Conventional parameter-efficient
TinyLoRA (13 params)	13	91.8	74.6	Tied adapters, RL fine-tuning
Parameter-wise Reasoning+Align	full-model	63.8–64.4	(avg. multi-math)	Close to oracle performance

For chain-of-thought style reasoning, reasoning vectors enable +2–5% accuracy gains, with additive effects when combined with explicit reasoning prompts, and exhibit robustness to adversarial input perturbations (Zbeeb et al., 1 Sep 2025, Li et al., 1 Oct 2025). Notably, subtracting the reasoning vector sharply degrades reasoning performance, highlighting the causal role of the vector direction itself.

In vision-language-action settings, embedding dynamic scene structure as parameter reasoning vectors ($\Delta W_t$) yields success rates notably exceeding both monolithic fine-tuning and prompt-based scene decomposition, especially for tasks requiring semantic generalization across object configurations and temporal decompositions (Hou et al., 7 Feb 2026).
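
To make the mapping from a scene graph to a weight update more concrete, here is one possible shape for such a function. Everything in it is an assumption made for illustration: the mean-aggregation graph encoder, the two-layer MLP, the rank-2 LoRA-style bases `B` and `A`, and the names `graph_embedding` and `delta_w`. The cited work may use a different architecture.

```python
import numpy as np

def graph_embedding(node_feats, adj):
    """One round of mean neighbor aggregation, then mean pooling over nodes."""
    deg = adj.sum(axis=1, keepdims=True) + 1.0
    mixed = (node_feats + adj @ node_feats) / deg
    return mixed.mean(axis=0)

def delta_w(node_feats, adj, phi):
    """Map a graph to a rank-r update: B @ diag(coeffs) @ A."""
    g = graph_embedding(node_feats, adj)
    hidden = np.tanh(phi["w1"] @ g)
    coeffs = phi["w2"] @ hidden                      # shape (r,)
    return phi["B"] @ (coeffs[:, None] * phi["A"])   # shape (d_out, d_in)

rng = np.random.default_rng(0)
d_node, d_hidden, r, d_out, d_in = 5, 16, 2, 6, 7
phi = {
    "w1": rng.normal(size=(d_hidden, d_node)),
    "w2": rng.normal(size=(r, d_hidden)),
    "B": rng.normal(size=(d_out, r)),
    "A": rng.normal(size=(r, d_in)),
}

# Two hypothetical scene graphs over the same three objects, with different relations.
nodes = rng.normal(size=(3, d_node))
adj_a = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
adj_b = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float)

dw_a = delta_w(nodes, adj_a, phi)
dw_b = delta_w(nodes, adj_b, phi)
```

Because the update is built from `r` basis directions, its rank never exceeds `r`, while its content changes with the graph.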

Ultra-compressed adapters with as few as 13 parameters (TinyLoRA) approach or match full-model fine-tuned performance when updated with reinforcement learning signals, supporting the existence of globally shared, dominant reasoning directions in large model parameter space (Morris et al., 4 Feb 2026).
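
The idea of tying every adapted module to one small shared vector can be sketched as below. This is not the TinyLoRA construction itself, whose details are not given here. It is a hedged illustration in which each module owns frozen random rank-one bases and a single 13-entry vector `u` supplies the only trainable coefficients. The module names, shapes, and helpers are hypothetical.

```python
import numpy as np

def make_frozen_bases(shapes, k, seed=0):
    """For each module, draw k fixed rank-one bases (left[i], right[i])."""
    rng = np.random.default_rng(seed)
    bases = {}
    for name, (d_out, d_in) in shapes.items():
        left = rng.normal(size=(k, d_out)) / np.sqrt(d_out)
        right = rng.normal(size=(k, d_in)) / np.sqrt(d_in)
        bases[name] = (left, right)
    return bases

def tied_updates(u, bases):
    """Delta W_m = sum_i u[i] * outer(left_m[i], right_m[i]) for every module m."""
    return {
        name: np.einsum("i,io,ij->oj", u, left, right)
        for name, (left, right) in bases.items()
    }

shapes = {"layer0.q": (16, 16), "layer0.v": (16, 16), "layer1.mlp": (32, 16)}
bases = make_frozen_bases(shapes, k=13)

# The only trainable quantity in this sketch: 13 numbers shared by all modules.
u = np.random.default_rng(1).normal(size=13)
updates = tied_updates(u, bases)
```

Adding more modules to `shapes` grows the frozen bases but leaves the trainable parameter count at 13.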

4. Interpretability and Probing Mechanisms

Additive and low-rank reasoning vectors serve not only as mechanisms for capability transfer but as probes for mechanistic understanding:

Stage-specific Latent Geometry: Injected CoT vectors reveal that only shallow and deep layers of transformer models admit alignment to fixed "reasoning directions," while middle layers are too high-dimensional and sample-specific for fixed reasoning vectors to be reliable probes. The U-shaped accuracy curve under layerwise injection is consistent across architectures (Li et al., 1 Oct 2025).
Principal Component Analysis: Reasoning information density is reflected in the number of PCA components needed to explain hidden state variance; middle ("core reasoning") layers require far more principal directions, elucidating why fixed vectors are less effective in these strata.
Rotation and Vector Norm Analysis: Learnable reasoning vectors exert coherent geometric shifts in latent space, whereas extracted vectors mainly add sample-specific noise; the effect is architecture-dependent, with models such as Qwen showing more pronounced principal reasoning subspaces than LLaMA (Li et al., 1 Oct 2025).
Reasoning Vectors as Probes: Parameter deltas and latent reasoning directions systematically degrade or unlock reasoning when subtracted or added, quantifying their specificity and sufficiency for the reasoning function (Zbeeb et al., 1 Sep 2025, Li et al., 1 Oct 2025).
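
The PCA-based measure mentioned above, the number of components needed to explain hidden-state variance, can be computed with a short routine. The sketch uses synthetic data in place of real layer activations, and the 90% threshold and the function name `components_for_variance` are illustrative choices.

```python
import numpy as np

def components_for_variance(hidden, threshold=0.9):
    """Smallest number of principal components whose variance share reaches threshold."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    count = int(np.searchsorted(ratio, threshold)) + 1
    return min(count, len(s))

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): one layer whose states lie near a 3-d subspace,
# and one whose states spread evenly over all 64 dimensions.
concentrated = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 64))
concentrated += 1e-3 * rng.normal(size=concentrated.shape)
diffuse = rng.normal(size=(200, 64))

n_concentrated = components_for_variance(concentrated)
n_diffuse = components_for_variance(diffuse)
```

A layer that behaves like `diffuse` would need many directions to summarize, which is the situation in which a single fixed vector is expected to be a poor probe.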

5. Practical and Theoretical Implications

The parameter-space reasoning vector paradigm leads to several practical and theoretical consequences:

Parameter Efficiency: Substantial improvements in reasoning ability can be achieved with as few as one and at most a few thousand additional trainable parameters, orders of magnitude fewer than full fine-tuning or even conventional LoRA-style adapters require (Li et al., 1 Oct 2025, Morris et al., 4 Feb 2026).
Model Compatibility and Symmetry Considerations: Direct transfer is enabled under compatible architectures, shared initialization, and tokenization. In cases where models diverge (e.g., independently trained instruction models), alignment techniques leveraging permutation, rotation, and scaling symmetries recover task arithmetic efficacy (Horoi et al., 13 Nov 2025).
Generalization and Structured Policy Induction: Embedding a reasoning vector informed by structural task decomposition, as in iSTAR’s graph-based parameter updates, leads to provably tighter generalization bounds by reducing policy complexity (from action to concept space), and improves compositional adaptation across diverse task variants (Hou et al., 7 Feb 2026).
Limitations: Transfer breaks down under incompatible model architectures or tokenizers; excessive compression of vector updates may undermine fine-grained adaptations; monolithic parameter-based reasoning injection can cause negative transfer on unrelated tasks (e.g., instruction following) (Horoi et al., 13 Nov 2025, Morris et al., 4 Feb 2026).
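
The rotation part of the alignment step discussed under model compatibility can be illustrated with the classical orthogonal Procrustes solution. The sketch assumes that two models' features differ by an exact rotation, which real models only approximate. The feature matrices and the function name `procrustes_rotation` are hypothetical, and permutation matching and rescaling are not shown.

```python
import numpy as np

def procrustes_rotation(a, b):
    """Orthogonal R minimizing ||a @ R - b||_F, obtained from the SVD of a.T @ b."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt

rng = np.random.default_rng(0)

# Hypothetical feature matrices: model B's features are a rotated copy of model A's.
feats_a = rng.normal(size=(50, 8))
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
feats_b = feats_a @ true_rotation

r_hat = procrustes_rotation(feats_a, feats_b)
aligned = feats_a @ r_hat
```

Once the two bases agree, parameter differences are computed in a shared coordinate system, which is the precondition for the vector arithmetic described earlier.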

6. Theoretical Foundations, Extensions, and Outlook

The theoretical basis for parameter-space reasoning vectors builds upon:

Linear Mode Connectivity: The assumption that the straight-line path in parameter space between baseline and reasoning-optimized models stays in a low-loss region supports the rationale for vector arithmetic-based transfer (Zbeeb et al., 1 Sep 2025).
Algebraic Reasoning in Semantic Embeddings: Vector algebra in semantic space, as in knowledge graph deduction and analogical inference, demonstrates the unification of deductive, analogical, and associative inference in a continuous parameter manifold (Summers-Stay, 2017).
Structured Decomposition and Functional Differentiation: The injection of structural reasoning into parameter space enables differentiable, end-to-end optimization and supports tighter generalization guarantees, as formally analyzed in task-structured control settings (Hou et al., 7 Feb 2026).
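
One common way to test the linear connectivity assumption is to evaluate the loss along the straight line between two parameter vectors and measure how far it rises above the chord joining the endpoint losses. The sketch below does this for two toy loss functions, which are assumptions chosen so the outcome is easy to verify by hand. The function name `loss_barrier` is likewise illustrative.

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, steps=11):
    """Largest excess of the interpolated loss over the chord between endpoint losses."""
    ts = np.linspace(0.0, 1.0, steps)
    losses = np.array([loss_fn((1 - t) * theta_a + t * theta_b) for t in ts])
    chord = (1 - ts) * losses[0] + ts * losses[-1]
    return float(np.max(losses - chord))

def bowl(theta):
    """Convex toy loss: no barrier can appear along any line."""
    return float(np.sum(theta**2))

def ring(theta):
    """Non-convex toy loss with minima on the unit circle and a bump at the origin."""
    return float((np.sum(theta**2) - 1.0) ** 2)

theta_a = np.array([1.0, 0.0])
theta_b = np.array([-1.0, 0.0])

barrier_bowl = loss_barrier(bowl, theta_a, theta_b)
barrier_ring = loss_barrier(ring, theta_a, theta_b)
```

A barrier near zero is the regime in which adding a scaled parameter delta is expected to remain in a low-loss region, while a large barrier signals that alignment or a different path may be needed.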

Potential extensions include domain generalization beyond math and code (e.g., to multimodal, instruction-following, or autonomous system reasoning), characterization of optimal basis sets for reasoning vectors, integration with symbolic reasoning systems, and the use of parameter-space manipulation as a probe for model interpretability and internal algorithmic structure.

Parameter-space reasoning vectors thus offer a modular, theoretically grounded, and empirically robust approach for understanding, inducing, and transferring complex reasoning abilities in modern AI systems.