Function Vectors (FVs)
- Function Vectors (FVs) are task-specific, compact representations that aggregate internal model activations into a global vector for effective summarization of complex data.
- They enable precise control of model behavior by steering neural activations, enhancing interpretability and performance across language, vision, and cosmological applications.
- Extraction methods vary from causal mediation analysis in transformers to GMM-based Fisher Vectors in computer vision and Fréchet minimization in cosmology, each tailored to domain-specific challenges.
A function vector (FV) is a task-specific, compact representation, classically extracted from the internal activations of a transformer; the same abbreviation is also used for analogous constructions in computer vision and probabilistic geometry. FVs serve distinct roles depending on context: as global image descriptors for visual retrieval (Fisher Vectors), as causally precise function encoders in large language and multimodal models, or, in cosmology, as Fréchet vectors that minimize the variance among multipole vectors to summarize angular data on the sphere. Despite their different technical instantiations, all FVs enable compressed, controllable representations of complex, distributed information.
1. Mathematical Definition and Extraction Procedures
The mathematical formulation and extraction of FVs differ across fields but share the principle of aggregating local or intermediate features into one global or task-level vector.
a) Language and Multimodal Models
For a given task $t$, FVs are derived by first identifying the key attention heads or components responsible for encoding the task function, typically using causal mediation analysis. Let $a_{\ell j}(p)$ be the activation of head $j$ in layer $\ell$ for prompt $p$; the task-conditional mean over the set $P_t$ of prompts for task $t$ is
$$\bar{a}^{\,t}_{\ell j} = \frac{1}{|P_t|} \sum_{p \in P_t} a_{\ell j}(p).$$
Heads with the highest causal influence, as measured by the Average Indirect Effect (AIE), are selected. The FV is formed as
$$v_t = \sum_{(\ell, j) \in \mathcal{A}} \bar{a}^{\,t}_{\ell j},$$
where $\mathcal{A}$ indexes the top-$k$ heads. At inference, $v_t$ is additively injected after a chosen layer to steer model function (Todd et al., 2023, Kang et al., 13 Jan 2026, Nadaf, 3 Apr 2026, Fu et al., 2 Oct 2025).
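The construction above can be sketched in NumPy as follows. This is a minimal illustration, not the cited authors' code: it assumes head outputs have already been collected and projected into the residual stream, that AIE scores are available as an array, and the function names, the `scale` parameter, and the random stand-in data are all hypothetical.

```python
import numpy as np

def mean_head_outputs(head_outputs):
    """Average each head's output over prompts.

    head_outputs: (n_prompts, n_layers, n_heads, d_model), assumed to be
    each head's contribution already projected into the residual stream.
    """
    return head_outputs.mean(axis=0)

def build_function_vector(mean_outputs, aie, k):
    """Sum the mean outputs of the k heads with the largest AIE."""
    n_layers, n_heads, _ = mean_outputs.shape
    top = np.argsort(aie.ravel())[::-1][:k]            # flat indices, best first
    layers, heads = np.unravel_index(top, (n_layers, n_heads))
    fv = mean_outputs[layers, heads].sum(axis=0)       # shape (d_model,)
    return fv, list(zip(layers.tolist(), heads.tolist()))

def inject(hidden, fv, scale=1.0):
    """Additively steer a residual-stream state (scale is an assumption)."""
    return hidden + scale * fv

# Random stand-ins for real activations and AIE scores.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 4, 3, 16))   # 8 prompts, 4 layers, 3 heads, d_model=16
aie = rng.random(size=(4, 3))
fv, selected = build_function_vector(mean_head_outputs(acts), aie, k=2)
steered = inject(np.zeros(16), fv)
```

In a real model the injection would be applied to the hidden state at the chosen layer during the forward pass, for example through a hook; that wiring depends on the framework and is left out here.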
b) Computer Vision: Fisher Vectors
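Given a set of local descriptors $X = \{x_1, \dots, x_N\}$ for an image, a Gaussian Mixture Model (GMM) is fitted; the FV is formed from the concatenated gradients of the log-likelihood with respect to the GMM parameters: $$\mathrm{FV}(X) = [\mathcal{G}_{\mu,1}, \mathcal{G}_{\sigma,1}, \dots, \mathcal{G}_{\mu,K}, \mathcal{G}_{\sigma,K}],$$ where $\mathcal{G}_{\mu,k}$ and $\mathcal{G}_{\sigma,k}$ are the mean- and variance-gradient components for the $k$th Gaussian (Chandrasekhar et al., 2015).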
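A minimal NumPy sketch of the gradient computation follows. It assumes a diagonal-covariance GMM whose parameters have already been fitted elsewhere, and it uses the commonly used closed-form gradient expressions with Fisher-information scaling, which the text above does not spell out; the function name and the random test data are hypothetical.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Unnormalized Fisher Vector for a diagonal-covariance GMM.

    X: (N, D) local descriptors; weights: (K,) mixture weights;
    means, sigmas: (K, D), with sigmas holding standard deviations.
    """
    N, D = X.shape
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]   # (N, K, D)

    # Log of the weighted component densities, then a stable softmax
    # to obtain the posterior responsibilities gamma[n, k].
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * D * np.log(2 * np.pi))
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Mean- and variance-gradient components per Gaussian.
    g_mu = np.einsum("nk,nkd->kd", gamma, diff) / (N * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1.0)
               / (N * np.sqrt(2 * weights)[:, None]))

    # Row k is [g_mu_k, g_sigma_k]; flattening gives the concatenated FV.
    return np.concatenate([g_mu, g_sigma], axis=1).ravel()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
weights = np.array([0.5, 0.3, 0.2])
means = rng.normal(size=(3, 4))
sigmas = np.ones((3, 4))
fv = fisher_vector(X, weights, means, sigmas)   # length 2 * K * D
```

A useful sanity check on any such implementation: for a single Gaussian set to the sample mean and standard deviation of `X`, both gradient blocks vanish, because those parameters maximize the likelihood.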
c) Geometric/Probabilistic Setting: Fréchet Vectors
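Given a set $\{v_i\}_{i=1}^{n}$ of unit vectors on $S^2$, the Fréchet vector $u_{\mathrm{F}}$ minimizes the mean squared geodesic (great-circle) distance: $$\Psi(u) = \frac{1}{n} \sum_{i=1}^{n} d(u, v_i)^2,$$ where $d(u, v_i) = \arccos(u \cdot v_i)$. The FV is defined as
$$u_{\mathrm{F}} = \arg\min_{u \in S^2} \Psi(u).$$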
This yields the "center of mass" on the manifold (Rodrigues et al., 2024).
2. Causal Role and Interpretability in Model Architectures
Transformers and State-Space Models
In transformers, FVs are localized to a small, causally indispensable subcircuit of attention heads in intermediate/later layers; injecting these vectors recovers targeted task performance, while ablation disrupts it. In state-space models (e.g., Mamba), FVs are also present but may exhibit different or more distributed mechanisms, as in Mamba2 (Wang et al., 27 Oct 2025).
Vision-Language Models
In LMMs, FVs are extracted from cross-attention head activations and can prime the model for relational reasoning tasks, operating analogously to language FVs but in a multimodal embedding (Fu et al., 2 Oct 2025).
Cosmological Data
In spherical CMB analysis, FVs as Fréchet vectors provide a dimensionally reduced, rotation-invariant probe of multipole structure, serving as sensitive diagnostics for isotropy and higher-order correlations (Rodrigues et al., 2024).
3. Empirical Properties and Application Domains
| Domain | Purpose of FV | Extraction Method |
|---|---|---|
| LLMs/LMMs | Steer in-context learning | Causal mediation, AIE, LRP |
| Vision | Global image descriptor | Log-likelihood gradients (GMM) |
| CMB/Geometry | Center-of-mass direction | Min. Fréchet variance |
LLMs:
- FVs provide high-precision, low-dimensional steering (zero-shot and few-shot ICL), with robust performance on functional/relational tasks, and support analogical reasoning via vector arithmetic (Todd et al., 2023, Kang et al., 13 Jan 2026, Fu et al., 2 Oct 2025).
- Fine-tuning of FVs with small datasets further sharpens this effect (Kang et al., 13 Jan 2026), while composite FVs enable flexible analogy mapping.
Computer Vision:
- Fisher Vectors remain highly competitive as global descriptors for image instance retrieval, particularly for tasks requiring geometric invariance (rotation/scale). Performance is affected by choice of sampling strategy (sparse/dense, single/multi-scale) and normalization (Chandrasekhar et al., 2015).
Cosmology:
- FVs (Fréchet vectors) are effective at blind anomaly localization (e.g., CMB Cold Spot) and serve as more sensitive probes than multipole vectors for departures from Gaussianity and isotropy, detecting 1 deviations in Planck data unless noise/foreground modeling is extremely precise (Rodrigues et al., 2024).
4. Invariance, Transferability, and Limitations
- In LLMs, FVs are not invariant across formats: for the same underlying task, FVs extracted from open-ended versus multiple-choice prompts are nearly orthogonal, indicating encoding of both task and format. This limits their generalization out-of-distribution. By contrast, concept vectors (CVs), selected via representational similarity analysis for format invariance, generalize better but are less causally potent in-distribution (Opiełka et al., 25 Feb 2026, Opiełka et al., 5 Mar 2025).
- FVs in machine translation tasks show partial language-agnosticity: FVs extracted in English→X transfer to unseen target languages and instruction-tuned variants, with strong causal effects on token rankings—yet perform less effectively at the sentence level than on words (Laiyk et al., 21 Apr 2026).
- Steering by FVs can operate even when no "answer direction" is decodable in the unembedding; this dissociation highlights that FVs are computational instructions, not simple output pointers (Nadaf, 3 Apr 2026).
- In computer vision, variants of FVs differ in their invariance properties and computational trade-offs; careful normalization and pooling augment geometric robustness (Chandrasekhar et al., 2015).
5. Algorithmic Procedures: Pseudocode and Implementation
LLMs (summarized algorithm):
- For each candidate attention head, compute the task-conditioned mean activation over demonstrations.
- For each head, measure AIE via causal patching on corrupted prompts.
- Select the top $k$ heads by AIE.
- Sum their means to form $v_t$ (the FV).
- At inference, inject $v_t$ into the chosen layer's residual stream.
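The head-scoring step in the list above can be illustrated on a toy model. The sketch below is a deliberate simplification under stated assumptions: the "model" is a single layer that sums head outputs and unembeds them, so patching a head has no downstream effect on other heads as it would in a real transformer. All names (`answer_prob`, `average_indirect_effect`, `W_U`) and the synthetic data are hypothetical.

```python
import numpy as np

def answer_prob(head_out, W_U, answer):
    """Toy 'model': sum head outputs, unembed, softmax, return p(answer)."""
    logits = head_out.sum(axis=0) @ W_U
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[answer]

def average_indirect_effect(clean, corrupted, answers, W_U):
    """Per-head AIE from patching clean mean activations into corrupted runs.

    clean, corrupted: (n_prompts, n_heads, d_model) head outputs;
    answers: (n_prompts,) correct token ids.
    """
    mean_clean = clean.mean(axis=0)            # task-conditioned mean per head
    n_prompts, n_heads, _ = corrupted.shape
    aie = np.zeros(n_heads)
    for i in range(n_prompts):
        base = answer_prob(corrupted[i], W_U, answers[i])
        for h in range(n_heads):
            patched = corrupted[i].copy()
            patched[h] = mean_clean[h]         # patch exactly one head
            aie[h] += answer_prob(patched, W_U, answers[i]) - base
    return aie / n_prompts

# Synthetic setup: only head 0 carries the task signal (toward token 2).
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.1, size=(6, 3, 4))
corrupted = noise.copy()
clean = noise.copy()
clean[:, 0, 2] += 3.0
W_U = np.eye(4)
answers = np.full(6, 2)

aie = average_indirect_effect(clean, corrupted, answers, W_U)
top_k = np.argsort(aie)[::-1][:1]              # indices of the top-k heads, k=1
```

Here the head that writes toward the correct token receives by far the largest score, which is the property the selection step relies on.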
Vision (Fisher Vectors):
- Extract local descriptors (e.g., SIFT, post-PCA).
- Fit a GMM to the descriptors.
- Compute mean/variance gradients for each mixture; concatenate.
- Apply power and $\ell_2$ normalizations to the resulting global vector.
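The final normalization step can be sketched as below. The exponent `alpha=0.5` (signed square root) is a common choice and an assumption here, since the text does not state a value; the function name and the `eps` guard are likewise illustrative.

```python
import numpy as np

def normalize_fisher_vector(fv, alpha=0.5, eps=1e-12):
    """Signed power normalization followed by L2 normalization."""
    fv = np.sign(fv) * np.abs(fv) ** alpha      # dampen large components
    return fv / max(np.linalg.norm(fv), eps)    # unit Euclidean length

raw = np.array([4.0, -9.0, 0.0, 1.0])
normalized = normalize_fisher_vector(raw)
```

With `alpha=0.5` the example vector becomes `[2, -3, 0, 1]` before being scaled to unit length, so signs are preserved while large entries are compressed.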
Cosmology (Fréchet vectors):
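- Given the harmonic coefficients $a_{\ell m}$, compute multipole vectors for each multipole $\ell$.
- For each $\ell$, minimize the mean squared geodesic distance $\Psi_\ell(u)$ to the multipole vectors to find the Fréchet vector $u_\ell$, using grid-search and local descent on $S^2$.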
- Use FVs for further analysis (e.g., uniformity tests, anomaly localization).
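The minimization step for the Fréchet vector can be sketched as a coarse grid search followed by local descent on the sphere. This is one possible realization under assumptions, not the cited implementation: the Fibonacci grid, the log/exp-map descent, and all function names and default parameters are illustrative choices, and the input is treated as a plain set of unit vectors.

```python
import numpy as np

def frechet_objective(u, V):
    """Mean squared great-circle distance from unit vector u to the rows of V."""
    cosines = np.clip(V @ u, -1.0, 1.0)
    return np.mean(np.arccos(cosines) ** 2)

def fibonacci_sphere(n):
    """Roughly uniform grid of n unit vectors on the sphere."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def frechet_vector(V, n_grid=2000, n_iter=200, step=1.0):
    """Grid search for a starting point, then descent along the sphere."""
    grid = fibonacci_sphere(n_grid)
    u = grid[np.argmin([frechet_objective(g, V) for g in grid])]
    for _ in range(n_iter):
        cosines = np.clip(V @ u, -1.0, 1.0)
        theta = np.arccos(cosines)
        # Log map at u: tangent vector pointing to v_i with length theta_i.
        tangent = V - cosines[:, None] * u
        norms = np.linalg.norm(tangent, axis=1)
        safe = norms > 1e-12
        logs = np.zeros_like(V)
        logs[safe] = tangent[safe] * (theta[safe] / norms[safe])[:, None]
        delta = step * logs.mean(axis=0)        # descent direction in the tangent plane
        t = np.linalg.norm(delta)
        if t < 1e-12:
            break
        u = np.cos(t) * u + np.sin(t) * delta / t   # exponential map back to the sphere
    return u

# Four vectors placed symmetrically around the north pole at colatitude 0.3.
colat = 0.3
az = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
V = np.stack([np.sin(colat) * np.cos(az),
              np.sin(colat) * np.sin(az),
              np.full(4, np.cos(colat))], axis=1)
u_hat = frechet_vector(V)
```

For this symmetric configuration the minimizer is the north pole by construction, which makes it a convenient check before applying the routine to real multipole vectors.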
6. Notable Applications and Impact
- LLMs: FV injection enables explicit, swap-in control over downstream reasoning and zero-shot task transfer, with interpretability for relational reasoning and analogies (Todd et al., 2023, Kang et al., 13 Jan 2026, Opiełka et al., 5 Mar 2025).
- Vision: FVs (Fisher Vectors) remain foundational for scalable content-based image retrieval and hybrid fusion with deep features (Chandrasekhar et al., 2015).
- Cosmic Microwave Background: FVs facilitate blind detection and localization of non-Gaussian features—improving sensitivity over classical multipole statistics and enhancing robustness in cosmological hypothesis testing (Rodrigues et al., 2024).
7. Limitations, Caveats, and Future Directions
- LLMs: FV extraction requires extensive ablation runs or gradient-based relevance computations (e.g., LRP). Current FV approaches may not capture high-level behaviors or broad stylistic attributes; they are best suited for well-localized, task-specific interventions (Pham et al., 3 Jun 2026, Brumley et al., 2024).
- Vision: FV dimensionality can be prohibitive for large-scale databases, but product quantization and early fusion with CNN features provide practical mitigation (Chandrasekhar et al., 2015).
- Cosmology: FV analyses are limited by anisotropic noise and foreground modeling; residual systematic uncertainties at high multipoles $\ell$ remain challenging (Rodrigues et al., 2024).
- Methodological: The in-distribution effectiveness and out-of-distribution limitations of FVs call for further development of format-invariant and transferable representations (e.g., hybrid FV/CV approaches), as well as more principled head selection and distributed steering protocols (Pham et al., 3 Jun 2026, Opiełka et al., 25 Feb 2026).
- Theoretical: In deep transformers, the layered updating and concatenation of function vectors enable adaptive inference not possible in shallow models, expanding the class of learnable in-context algorithms (Raj et al., 15 Jun 2026).
FVs thus provide a powerful, interpretable, and modular mechanism for representing and manipulating structured information across major paradigms in statistical learning, computer vision, and cosmological data analysis.