Function Vectors (FVs)

Updated 3 July 2026
  • Function Vectors (FVs) are task-specific, compact representations that aggregate internal model activations into a global vector for effective summarization of complex data.
  • They enable precise control of model behavior by steering neural activations, enhancing interpretability and performance across language, vision, and cosmological applications.
  • Extraction methods vary from causal mediation analysis in transformers to GMM-based Fisher Vectors in computer vision and Fréchet minimization in cosmology, each tailored to domain-specific challenges.

A function vector (FV) is a task-specific, compact representation extracted from the internal activations of a machine learning model—classically a transformer, but also applicable in probabilistic geometry and computer vision. FVs serve distinct roles depending on context: as global image descriptors for visual retrieval (Fisher Vectors), as causally precise function encoders in large language and multimodal models, or, in cosmology, as Fréchet vectors minimizing variance among multipole vectors to summarize angular data on the sphere. Despite their different technical instantiations, all FVs enable compressed, controllable representations of complex, distributed information.

1. Mathematical Definition and Extraction Procedures

The mathematical formulation and extraction of FVs differ across fields but share one principle: local or intermediate features are aggregated into a single global or task-level vector.

a) Language and Multimodal Models

For a given task $t$, FVs are derived by first identifying the key attention heads or components responsible for encoding the task function, typically using causal mediation analysis. Let $a_{\ell j}(p) \in \mathbb{R}^d$ be the activation of head $j$ in layer $\ell$ for prompt $p$; the task-conditional mean is

$$\bar{a}^t_{\ell j} = \frac{1}{|P_t|} \sum_{p \in P_t} a_{\ell j}(p).$$

Heads with the highest causal influence, as measured by the Average Indirect Effect (AIE), are selected. The FV is formed as

$$v_t = \sum_{(\ell,j)\in A} \bar{a}^t_{\ell j},$$

where $A$ indexes the top-$K$ heads. At inference, $v_t$ is additively injected after a chosen layer to steer model function (Todd et al., 2023, Kang et al., 13 Jan 2026, Nadaf, 3 Apr 2026, Fu et al., 2 Oct 2025).
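The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited authors' implementation: it assumes the head activations have already been collected and projected into the residual-stream dimension, that the set of selected heads is given, and that injection happens at a single token position. The function names, the `scale` and `position` parameters, and the random data are all hypothetical.

```python
import numpy as np

def task_mean_activations(acts):
    """Average head activations over prompts.

    acts: array of shape (num_prompts, num_layers, num_heads, d),
          assumed to be pre-collected for one task.
    Returns an array of shape (num_layers, num_heads, d).
    """
    return acts.mean(axis=0)

def function_vector(mean_acts, selected_heads):
    """Sum the mean activations of the selected (layer, head) pairs."""
    return np.sum(np.stack([mean_acts[l, j] for (l, j) in selected_heads]), axis=0)

def inject(hidden, fv, position=-1, scale=1.0):
    """Add fv to one token position of a (seq_len, d) hidden state.

    Injecting at the last position with scale 1.0 is an assumption
    made for this sketch.
    """
    steered = hidden.copy()
    steered[position] += scale * fv
    return steered

# Toy data: 8 prompts, 4 layers, 3 heads, hidden size 16.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 4, 3, 16))
mean_acts = task_mean_activations(acts)

A = [(2, 0), (3, 1)]            # hypothetical top-K heads
v_t = function_vector(mean_acts, A)

hidden = rng.normal(size=(5, 16))
steered = inject(hidden, v_t)
```

In a real model the activations would come from forward hooks on the attention layers, and the injection would be applied inside the forward pass rather than to a standalone array.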

b) Computer Vision: Fisher Vectors

Given a set of local descriptors $X = \{x_i\}_{i=1}^{N}$ for an image, a Gaussian Mixture Model (GMM) with $K$ components is fitted; the FV is formed from the concatenated gradients of the log-likelihood with respect to the GMM parameters:

$$\mathrm{FV}(X) = \left[ G_{\mu_1}, G_{\sigma_1}, \ldots, G_{\mu_K}, G_{\sigma_K} \right],$$

where $G_{\mu_k}$ and $G_{\sigma_k}$ are the mean- and variance-gradient components for the $k$th Gaussian (Chandrasekhar et al., 2015).
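For orientation, the gradient components are often written in the following closed form for a diagonal-covariance GMM. This is the form commonly used in the Fisher Vector literature and is assumed here; the cited paper may use a different normalization. The symbols are illustrative: $x_i$ ($i = 1, \ldots, N$) are the local descriptors, $w_k$, $\mu_k$, $\sigma_k$ are the weight, mean, and standard deviation of the $k$th Gaussian, and $\gamma_i(k)$ is the posterior probability that descriptor $x_i$ belongs to component $k$. Division and squaring are element-wise.

```latex
\gamma_i(k) = \frac{w_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}
                   {\sum_{m=1}^{K} w_m \, \mathcal{N}(x_i \mid \mu_m, \sigma_m^2)},
\qquad
G_{\mu_k} = \frac{1}{N \sqrt{w_k}} \sum_{i=1}^{N} \gamma_i(k) \, \frac{x_i - \mu_k}{\sigma_k},
\qquad
G_{\sigma_k} = \frac{1}{N \sqrt{2 w_k}} \sum_{i=1}^{N} \gamma_i(k)
               \left[ \frac{(x_i - \mu_k)^2}{\sigma_k^2} - 1 \right].
```

With $D$-dimensional descriptors, each component contributes $2D$ entries, so the concatenated vector has $2KD$ dimensions.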

c) Geometric/Probabilistic Setting: Fréchet Vectors

Given a set $\{v_i\}_{i=1}^{n}$ of unit vectors on $S^2$, the Fréchet vector $u^*$ minimizes the mean squared geodesic (great-circle) distance: $\Psi(u) = \frac{1}{n} \sum_{i=1}^{n} d(u, v_i)^2$, where $d(u, v_i) = \arccos(u \cdot v_i)$. The FV is defined as

$$u^* = \arg\min_{u \in S^2} \Psi(u).$$

This yields the "center of mass" on the manifold (Rodrigues et al., 2024).

2. Causal Role and Interpretability in Model Architectures

Transformers and State-Space Models

In transformers, FVs are localized to a small, causally indispensable subcircuit of attention heads in intermediate/later layers; injecting these vectors recovers targeted task performance, while ablation disrupts it. In state-space models (e.g., Mamba), FVs are also present but may exhibit different or more distributed mechanisms, as in Mamba2 (Wang et al., 27 Oct 2025).

Vision-LLMs

In large multimodal models (LMMs), FVs are extracted from cross-attention head activations and can prime the model for relational reasoning tasks, operating analogously to language FVs but in a multimodal embedding space (Fu et al., 2 Oct 2025).

Cosmological Data

In spherical CMB analysis, FVs as Fréchet vectors provide a dimensionally reduced, rotation-invariant probe of multipole structure, serving as sensitive diagnostics for isotropy and higher-order correlations (Rodrigues et al., 2024).

3. Empirical Properties and Application Domains

Domain       | Purpose of FV             | Extraction Method
LLMs/LMMs    | Steer in-context learning | Causal mediation, AIE, LRP
Vision       | Global image descriptor   | Log-likelihood gradients (GMM)
CMB/Geometry | Center-of-mass direction  | Min. Fréchet variance

LLMs:

Computer Vision:

  • Fisher Vectors remain highly competitive as global descriptors for image instance retrieval, particularly for tasks requiring geometric invariance (rotation/scale). Performance is affected by choice of sampling strategy (sparse/dense, single/multi-scale) and normalization (Chandrasekhar et al., 2015).

Cosmology:

  • FVs (Fréchet vectors) are effective at blind anomaly localization (e.g., CMB Cold Spot) and serve as more sensitive probes than multipole vectors for departures from Gaussianity and isotropy, detecting deviations in Planck data unless noise/foreground modeling is extremely precise (Rodrigues et al., 2024).

4. Invariance, Transferability, and Limitations

  • In LLMs, FVs are not invariant across formats: for the same underlying task, FVs extracted from open-ended versus multiple-choice prompts are nearly orthogonal, indicating encoding of both task and format. This limits their generalization out-of-distribution. By contrast, concept vectors (CVs), selected via representational similarity analysis for format invariance, generalize better but are less causally potent in-distribution (Opiełka et al., 25 Feb 2026, Opiełka et al., 5 Mar 2025).
  • FVs in machine translation tasks show partial language-agnosticity: FVs extracted in English→X transfer to unseen target languages and instruction-tuned variants, with strong causal effects on token rankings—yet perform less effectively at the sentence level than on words (Laiyk et al., 21 Apr 2026).
  • Steering by FVs can operate even when no "answer direction" is decodable in the unembedding; this dissociation highlights that FVs are computational instructions, not simple output pointers (Nadaf, 3 Apr 2026).
  • In computer vision, variants of FVs differ in their invariance properties and computational trade-offs; careful normalization and pooling augment geometric robustness (Chandrasekhar et al., 2015).

5. Algorithmic Procedures: Pseudocode and Implementation

LLMs (summarized algorithm):

  1. For each candidate attention head, compute the task-conditioned mean activation over demonstrations.
  2. For each head, measure AIE via causal patching on corrupted prompts.
  3. Select the top $K$ heads by AIE.
  4. Sum their means to form $v_t$ (the FV).
  5. At inference, inject $v_t$ into the chosen layer's residual stream.
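Step 2 is the least obvious part of the procedure, so a toy sketch may help. It assumes a deliberately simplified stand-in for a transformer in which the logits are a linear readout of the summed head outputs. The indirect effect of a head is taken as the change in the correct token's probability when that head's activation on a corrupted prompt is replaced by its clean task mean. The toy model, the function names, and the data are all hypothetical; a real pipeline would patch activations inside the model's forward pass.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def prob_correct(head_acts, W_out, y):
    """Toy model: logits are a linear readout of the summed head outputs."""
    return softmax(W_out @ head_acts.sum(axis=0))[y]

def average_indirect_effect(corrupt_acts, mean_acts, W_out, targets):
    """Per-head AIE via activation patching on corrupted prompts.

    corrupt_acts: (P, H, d) head activations on corrupted prompts
    mean_acts:    (H, d) task-conditioned mean activations from clean prompts
    targets:      (P,) index of the correct token for each prompt
    Returns an array of shape (H,).
    """
    P, H, _ = corrupt_acts.shape
    aie = np.zeros(H)
    for p in range(P):
        base = prob_correct(corrupt_acts[p], W_out, targets[p])
        for h in range(H):
            patched = corrupt_acts[p].copy()
            patched[h] = mean_acts[h]          # patch in the clean task mean
            aie[h] += prob_correct(patched, W_out, targets[p]) - base
    return aie / P

# Toy setup: head 1 carries the task signal toward token y on clean prompts.
rng = np.random.default_rng(0)
P, H, d, y = 16, 4, 6, 2
W_out = np.eye(d)                               # vocabulary size equals d here
clean = 0.1 * rng.normal(size=(P, H, d))
clean[:, 1, y] += 3.0
corrupt = 0.1 * rng.normal(size=(P, H, d))

mean_acts = clean.mean(axis=0)                  # step 1
targets = np.full(P, y)
aie = average_indirect_effect(corrupt, mean_acts, W_out, targets)  # step 2
top_k = np.argsort(aie)[::-1][:2]               # step 3 with K = 2
```

In this setup only head 1 should receive a large AIE, since patching any other head swaps one small noise vector for another.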

Vision (Fisher Vectors):

  1. Extract local descriptors (e.g., SIFT, post-PCA).
  2. Fit a GMM to the descriptors.
  3. Compute mean/variance gradients for each mixture; concatenate.
  4. Apply power and $L_2$ normalizations to the resulting global vector.
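Steps 3 and 4 can be sketched with NumPy as follows. The sketch assumes the diagonal-covariance GMM has already been fitted (step 2) and that its weights, means, and standard deviations are passed in. The gradient normalization follows the form commonly used in the Fisher Vector literature, which may differ from the cited paper, and the power normalization uses a signed square root. The function name and the toy parameters are hypothetical.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """Encode descriptors X as a Fisher Vector under a diagonal GMM.

    X:     (N, D) local descriptors
    w:     (K,)   mixture weights
    mu:    (K, D) component means
    sigma: (K, D) component standard deviations
    Returns a unit-norm vector of length 2 * K * D.
    """
    N = X.shape[0]
    diff = (X[:, None, :] - mu[None]) / sigma[None]            # (N, K, D)

    # Log of w_k * N(x_i | mu_k, sigma_k^2), up to a constant that
    # cancels when the posteriors are normalized.
    log_p = (-0.5 * (diff ** 2).sum(axis=-1)
             - np.log(sigma).sum(axis=-1)[None]
             + np.log(w)[None])                                # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                  # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                  # posteriors

    # Mean- and variance-gradient components for each Gaussian.
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (N * np.sqrt(w)[:, None])
    g_sigma = ((gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0)
               / (N * np.sqrt(2 * w)[:, None]))

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                     # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                       # L2 normalization

# Toy example: 50 descriptors of dimension 4, GMM with 3 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
w = np.array([0.3, 0.3, 0.4])
mu = rng.normal(size=(3, 4))
sigma = np.ones((3, 4))
fv = fisher_vector(X, w, mu, sigma)
```

The output dimension grows as $2KD$, which is why large vocabularies of Gaussians quickly produce very long descriptors.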

Cosmology (Fréchet vectors):

  1. Given the harmonic coefficients $a_{\ell m}$, compute multipole vectors for each $\ell$.
  2. For each $\ell$, minimize the Fréchet variance $\Psi_\ell(u)$ to find the Fréchet vector $u_\ell$, using grid search and local descent on $S^2$.
  3. Use FVs for further analysis (e.g., uniformity tests, anomaly localization).
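The minimization in step 2 can be sketched as a coarse grid search followed by a local descent on the sphere. This is an illustration under stated assumptions, not the cited authors' code: the input is an arbitrary set of unit vectors (computing multipole vectors is not shown), the distance is the plain great-circle distance with no special handling of the sign ambiguity of multipole vectors, the grid is a Fibonacci lattice, and the local step is a damped Riemannian gradient step. All function names and parameters are hypothetical.

```python
import numpy as np

def frechet_variance(u, V):
    """Mean squared great-circle distance from unit vector u to the rows of V."""
    cos = np.clip(V @ u, -1.0, 1.0)
    return np.mean(np.arccos(cos) ** 2)

def fibonacci_sphere(n):
    """Roughly uniform grid of n unit vectors on the sphere."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (1.0 + 5.0 ** 0.5) * i
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def frechet_vector(V, n_grid=2000, n_steps=200, step=0.5):
    """Grid search for a starting point, then local descent on the sphere."""
    grid = fibonacci_sphere(n_grid)
    costs = np.array([frechet_variance(g, V) for g in grid])
    u = grid[np.argmin(costs)]

    for _ in range(n_steps):
        cos = np.clip(V @ u, -1.0, 1.0)
        theta = np.arccos(cos)
        # Log map: tangent vectors at u pointing toward each v_i,
        # with length equal to the geodesic distance.
        tangent = V - cos[:, None] * u
        norms = np.linalg.norm(tangent, axis=1)
        ok = norms > 1e-12
        logs = np.zeros_like(V)
        logs[ok] = tangent[ok] * (theta[ok] / norms[ok])[:, None]
        g = logs.mean(axis=0)                  # descent direction
        gn = np.linalg.norm(g)
        if gn < 1e-14:
            break
        # Exponential map: move along the geodesic in direction g.
        t = step * gn
        u = np.cos(t) * u + np.sin(t) * g / gn
        u /= np.linalg.norm(u)
    return u

# Toy check: four vectors placed symmetrically around the north pole.
colat = 0.3
az = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
V = np.stack([np.sin(colat) * np.cos(az),
              np.sin(colat) * np.sin(az),
              np.full(4, np.cos(colat))], axis=1)
u_star = frechet_vector(V)
```

By symmetry, the minimizer for this toy input is the north pole, which gives a quick sanity check of the descent. The grid search matters when the input vectors are widely spread, because the Fréchet variance can then have several local minima.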

6. Notable Applications and Impact

  • LLMs: FV injection enables explicit, swap-in control over downstream reasoning and zero-shot task transfer, with interpretability for relational reasoning and analogies (Todd et al., 2023, Kang et al., 13 Jan 2026, Opiełka et al., 5 Mar 2025).
  • Vision: FVs (Fisher Vectors) remain foundational for scalable content-based image retrieval and hybrid fusion with deep features (Chandrasekhar et al., 2015).
  • Cosmic Microwave Background: FVs facilitate blind detection and localization of non-Gaussian features—improving sensitivity over classical multipole statistics and enhancing robustness in cosmological hypothesis testing (Rodrigues et al., 2024).

7. Limitations, Caveats, and Future Directions

  • LLMs: FV extraction requires extensive ablation runs or gradient-based relevance computations (e.g., LRP). Current FV approaches may not capture high-level behaviors or broad stylistic attributes; they are best suited for well-localized, task-specific interventions (Pham et al., 3 Jun 2026, Brumley et al., 2024).
  • Vision: FV dimensionality can be prohibitive for large-scale databases, but product quantization and early fusion with CNN features provide practical mitigation (Chandrasekhar et al., 2015).
  • Cosmology: FV analyses are limited by anisotropic noise and foreground modeling; residual systematic uncertainties at high $\ell$ remain challenging (Rodrigues et al., 2024).
  • Methodological: The in-distribution effectiveness and out-of-distribution limitations of FVs call for further development of format-invariant and transferable representations (e.g., hybrid FV/CV approaches), as well as more principled head selection and distributed steering protocols (Pham et al., 3 Jun 2026, Opiełka et al., 25 Feb 2026).
  • Theoretical: In deep transformers, the layered updating and concatenation of function vectors enable adaptive inference not possible in shallow models, expanding the class of learnable in-context algorithms (Raj et al., 15 Jun 2026).

FVs thus provide a powerful, interpretable, and modular mechanism for representing and manipulating structured information across major paradigms in statistical learning, computer vision, and cosmological data analysis.
