Line of Sight: Linear VLLMs
- The paper presents a framework showing that high-level semantic and visual concepts are encoded as one-dimensional subspaces in VLLM activation space.
- Methodologies such as linear probing and causal steering validate the extraction and manipulation of meaningful concept directions.
- Empirical evidence confirms that these linear representations aid in model interpretability, control, and effective cross-modal fusion.
The "line of sight" in the context of linear representations in vision-LLMs (VLLMs) refers to the principled paper and exploitation of structural axes in high-dimensional model representations where both visual and linguistic concepts are encoded as directions in neural activation space. This article examines the theoretical foundations, empirical methodologies, and practical implications of linear representations in VLLMs, synthesizing insights across computer vision, group-theoretic modeling, activation engineering, and contemporary multimodal architectures.
1. Theoretical Foundations of Linear Representations
Linear representations rest on a geometric hypothesis: interpretable, high-level concepts, whether semantic or visual, correspond to one-dimensional subspaces (directions) within a model's representation space. In computer vision, early work established that, under restrictive modeling assumptions, features maximally invariant to nuisance transformations (e.g., scaling, contrast, occlusion) can be characterized analytically as approximations of minimal sufficient statistics, linking directly to descriptors such as SIFT and HOG (Soatto et al., 2014). When modeling transformations (e.g., SE(3) for 3D rigid-body motions), representation learning grounded in group theory ensures that representations transform consistently and linearly under group actions (Cohen et al., 2014). Linearity is guaranteed by decomposing the representation space into sums of irreducible representations, whose statistical properties (block-diagonal covariance, decorrelation) facilitate robust downstream analysis.
In the context of LLMs, the "linear representation hypothesis" formalizes the observation that differences in embedding or unembedding vectors—such as those between counterfactual pairs ("king" vs. "queen")—align closely in representation space, forming canonical directions associated with semantic concepts (Park et al., 2023, Jiang et al., 6 Mar 2024). The mechanisms underpinning this phenomenon include the softmax cross-entropy objective's intrinsic forces, which favor log-odds matching along consistent directions, and the implicit bias of gradient descent toward max-margin solutions, resulting in strong parallelism among concept-defining vectors.
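As a minimal illustration of this hypothesis, a concept direction can be estimated directly from counterfactual pairs by averaging embedding differences. The sketch below assumes a hypothetical `embed` function mapping a token to its (un)embedding vector; it is not tied to any particular model API.

```python
import numpy as np

def concept_direction(embed, pairs):
    """Estimate a linear concept direction as the average difference of
    embeddings over counterfactual pairs, e.g., [("king", "queen"), ...]."""
    diffs = [embed(a) - embed(b) for a, b in pairs]
    direction = np.mean(diffs, axis=0)
    return direction / np.linalg.norm(direction)  # unit-norm concept direction

# Hypothetical usage with pairs that differ only in the target concept:
# gender_dir = concept_direction(embed, [("king", "queen"), ("actor", "actress")])
```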
2. Methodologies for Probing and Constructing Linear Features
Probing for linear representations in VLLMs typically employs two main families of techniques:
- Linear Probing: A linear classifier is trained on hidden activations extracted from fixed layers—often the residual stream of a transformer—using labels from a downstream task (e.g., ImageNet classification), with input features computed as layer-averaged activations. This establishes decodability: concepts are linearly separable via a single matrix multiplication (Rajaram et al., 5 Jun 2025).
- Steering and Intervention: Causal analysis goes beyond detection by investigating whether adding a computed concept direction (steering vector) to the activations alters the model's output distribution as predicted. Contrastive Activation Addition (CAA) implements this by constructing a steering vector as the difference between the average activations of a target class and a source class and adding it to the image token positions, producing measurable shifts in generated captions and outputs (a minimal sketch of both techniques follows this list).
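A minimal sketch of both families, assuming activations have already been extracted into NumPy arrays (`acts`, `labels`, `target_acts`, and `source_acts` are placeholders for this illustration, not an established API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Linear probing: test whether a concept is linearly decodable from
# layer-averaged hidden activations of shape (n_samples, d_model).
def fit_linear_probe(acts, labels):
    probe = LogisticRegression(max_iter=1000)
    probe.fit(acts, labels)
    return probe  # probe.coef_ approximates the concept direction

# CAA-style steering vector: difference of mean activations between a
# target class and a source class, later added to chosen token positions.
def caa_steering_vector(target_acts, source_acts):
    return target_acts.mean(axis=0) - source_acts.mean(axis=0)
```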
Extraction of concept directions leverages both counterfactual token pairs and more general activation differentials derived from context-rich or ambiguous prompts. The Sum of Activation-based Normalized Difference (SAND) method generalizes this process: activation differences, whitened and normalized, are aggregated using maximum likelihood estimation under a von Mises–Fisher (vMF) distribution, yielding robust unit-norm steering directions even in complex, context-dependent scenarios (Nguyen et al., 22 Feb 2025). Canonical mappings between the activation space and a whitened space allow a flexible choice of metric, facilitating both raw Euclidean and covariance-driven (causal inner product) analyses.
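The following is a sketch of a SAND-style aggregation under these definitions; the whitening matrix `whiten` (for example, an inverse Cholesky factor of the activation covariance) is an assumption of this illustration, and the identity matrix recovers a plain Euclidean treatment.

```python
import numpy as np

def sand_style_direction(pos_acts, neg_acts, whiten):
    """Aggregate paired activation differences into a unit-norm steering
    direction: whiten, normalize each difference, then take the von
    Mises-Fisher maximum-likelihood mean direction (the renormalized sum)."""
    diffs = pos_acts - neg_acts                        # (n, d) paired differences
    z = diffs @ whiten.T                               # map into the whitened space
    units = z / np.linalg.norm(z, axis=1, keepdims=True)
    mean_dir = units.sum(axis=0)
    return mean_dir / np.linalg.norm(mean_dir)
```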
Sparse Autoencoders (SAEs) extend these methods, training unsupervised models to reconstruct activations with an L₁ penalty, generating a dictionary of highly interpretable, monosemantic features—each corresponding to a distinctly decodable concept direction. SAEs provide a scalable approach to discovering diverse linear features in multimodal data streams (Rajaram et al., 5 Jun 2025).
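A minimal PyTorch sketch of such an autoencoder (the dictionary size `d_dict` and the L1 coefficient are illustrative hyperparameters, not values from the cited work):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary over residual-stream activations, trained to
    reconstruct them under an L1 sparsity penalty on the feature codes."""
    def __init__(self, d_model, d_dict):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        codes = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        return self.decoder(codes), codes

def sae_loss(x, x_hat, codes, l1_coeff=1e-3):
    # reconstruction error plus sparsity penalty encouraging monosemantic features
    return ((x - x_hat) ** 2).mean() + l1_coeff * codes.abs().mean()
```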
3. Structural Properties and Cross-Modality Effects
A central finding is that both text and image concepts in VLLMs are encoded in highly structured, often near-orthogonal directions within hidden activation space. Early layers tend to maintain modality-specific subspaces, while deeper layers exhibit increasing sharing, as revealed by SAE ablation studies and cross-modality feature analysis (Rajaram et al., 5 Jun 2025). This progression suggests a hierarchical encoding where raw visual features are converted and fused with linguistic representations as computation proceeds, impacting behaviors such as captioning and multimodal reasoning.
The use of group theory provides a compelling explanation for structural properties: representations constructed as sums of irreducible group representations will be block-diagonal in their covariance, facilitating decorrelation and independent manipulation along different semantic axes (Cohen et al., 2014). The mathematical construction of these block representations—for example, via Wigner D-functions for SO(3) rotations—enables computational efficiency and theoretical guarantees of linear equivariance.
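A small numerical illustration of this property, using SO(2) rather than SO(3) for brevity: features built as a direct sum of Fourier irreps of a uniformly random rotation angle have a covariance whose cross-irrep blocks vanish up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
thetas = rng.uniform(0.0, 2.0 * np.pi, size=100_000)

# Direct sum of SO(2) irreps: the k-th irrep contributes (cos k*theta, sin k*theta).
K = 4
feats = np.stack(
    [f(k * thetas) for k in range(1, K + 1) for f in (np.cos, np.sin)], axis=1
)

cov = np.cov(feats, rowvar=False)
print(np.round(cov, 3))   # near block-diagonal: different irreps are decorrelated
```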
Moreover, empirical studies confirm that, given diverse linguistic or visual concepts, steering vectors not only align closely across instances of the same concept but also exhibit near-orthogonality for unrelated concepts, supporting interpretability and facilitating safe intervention (Jiang et al., 6 Mar 2024, Park et al., 2023). The form of the inner product employed, typically the causal inner product $\langle x, y \rangle_{\mathrm{C}} = x^{\top}\,\mathrm{Cov}(\gamma)^{-1}\, y$ (with $\gamma$ the unembedding vectors), is critical, as only such a metric ensures that representations of causally separable concepts remain orthogonal.
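A sketch of this metric, assuming access to the unembedding matrix as a NumPy array with one row per vocabulary item (a simplifying assumption of this illustration):

```python
import numpy as np

def causal_inner_product(x, y, unembedding):
    """Inner product induced by the inverse covariance of the unembedding
    vectors; under it, representations of causally separable concepts are
    (approximately) orthogonal."""
    cov = np.cov(unembedding, rowvar=False)        # (d, d) covariance of unembedding rows
    return float(x @ np.linalg.solve(cov, y))

def causal_cosine(x, y, unembedding):
    cov = np.cov(unembedding, rowvar=False)
    norm = lambda v: np.sqrt(v @ np.linalg.solve(cov, v))
    return float(x @ np.linalg.solve(cov, y)) / (norm(x) * norm(y))
```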
4. Analytical Implications for Occlusion and Invariance
Handling occlusion and "line-of-sight" effects remains foundational for robust linear vision representations. Analytical frameworks based on sampled orbit anti-aliased likelihoods, domain-size pooling, and marginalization over transformation groups yield features that are maximally invariant to nuisance variables, including scaling and partial observability (Soatto et al., 2014). These strategies inform preprocessing and feature selection for VLLMs tasked with bridging visual and linguistic modalities in scenarios where image data may be incomplete or ambiguous, as in visual question answering or captioning with occluded input.
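As a rough sketch of domain-size pooling (the descriptor function `describe` is a hypothetical stand-in assumed to return a fixed-length vector for any patch size, e.g., a histogram-based descriptor):

```python
import numpy as np

def domain_size_pooled_descriptor(describe, image, center, half_sizes):
    """Marginalize over the nuisance of patch scale by averaging descriptors
    computed on patches of several domain sizes around the same point."""
    x, y = center
    descs = []
    for s in half_sizes:
        patch = image[y - s:y + s, x - s:x + s]
        descs.append(describe(patch))
    pooled = np.mean(descs, axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-8)
```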
Marginalizing over domain size and group transformations is not unique to vision; the latent variable modeling approach abstracts concept encoding into conditional distributions over binary latent concepts, showing that log-odds matching in softmax next-token prediction—and the resulting optimization landscapes—inevitably favor linear, decodable structure under both full and partial observability (Jiang et al., 6 Mar 2024).
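The linearity forced by the softmax readout can be checked with a toy example: for any hidden state, the log-odds between two tokens equals the projection of that state onto the difference of their unembedding vectors (the matrices below are random placeholders, not from any real model).

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, vocab = 64, 1000
U = rng.normal(size=(vocab, d_model))   # toy unembedding matrix
h = rng.normal(size=d_model)            # toy hidden state

logits = U @ h
log_probs = logits - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))

a, b = 3, 7
# log p(a|h) - log p(b|h) == h . (u_a - u_b): the decodable direction for "a vs. b"
assert np.isclose(log_probs[a] - log_probs[b], h @ (U[a] - U[b]))
```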
5. Practical Applications, Activation Engineering, and Model Control
Linear representations—especially those accessible via steering vectors and decodable via linear probes—enable a variety of activation engineering practices:
- Interpretability: Probing along concept directions offers introspection into how VLLMs encode and represent semantic and visual concepts. This facilitates debugging and checking for undesired correlations or biases.
- Model Control and Intervention: Adding steering vectors during inference can reliably shift output probabilities toward or away from target concepts, forming the foundation for controlled behavior edits and bias mitigation (a minimal hook-based sketch follows this list).
- Monitoring and Alignment: Experiments confirm that methods such as SAND outperform previous heuristics (mean-difference, PCA-based LAT) for monitoring and manipulating conceptual axes, with direct implications for safety, fairness, and regulatory compliance (Nguyen et al., 22 Feb 2025).
- Cross-Modal Fusion and Fine-tuning: The evolving relationship between modality-specific and shared representations across layers directs fine-tuning strategies and informs architecture choices, especially as multimodal models expand to new tasks.
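A minimal hook-based sketch of such an intervention in PyTorch; the module path, layer index, and scaling coefficient below are placeholders that depend on the specific VLLM architecture.

```python
import torch

def add_steering_hook(module, direction, alpha=4.0, token_slice=slice(None)):
    """Register a forward hook that adds a scaled, unit-norm concept direction
    to the module's output hidden states, nudging generations toward
    (alpha > 0) or away from (alpha < 0) the associated concept."""
    direction = direction / direction.norm()

    def hook(_mod, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[..., token_slice, :] += alpha * direction.to(hidden)
        return output

    return module.register_forward_hook(hook)

# Hypothetical usage (the module path depends on the model implementation):
# handle = add_steering_hook(model.language_model.layers[20], steering_vector)
# ... run generation ...
# handle.remove()
```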
SAEs and other dictionary-based approaches further augment tools for model transparency by supplying explicit mappings between interpretable features and model activations. However, limits exist; while steering can reliably alter outputs along specific axes, the utility of linear features in adversarial defenses appears constrained (Rajaram et al., 5 Jun 2025).
6. Logical Structure, Reasoning Flows, and the Geometry of Multimodal Representations
Recent research re-frames reasoning within LLMs as "flows"—smooth trajectories in representation space—driven by logical controllers (deduction rules or proposition structures) (Zhou et al., 10 Oct 2025). Embedding trajectories can be decomposed into position, velocity, and curvature, allowing for formal analysis of how logical structure, disentangled from surface semantics, manifests within high-dimensional models. The consequences are twofold: first, logical controllers predict the evolution of representation flow regardless of topic or language; second, multimodal reasoning in VLLMs can be similarly modeled through differential geometric analysis of image–text encoding fusion.
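For instance, a discrete version of this analysis over a sequence of hidden states (one vector per reasoning step or per layer, an assumed input format for this sketch) might compute per-step speed and a turning-angle proxy for curvature:

```python
import numpy as np

def trajectory_geometry(hidden_states):
    """hidden_states: (num_steps, d) array of representations along a 'flow'.
    Returns per-step speed and a curvature proxy (angle between successive
    unit velocity vectors)."""
    velocity = np.diff(hidden_states, axis=0)                     # (T-1, d)
    speed = np.linalg.norm(velocity, axis=1)
    unit_v = velocity / (speed[:, None] + 1e-8)
    cos_turn = np.clip((unit_v[:-1] * unit_v[1:]).sum(axis=1), -1.0, 1.0)
    curvature = np.arccos(cos_turn)                               # (T-2,)
    return speed, curvature
```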
This approach opens new avenues for formal interpretability, expanding the range of geometric quantities used to analyze VLLM internals and suggesting new mechanisms for multimodal regularization, consistency, and control.
7. Challenges and Future Directions
Despite advances, several controversies and open questions persist. The universality of the linear representation hypothesis—especially in context-dependent or ambiguous settings—is under active investigation. While SAND and related MLE-based techniques offer broader flexibility, limits to linear interpretability in adversarial or highly nonlinear contexts remain (Nguyen et al., 22 Feb 2025, Rajaram et al., 5 Jun 2025). Future research is directed toward understanding the conversion of raw modality information into shared representations, developing architectures that parameterize flows in latent space rather than static embeddings, and formalizing the geometric mechanisms underlying model fusion and reasoning.
A plausible implication is that new regularization and diagnostic tools—grounded in differential geometry and group-invariant statistics—may further enhance the fidelity, interpretability, and safety of next-generation VLLMs, especially in scenarios requiring robust handling of occlusion, ambiguous input, or compositional reasoning.
This synthesis draws upon results from (Soatto et al., 2014, Cohen et al., 2014, Park et al., 2023, Jiang et al., 6 Mar 2024, Nguyen et al., 22 Feb 2025, Rajaram et al., 5 Jun 2025), and (Zhou et al., 10 Oct 2025), which collectively establish the technical and practical landscape for line-of-sight analysis in linear representations for vision–LLMs.