Analysis of the Local Linearity in LLMs
The paper "LLMs are Locally Linear Mappings" by James R. Golden provides a rigorous examination of the computational structure of LLMs, asserting that these models can be interpreted as locally linear mappings over specified input sequences. The paper is grounded in the premise that, despite the inherent global nonlinearity of transformer architectures, their inference processes can be approximated effectively by linear systems for specific inputs without altering model weights or output predictions.
Methodological Approach
The author extends methodologies from image denoising and diffusion models, which have been shown to be locally linear, to LLMs. By modifying the gradient computation with respect to the input sequence for next-token prediction, the paper reproduces the forward prediction almost exactly with an equivalent linear system. Significant contributions of this work include:
- Jacobian Transformation: The paper computes a "detached Jacobian" to derive a linear equivalent of the model's operations. This entails detaching the gradients of the input-dependent nonlinear components, such as the SwiGLU activation and the normalization layers, during inference, so that the remaining computation is linear in the input (a minimal sketch follows this list).
- Singular Value Decomposition (SVD): By applying SVD to the detached Jacobian, the paper identifies low-dimensional subspaces in which the largest singular vectors correspond to concepts related to the most likely output token.
- Model Families and Sizes: The linear equivalence is verified across several transformer families, from Llama 3 to Mistral Ministral, at sizes up to 70 billion parameters.
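To make the detached-Jacobian idea concrete, the following is a minimal sketch on a toy RMSNorm + SwiGLU block rather than a full pretrained transformer; the weights, dimensions, and helper names are invented for the example and are not the paper's code. Detaching the input-dependent nonlinear factors (the 1/RMS term and the SiLU gate) leaves a map that is linear in the input, so the Jacobian evaluated at that input reconstructs the ordinary forward output, and its SVD exposes the low-rank structure the paper inspects.

```python
# Minimal sketch of the "detached Jacobian" idea on a toy RMSNorm + SwiGLU block.
# All names and sizes are illustrative, not taken from the paper's implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, h = 16, 32  # illustrative embedding and hidden sizes

# Frozen random weights standing in for a pretrained block (no bias terms).
W_gate = torch.randn(h, d) / d**0.5
W_up   = torch.randn(h, d) / d**0.5
W_down = torch.randn(d, h) / h**0.5
g_norm = torch.ones(d)  # RMSNorm scale


def block(x, detach_nonlinear):
    """One RMSNorm + SwiGLU MLP. If detach_nonlinear is True, the
    input-dependent nonlinear factors (the 1/RMS term and the SiLU gate)
    are detached from the graph, leaving a map that is linear in x."""
    rms = x.pow(2).mean(-1, keepdim=True).add(1e-6).sqrt()
    if detach_nonlinear:
        rms = rms.detach()
    xn = g_norm * x / rms                      # RMSNorm, linear in x once rms is frozen
    gate = F.silu(xn @ W_gate.T)               # nonlinear SwiGLU gate
    if detach_nonlinear:
        gate = gate.detach()
    return (gate * (xn @ W_up.T)) @ W_down.T   # linear in x once gate is frozen


x = torch.randn(d)

# Jacobian of the detached function, evaluated at this specific input.
J = torch.autograd.functional.jacobian(lambda v: block(v, detach_nonlinear=True), x)

# Every remaining operation is linear and homogeneous in x, so J @ x reproduces
# the ordinary (fully nonlinear) forward pass at this input.
y_full = block(x, detach_nonlinear=False)
print(torch.allclose(J @ x, y_full, atol=1e-5))  # True

# SVD of the detached Jacobian; in the full model the top singular vectors can be
# decoded through the unembedding matrix to interpretable token directions.
U, S, Vh = torch.linalg.svd(J)
print(S[:5])
```

The same recipe applies layer by layer in a full transformer, where the attention softmax is treated analogously to the gate above; the toy block only demonstrates the reconstruction property the paper relies on.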
Implications and Insights
The exploration of local linearity in transformers offers several theoretical and practical insights:
- Interpretability: Recasting the model's computation for a given input as a linear system makes the semantic representations inside LLMs more interpretable. The approach can attribute the final output prediction to the contributions of individual tokens and layers.
- Efficiency: The methodology analyzes pretrained LLMs as they are, without retraining or modifying weights; the linear equivalent is obtained from gradient computations at inference time, which makes the framework practical to apply to off-the-shelf models.
- Model Steerability: The detached Jacobian can also serve as a steering operator, enabling controlled manipulation of the model's output by adjusting intermediate-layer activations, with potential applications in bias detection and output refinement (see the sketch after this list).
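As a hypothetical illustration of steering, the sketch below adds a direction to an intermediate layer's activations at inference time, assuming a Llama-style Hugging Face module layout (model.model.layers[...]). The checkpoint name, layer index, scaling factor alpha, and the placeholder steer_dir (which stands in for a top singular vector of the detached Jacobian at that layer) are all assumptions for the example, not the paper's implementation.

```python
# Hypothetical steering sketch: inject a direction into an intermediate layer
# of a Llama-style causal LM via a forward hook. Checkpoint, layer index,
# alpha, and steer_dir are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.2-1B"  # illustrative checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
model.eval()

layer_idx, alpha = 8, 4.0                # illustrative layer and steering strength
steer_dir = torch.zeros(model.config.hidden_size)
steer_dir[0] = 1.0                       # placeholder for a detached-Jacobian singular vector


def add_direction(module, inputs, output):
    # Decoder layers typically return a tuple whose first element is the
    # hidden states; handle a bare tensor as well, just in case.
    if isinstance(output, tuple):
        return (output[0] + alpha * steer_dir.to(output[0].dtype),) + output[1:]
    return output + alpha * steer_dir.to(output.dtype)


handle = model.model.layers[layer_idx].register_forward_hook(add_direction)
try:
    ids = tok("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    next_id = out.logits[0, -1].argmax().item()
    print(tok.decode([next_id]))
finally:
    handle.remove()
```

In the paper's setting the steering direction would come from the SVD of the detached Jacobian at that layer; the hook above is simply one common way to inject such a direction without touching the weights.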
Future Directions
The findings open several avenues in AI model interpretability and optimization. Future research may investigate:
- General Applicability: Extending this linearity framework to other forms of neural networks or even hybrid models that combine different architectural paradigms.
- Dynamic Input Sequences: Investigating how sequence length variations might affect the local linearity and whether similar methods can handle dynamic input scenarios.
- Robustness Testing: Evaluating the robustness of these linear approximations under adversarial input perturbations or synthetic data interventions, to assess whether the linear decomposition and the resulting predictions remain stable.
In conclusion, this paper advances the understanding of transformer operations by viewing LLMs through a locally linear lens, offering new insight into their semantic structure. The ability to decompose and interpret LLMs through a near-exact linear equivalent is compelling both for advancing AI capabilities and for applying these models across diverse tasks, with particular emphasis on model introspection and steerability.