Linear Centroids Hypothesis (LCH)

Updated 3 July 2026

The Linear Centroids Hypothesis (LCH) is a framework positing that centroids, summary vectors derived from Jacobian and geometric methods, define feature organization and linear separability in complex systems.
Spectral diagnostics with gradient-based SED reveal that low-rank subspaces, where centroids concentrate, critically determine the separability and interpretability of neural representations.
The methodology enables efficient centroid extraction, sparse feature dictionary learning, and saliency mapping, linking deep learning interpretability with discrete geometric insights.

The Linear Centroids Hypothesis (LCH) asserts that the geometric and algebraic structure of centroids (specific summary vectors derived from neural, geometric, or combinatorial systems) fundamentally governs feature organization, linear separability, and, in certain contexts, the dynamic behavior of iterative processes. LCH has emerged as a unifying concept in deep learning interpretability, the theory of neural manifolds, and discrete geometry, providing rigorous tools for characterizing both statistical and mechanistic aspects of representation and function.

1. Mathematical Formulation and Definitions

In deep networks, the centroid associated with an input $x$ (for a differentiable mapping $f: \mathbb{R}^d \to \mathbb{R}^m$) is given by the Jacobian-based vector

$c(x) = J_f(x)^{\top}\mathbf{1}_m = \sum_{i=1}^m \frac{\partial f_i(x)}{\partial x} \in \mathbb{R}^d,$

where $J_f(x) \in \mathbb{R}^{m\times d}$ is the input-output Jacobian and $\mathbf{1}_m$ is the all-ones vector. In networks with piecewise-affine structure, this captures the local linear behavior; in geometric or combinatorial settings, centroids reduce to area- or mean-based vectors (e.g., the centroid of a polygon).
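The definition can be checked numerically on a small piecewise-affine network. The sketch below is illustrative only: the two-layer ReLU model, its sizes, and the helper names `relu_mlp`, `centroid`, and `centroid_fd` are assumptions made for this example, not part of the cited work. It computes $c(x) = J_f(x)^{\top}\mathbf{1}_m$ in closed form and compares it with finite differences of $\sum_i f_i(x)$.

```python
import numpy as np

def relu_mlp(x, W1, b1, W2, b2):
    """Toy piecewise-affine network f: R^d -> R^m (hypothetical example model)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def centroid(x, W1, b1, W2, b2):
    """c(x) = J_f(x)^T 1_m, in closed form for the toy network.

    Inside one linear region J_f(x) = W2 @ diag(mask) @ W1, hence
    J_f(x)^T 1_m = W1^T (mask * (W2^T 1_m)). The bias b2 does not enter.
    """
    mask = (W1 @ x + b1 > 0).astype(float)   # active ReLU units at x
    ones_m = np.ones(W2.shape[0])
    return W1.T @ (mask * (W2.T @ ones_m))

def centroid_fd(x, params, eps=1e-6):
    """Central finite differences of sum_i f_i(x); used only as a check."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        f_plus = relu_mlp(x + e, *params).sum()
        f_minus = relu_mlp(x - e, *params).sum()
        grad[k] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
d, h, m = 5, 16, 3   # input, hidden, output sizes (arbitrary)
params = (rng.normal(size=(h, d)), rng.normal(size=h),
          rng.normal(size=(m, h)), rng.normal(size=m))
x = rng.normal(size=d)

c = centroid(x, *params)
c_check = centroid_fd(x, params)
```

Because the network is affine within each activation region, the two computations should agree up to floating-point error whenever $x$ is not on a ReLU boundary.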

The Linear Centroids Hypothesis posits that deep network “features” correspond not to arbitrary latent directions but to linear directions of centroids, i.e., low-rank subspaces within the set $\{c(x)\}$ for inputs $x$ sharing a semantic property. Formally, for each hypothesized feature $j$, there exists a direction $w_j$ along which the centroids $\{c_i\}_{i\in S_j}$ of the inputs carrying that feature concentrate.

Different features correspond to (approximately) orthogonal directions, $\langle w_j, w_k \rangle \approx 0$ for $j \neq k$ (Walker et al., 13 Apr 2026).
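One simple way to probe this claim on data is to take the leading singular vector of the centroids collected for one feature and compare directions across features. The sketch below uses synthetic centroids; the estimator `feature_direction`, the planted directions, and the noise level are assumptions chosen for illustration, not the procedure of the cited paper.

```python
import numpy as np

def feature_direction(C):
    """Leading right singular vector of a centroid matrix C of shape (n, d),
    plus the fraction of total squared norm captured by that direction."""
    _, s, vt = np.linalg.svd(C, full_matrices=False)
    return vt[0], float(s[0] ** 2 / np.sum(s ** 2))

rng = np.random.default_rng(0)
d, n = 20, 50

# Planted (hypothetical) feature directions: two orthogonal unit vectors.
w_a, w_b = np.eye(d)[0], np.eye(d)[1]

# Synthetic centroids: positive multiples of the planted direction plus small noise.
C_a = np.outer(rng.uniform(1.0, 2.0, n), w_a) + 0.01 * rng.normal(size=(n, d))
C_b = np.outer(rng.uniform(1.0, 2.0, n), w_b) + 0.01 * rng.normal(size=(n, d))

w_hat_a, energy_a = feature_direction(C_a)
w_hat_b, energy_b = feature_direction(C_b)

# |<w_a_hat, w_b_hat>| near 0 indicates approximately orthogonal features.
overlap = abs(float(w_hat_a @ w_hat_b))
```

A captured-energy fraction close to 1 indicates that the centroids of a group are close to rank one, which is the low-rank structure the hypothesis describes.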

2. Theoretical Foundations in Neural Manifolds

Wakhloo et al. (2022) formalize LCH in the context of the linear separability of neural manifolds. A collection of $P$ class manifolds $\{\mathcal{M}^\mu\}_{\mu=1}^{P}$ can each be parametrized as

$x^\mu(s) = u_0^\mu + \sum_{i=1}^{K} s_i\, u_i^\mu,$

with centroid $u_0^\mu$ and $K$ principal axes $\{u_i^\mu\}_{i=1}^{K}$, where $s = (s_1, \dots, s_K)$ are the coordinates of a point within the manifold. The foundational replica-theoretic result expresses the capacity for linear separation in terms of the covariance of centroids and axes across manifolds.

Critically, centroid correlations directly control the effective separability of manifolds, while axis correlations are secondary, simply reducing the effective radius. In the limit of maximally correlated centroids, the capacity collapses irrespective of axis geometry; thus, the geometry and arrangement of centroids dominate the capacity of linear classifiers, sharpening and vindicating the LCH in the context of high-dimensional classification (Wakhloo et al., 2022).
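To make the notion of centroid correlation concrete, the following sketch builds synthetic class manifolds as point clouds around a centroid and measures the average cosine similarity between manifold centroids. The generative model, the parameter `rho`, and both helper functions are hypothetical choices for illustration; the sketch measures the correlations only and does not compute the replica-theoretic capacity.

```python
import numpy as np

def sample_manifolds(P, N, K, M, rho, rng):
    """P point clouds in R^N, each a centroid plus K random axes with M samples.

    rho in [0, 1] sets how strongly centroids share a common component
    (an assumed, simplified correlation model).
    """
    shared = rng.normal(size=N)
    manifolds = []
    for _ in range(P):
        u0 = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * rng.normal(size=N)
        axes = 0.3 * rng.normal(size=(K, N))   # within-class variability
        s = rng.normal(size=(M, K))            # manifold coordinates
        manifolds.append(u0 + s @ axes)
    return manifolds

def mean_centroid_correlation(manifolds):
    """Average off-diagonal cosine similarity between empirical centroids."""
    cents = np.stack([pts.mean(axis=0) for pts in manifolds])
    unit = cents / np.linalg.norm(cents, axis=1, keepdims=True)
    cos = unit @ unit.T
    off_diag = cos[~np.eye(len(manifolds), dtype=bool)]
    return float(off_diag.mean())

rng = np.random.default_rng(0)
corr_high = mean_centroid_correlation(
    sample_manifolds(P=6, N=200, K=5, M=100, rho=0.8, rng=rng))
corr_low = mean_centroid_correlation(
    sample_manifolds(P=6, N=200, K=5, M=100, rho=0.0, rng=rng))
```

In this toy model the measured correlation tracks `rho`, so the statistic can serve as a quick indicator of how far a set of class centroids is from the decorrelated regime.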

3. Spectral Diagnostics and SED–LCH Coupling

Gradient-Direction Sensitivity (Xu, 28 Apr 2026) develops a spectral diagnostic based on Spectral Eigen-Directions (SED): low-rank bases extracted via rolling SVDs on either optimizer updates or raw loss gradients. The strength of coupling between SED directions and centroid features is quantified by a score built from the mean squared response of centroids to perturbations along a given SED direction at a given training time.

On single-task modular arithmetic neural systems, gradient-based SED yields stronger coupling than update-based SED. In multitask systems, only per-task gradient SED recovers strong LCH coupling; aggregated gradients or update SEDs fail, due to optimizer-induced contamination (momentum/adaptive scaling) and competing task gradients.

The findings demonstrate that LCH features concentrate in the low-rank subspace of the instantaneous or per-task gradient, while optimizer trajectories may mask or distribute these directions (Xu, 28 Apr 2026).
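The rolling-SVD step can be sketched as follows, under stated assumptions: gradients are flattened into rows of a matrix, a fixed-length window slides over training steps, and the top right singular vectors of each window are taken as the low-rank basis. The function names, window length, and rank are hypothetical, and the cited work may differ in preprocessing.

```python
import numpy as np

def rolling_sed(grads, window, rank):
    """grads: array (T, D) of flattened gradients, one row per step.

    Returns one (rank, D) orthonormal row basis per full window.
    """
    bases = []
    for end in range(window, grads.shape[0] + 1):
        _, _, vt = np.linalg.svd(grads[end - window:end], full_matrices=False)
        bases.append(vt[:rank])
    return bases

def subspace_overlap(basis_a, basis_b):
    """Mean squared cosine of principal angles between two orthonormal
    row bases of equal rank (1.0 means identical spans)."""
    return float(np.sum((basis_a @ basis_b.T) ** 2) / basis_a.shape[0])

rng = np.random.default_rng(0)
T, D, rank, window = 40, 50, 3, 10

# Synthetic gradients concentrated in a planted rank-3 subspace plus noise.
planted = np.linalg.qr(rng.normal(size=(D, rank)))[0].T   # (rank, D)
grads = rng.normal(size=(T, rank)) @ planted + 0.01 * rng.normal(size=(T, D))

bases = rolling_sed(grads, window, rank)
overlaps = [subspace_overlap(b, planted) for b in bases]
```

Running the same routine on optimizer updates instead of raw gradients only requires changing what is stored in `grads`, which is the comparison the diagnostic relies on.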

4. Causal and Algorithmic Implications

Causal intervention experiments (Xu, 28 Apr 2026) show that constraining attention updates to any rank-3 subspace, whether SED-derived or random, accelerates grokking in transformers. However, removing the SED subspace has negligible effect, indicating that these directions are not uniquely causal; any low-rank constraint suffices for acceleration, and functional redundancy is high under typical AdamW hyperparameters (weight decay and learning rate). Thus, while low-rank SED-LCH alignment is a faithful diagnostic of feature-formation subspaces, it does not imply the necessity of these directions for functional performance.
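Both interventions reduce to orthogonal projections of an update vector. A minimal sketch, assuming the update has been flattened to a vector and the subspace is given by an orthonormal row basis (the helper names and dimensions are hypothetical):

```python
import numpy as np

def random_orthonormal_basis(dim, rank, rng):
    """Random (rank, dim) basis with orthonormal rows, via reduced QR."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, rank)))
    return q.T

def constrain_to_subspace(update, basis):
    """Keep only the component of `update` inside span(basis rows)."""
    return basis.T @ (basis @ update)

def remove_subspace(update, basis):
    """Delete the component of `update` inside span(basis rows)."""
    return update - constrain_to_subspace(update, basis)

rng = np.random.default_rng(0)
dim, rank = 64, 3
basis = random_orthonormal_basis(dim, rank, rng)   # stand-in for an SED or random basis
update = rng.normal(size=dim)                      # stand-in for a flattened update

kept = constrain_to_subspace(update, basis)
removed = remove_subspace(update, basis)
```

The two outputs are complementary: `kept + removed` reconstructs the original update, so the constrained and ablated runs differ only in which of the two parts is applied.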

5. Practical Methodology: Algorithms and Implementation

The LCH framework provides concrete procedures for extracting and leveraging centroid features (Walker et al., 13 Apr 2026):

Centroid Extraction: Efficiently computed as vector-Jacobian products, $J_f(x)^{\top}\mathbf{1}_m$, in modern autodiff frameworks (single reverse-mode pass per input).
Sparse Feature Dictionary Learning: Sparse autoencoders (e.g., TopK lasso) trained on the matrix of centroids yield feature directions $w_j$ with enforced sparsity.
Linear Probes and Circuit Discovery: Downstream tasks (e.g., classification) benefit from the sparsity and mechanistic grounding of centroid-based features. Centroid-based attributions identify functional units (e.g., neurons critical for GPT2-Large completion) with high sensitivity.
Saliency Mapping: Averaged local centroids in $\epsilon$-balls produce faithful input saliency maps that correlate with model sensitivity.
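The saliency step can be illustrated on a toy model. The sketch below averages centroids over points sampled uniformly from an L2 ball around the input; the two-layer ReLU network, the choice of L2 ball, the sample count, and the use of the absolute value as the saliency score are all assumptions for this example and are not taken from the reference implementation.

```python
import numpy as np

def centroid(x, W1, b1, W2):
    """c(x) = J_f(x)^T 1_m for the toy network f(x) = W2 relu(W1 x + b1)."""
    mask = (W1 @ x + b1 > 0).astype(float)
    return W1.T @ (mask * W2.sum(axis=0))   # W2.sum(axis=0) equals W2^T 1_m

def sample_ball(rng, d, eps, n):
    """n points drawn uniformly from the L2 ball of radius eps in R^d."""
    dirs = rng.normal(size=(n, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = eps * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return dirs * radii

def averaged_centroid_saliency(x, params, eps, n_samples, rng):
    """|mean of c(x + delta)| over sampled perturbations (assumed scoring rule)."""
    deltas = sample_ball(rng, x.size, eps, n_samples)
    cents = np.stack([centroid(x + dlt, *params) for dlt in deltas])
    return np.abs(cents.mean(axis=0))

rng = np.random.default_rng(0)
d, h, m = 8, 32, 4
params = (rng.normal(size=(h, d)), rng.normal(size=h), rng.normal(size=(m, h)))
x = rng.normal(size=d)

saliency = averaged_centroid_saliency(x, params, eps=0.1, n_samples=64, rng=rng)
saliency_point = averaged_centroid_saliency(x, params, eps=0.0, n_samples=4, rng=rng)
ball_norms = np.linalg.norm(sample_ball(rng, d, 0.1, 100), axis=1)
```

With `eps=0.0` the average collapses to the single centroid at `x`; larger radii smooth over neighbouring linear regions of the network.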

The reference implementation is provided at https://github.com/ThomasWalker1/LinearCentroidsHypothesis (Walker et al., 13 Apr 2026).

6. Geometric and Combinatorial Manifestations

In discrete geometry, the LCH appears as an exact collinearity property for centroids in iterated midpoint-hexagon sequences (Tisdell, 6 Apr 2026). For any hexagon $H_0$, repeated midpoint mapping generates hexagons $H_n$ whose filled (area) centroids all lie exactly on a fixed line in the plane for $n \geq 1$, moving monotonically toward the original vertex centroid. The explicit algebraic structure, via Fourier modes of the hexagon, characterizes this exact rigidity; such collinearity is unique to hexagons and fails for other polygon sizes. This result underscores the algebraic power and subtlety of the centroid construct beyond neural representations.
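The collinearity property is easy to check numerically. The sketch below iterates the midpoint map on one arbitrarily chosen irregular hexagon (the coordinates are illustrative, not from the cited paper), computes the area centroid of each iterate with the standard shoelace-based formula, and measures how far each centroid deviates from the line through the vertex centroid and the first iterate's centroid.

```python
def midpoint_polygon(pts):
    """Polygon whose vertices are the midpoints of consecutive edges."""
    n = len(pts)
    return [((pts[k][0] + pts[(k + 1) % n][0]) / 2.0,
             (pts[k][1] + pts[(k + 1) % n][1]) / 2.0) for k in range(n)]

def vertex_centroid(pts):
    """Arithmetic mean of the vertices."""
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def area_centroid(pts):
    """Centroid of the filled polygon via the shoelace formula."""
    twice_area = cx = cy = 0.0
    n = len(pts)
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))

# Arbitrary irregular hexagon, vertices in counter-clockwise order.
hexagon = [(4.0, 0.0), (2.0, 3.0), (-1.0, 3.5),
           (-4.0, 1.0), (-2.0, -3.0), (3.0, -2.0)]
g = vertex_centroid(hexagon)   # preserved by the midpoint map

centroids = []
poly = hexagon
for _ in range(6):
    poly = midpoint_polygon(poly)
    centroids.append(area_centroid(poly))

# Direction of the candidate line: from g to the first iterate's centroid.
ref = (centroids[0][0] - g[0], centroids[0][1] - g[1])
# 2D cross product with ref; zero means the point lies on the line.
deviations = [abs((c[0] - g[0]) * ref[1] - (c[1] - g[1]) * ref[0])
              for c in centroids]
# Distance of each area centroid from the vertex centroid.
distances = [((c[0] - g[0]) ** 2 + (c[1] - g[1]) ** 2) ** 0.5
             for c in centroids]
```

Replacing the six-vertex list with a polygon of a different size gives a quick way to see that the deviations are no longer zero in general.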

7. Scope, Limitations, and Outlook

LCH provides a rigorous, mechanistically interpretable alternative to the earlier Linear Representation Hypothesis (LRH). By operating on centroids rather than raw activations, it achieves enhanced faithfulness, sparser dictionaries, and increased cross-model consistency (Walker et al., 13 Apr 2026). In theoretical settings, LCH remains robust to within-class (axis) variability, provided centroid covariances are controlled (Wakhloo et al., 2022). However, SED-LCH coupling, while highly diagnostic, is not uniquely causal: low-rank interventions that do not respect LCH subspaces yield equally strong functional performance in common hyperparameter regimes (Xu, 28 Apr 2026). In geometric contexts, centroid rigidity appears as a special phenomenon, e.g., for hexagons (Tisdell, 6 Apr 2026), not a generic property.

A plausible implication is that analyzing centroids—task-wise in multitask settings, and via direct gradients rather than optimizer updates—will remain essential for reliable interpretability and mechanistic probing of modern deep networks. The dominance of centroid geometry in classification theory further suggests that architectural or optimization schemes that organize and decorrelate centroids may enhance network expressivity and generalization.