Generalization of candidate belief geometries across models and layers

Determine whether the candidate simplex-structured representations and associated barycentric predictive advantages identified in the residual stream at layer 20 of Gemma-2-9B generalize to other language model architectures and to other layers.

Background

The study’s real-model analyses focus on Gemma-2-9B at a single layer (layer 20), where several clusters exhibit candidate simplex geometry and barycentric predictive advantages over individual features. The scope of these analyses is therefore limited to one model and one layer.

Understanding whether these findings are robust across different architectures (e.g., model families, sizes) and across different layers within models is necessary to assess the pervasiveness and functional significance of belief-like simplex structures in LLM representations.
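One ingredient of such a cross-model, cross-layer study is computing barycentric coordinates of residual-stream activations with respect to a set of candidate simplex vertices, so that barycentric mixtures can be compared against individual features at each layer. The sketch below is a minimal, hypothetical illustration of that step, not the paper's actual pipeline: the function name, the least-squares formulation, and the toy vertex matrix are all assumptions.

```python
import numpy as np

def barycentric_coords(x, vertices):
    """Least-squares barycentric coordinates of activation x
    with respect to candidate simplex vertices (illustrative sketch).

    x        : (d,)  residual-stream activation vector
    vertices : (k, d) candidate simplex vertex directions
    Returns w with sum(w) == 1, enforced by augmenting the
    linear system with a row of ones.
    """
    k, d = vertices.shape
    A = np.vstack([vertices.T, np.ones((1, k))])  # (d + 1, k)
    b = np.concatenate([x, [1.0]])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Toy check: a point written as a convex combination of
# orthonormal vertices should recover its mixture weights.
V = np.eye(3)                       # hypothetical 2-simplex in 3-D
w_true = np.array([0.2, 0.3, 0.5])
x = w_true @ V
w = barycentric_coords(x, V)
```

In a generalization study, one would repeat this fit per layer and per model, then test whether regressions on `w` retain their predictive advantage over single-feature projections.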

References

All real-model results are from Gemma-2-9B, layer 20. Whether findings generalize to other models, architectures, or layers is unknown.

Finding Belief Geometries with Sparse Autoencoders (2604.02685 - Levinson, 3 Apr 2026) in Subsection "Limitations" (Discussion)