Origin of hotspot-layer fractional depth differences across architectures

Ascertain whether the early-layer localisation of the introspection direction at approximately 6.25% of total depth in Llama 3.1 (Layer 2 in the 8B model and Layer 5 in the 70B model) versus 12.5% in Qwen 2.5-32B reflects architectural differences or a genuinely different placement of the self-referential processing mechanism.
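As a quick arithmetic check, the sketch below takes "fractional depth" to mean the hotspot layer index divided by the model's total number of decoder layers. This is an assumption about how the percentages were computed, but it reproduces them from the public model configs (32 layers for Llama 3.1 8B, 80 for Llama 3.1 70B, 64 for Qwen 2.5-32B). Under the same assumption, 12.5% in Qwen 2.5-32B would correspond to layer 8. The helper names `fractional_depth` and `layer_at_depth` are hypothetical.

```python
# Total decoder-layer counts (num_hidden_layers) from the public model configs.
NUM_LAYERS = {
    "Llama-3.1-8B": 32,
    "Llama-3.1-70B": 80,
    "Qwen2.5-32B": 64,
}


def fractional_depth(layer: int, num_layers: int) -> float:
    """Hotspot depth as a fraction of total depth (assumed: layer index / layer count)."""
    return layer / num_layers


def layer_at_depth(fraction: float, num_layers: int) -> int:
    """Inverse mapping: nearest layer index for a given fractional depth."""
    return round(fraction * num_layers)


if __name__ == "__main__":
    print(fractional_depth(2, NUM_LAYERS["Llama-3.1-8B"]))    # 0.0625
    print(fractional_depth(5, NUM_LAYERS["Llama-3.1-70B"]))   # 0.0625
    print(layer_at_depth(0.125, NUM_LAYERS["Qwen2.5-32B"]))   # 8
    print(layer_at_depth(0.0625, NUM_LAYERS["Qwen2.5-32B"]))  # 4
```

Mapping back from a fraction to a layer index gives the concrete layers one would compare across models; for instance, the Llama-style 6.25% depth would fall at layer 4 in Qwen 2.5-32B.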

Background

The study reports that the introspection direction is spatially localised to early layers at consistent fractional depths within each architecture (6.25% in Llama 3.1 8B and 70B; 12.5% in Qwen 2.5-32B). Adjacent layers show minimal effect, suggesting a dedicated mechanism rather than distributed behaviour.
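The observation that adjacent layers show minimal effect can be expressed as a simple contrast score: the hotspot's effect divided by the largest effect among its immediate neighbours. The sketch below assumes a hypothetical list of non-negative per-layer effect magnitudes (for example, the size of some measured change when the direction is applied at each layer); neither the function `hotspot_contrast` nor the toy values come from the study.

```python
from __future__ import annotations

from typing import Sequence


def hotspot_contrast(effects: Sequence[float], window: int = 1) -> tuple[int, float]:
    """Return (hotspot layer, contrast ratio).

    `effects` is a hypothetical list of non-negative per-layer effect magnitudes.
    The contrast ratio is the hotspot's effect divided by the largest effect among
    the `window` layers on either side; a large ratio indicates sharp localisation.
    """
    if not effects:
        raise ValueError("effects must be non-empty")
    # Layer with the largest effect.
    hot = max(range(len(effects)), key=lambda i: effects[i])
    lo, hi = max(0, hot - window), min(len(effects), hot + window + 1)
    neighbours = [effects[i] for i in range(lo, hi) if i != hot]
    if not neighbours:
        return hot, float("inf")
    peak_neighbour = max(neighbours)
    ratio = effects[hot] / peak_neighbour if peak_neighbour > 0 else float("inf")
    return hot, ratio


if __name__ == "__main__":
    # Toy, made-up values for an 8-layer model; not data from the study.
    toy = [0.1, 0.2, 1.0, 0.25, 0.1, 0.05, 0.05, 0.05]
    layer, ratio = hotspot_contrast(toy)
    print(layer, ratio, layer / len(toy))  # 2 4.0 0.25
```

Dividing the returned hotspot index by the number of layers gives the same fractional-depth quantity used above, so the two measures can be reported side by side for each model.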

The authors explicitly state uncertainty about whether these differences arise from architectural design choices or indicate genuinely different locations of the same functional mechanism across models.

References

Whether this reflects architectural differences or a genuinely different placement remains an open question.

— When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing (2602.11358 - Dadfar, 11 Feb 2026) in Section 6.4 Layer Localisation and the 3.0 Question
