Extend the linear-theory analysis to full transformer architectures
Develop a theoretical analysis of gradient-descent optimization under cross-entropy loss for full transformer architectures. Such an analysis would generalize the simplified linear feature-extractor model (f_θ(s) = θ^T s with a linear classifier W) used to study the phase evolution of representation geometry.
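To make the simplified setting concrete, the sketch below implements a linear feature extractor composed with a linear classifier, trained by full-batch gradient descent on cross-entropy loss. This is a minimal illustration of the analyzed model class, not the paper's experimental setup: the dimensions, synthetic data, learning rate, and step count are all hypothetical placeholders.

```python
# Minimal sketch (assumed setup, not the paper's configuration) of the
# simplified model: features f_theta(s) = theta^T s, logits = W^T f_theta(s),
# trained by full-batch gradient descent on cross-entropy loss.
import torch

d_in, d_feat, n_classes, n_samples = 64, 32, 10, 1024  # hypothetical sizes

torch.manual_seed(0)
s = torch.randn(n_samples, d_in)               # synthetic inputs s (one per row)
y = torch.randint(0, n_classes, (n_samples,))  # synthetic labels

# Leaf parameters: feature extractor theta and linear classifier W.
theta = (0.02 * torch.randn(d_in, d_feat)).requires_grad_()
W = (0.02 * torch.randn(d_feat, n_classes)).requires_grad_()

loss_fn = torch.nn.CrossEntropyLoss()
lr = 0.5  # hypothetical learning rate

for step in range(201):
    feats = s @ theta        # f_theta(s) = theta^T s, applied row-wise
    logits = feats @ W       # linear classifier on top of the features
    loss = loss_fn(logits, y)
    loss.backward()
    with torch.no_grad():    # plain gradient-descent update
        theta -= lr * theta.grad
        W -= lr * W.grad
        theta.grad.zero_()
        W.grad.zero_()
    if step % 50 == 0:
        print(f"step {step:3d}  cross-entropy {loss.item():.4f}")
```

With inputs stored as rows of s, the expression s @ theta computes θ^T s for each sample, so theta and W here play exactly the two roles named above: feature extractor and classifier.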
Our findings have several limitations: (i) computational constraints limited our analysis to models of up to 12B parameters, though the phases persist across scales from 160M to 12B; (ii) spectral-metric computation requires ∼10K samples and scales quadratically with the hidden dimension (see the sketch after this paragraph); (iii) our theoretical analysis assumes simplified linear feature extractors, leaving the extension to full transformer architectures as future work; and (iv) we focused on English-language LLMs trained with standard objectives, and whether similar phases emerge in multilingual or alternatively trained models remains unexplored.
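As an illustration of the quadratic scaling noted in (ii): spectral metrics are typically computed from the d × d covariance of hidden representations, so memory grows as d² and forming the covariance from n samples costs O(n·d²), with a further O(d³) eigendecomposition. The sketch below uses the effective rank of the covariance spectrum as a representative spectral metric; this particular metric, and all shapes shown, are assumptions for illustration rather than the paper's exact procedure.

```python
# Illustrative sketch of a spectral-metric computation and its cost. The
# covariance is a d x d matrix, hence the quadratic dependence on hidden
# dimension d. "Effective rank" is one common spectral metric; the paper's
# exact metric may differ.
import numpy as np

def effective_rank(hidden_states: np.ndarray) -> float:
    """hidden_states: (n_samples, d) array of representations (~10K samples)."""
    X = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    cov = (X.T @ X) / (X.shape[0] - 1)        # d x d covariance: O(n d^2) to form
    eigvals = np.linalg.eigvalsh(cov)         # O(d^3) symmetric eigendecomposition
    eigvals = np.clip(eigvals, 1e-12, None)   # guard against numerical negatives
    p = eigvals / eigvals.sum()               # normalized spectrum
    return float(np.exp(-(p * np.log(p)).sum()))  # exp of spectral entropy

# Hypothetical shapes: 10K samples, hidden dimension 768.
rng = np.random.default_rng(0)
H = rng.standard_normal((10_000, 768))
print(f"effective rank: {effective_rank(H):.1f}")
```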