Do three-phase representation geometry dynamics appear beyond English and standard objectives?
Determine whether autoregressive transformer language models trained on multilingual corpora, or with training objectives other than standard cross-entropy pretraining, exhibit the same non-monotonic three-phase sequence of representation geometry observed in English-language models: Gray-phase collapse, Maroon-phase dimensional expansion coinciding with n-gram memorization, and BlueViolet-phase anisotropic consolidation.
Our findings have several limitations: (i) computational constraints limited our analysis to models of up to 12B parameters, though the phases persist across scales from 160M to 12B; (ii) spectral metric computation requires ∼10K samples and scales quadratically with hidden dimension; (iii) our theoretical analysis assumes simplified linear feature extractors, leaving the extension to full transformer architectures as future work; and (iv) we focused on English-language LLMs trained with standard objectives, so whether similar phases emerge in multilingual or alternatively-trained models remains unexplored.
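To make the cost noted in (ii) concrete, the sketch below computes one common spectral statistic, the participation ratio (effective rank) of the hidden-state covariance, for a batch of sampled representations. The metric choice, the 10K-by-2048 random stand-in data, and the NumPy implementation are illustrative assumptions rather than the paper's exact procedure; forming the d x d covariance is what makes the cost grow quadratically with hidden dimension.

```python
# Minimal sketch of one plausible spectral metric: the participation ratio
# (effective rank) of the hidden-state covariance. The metric, sample count,
# and hidden dimension are illustrative assumptions, not the paper's recipe.
import numpy as np

def effective_rank(hidden_states: np.ndarray) -> float:
    """Participation ratio of the covariance eigenvalues.

    hidden_states: (n_samples, d) array of token representations.
    Forming the d x d covariance costs O(n * d^2) and its eigendecomposition
    O(d^3), so the computation scales at least quadratically with hidden
    dimension and benefits from ~10K samples for a stable estimate.
    """
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (hidden_states.shape[0] - 1)  # (d, d)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eigvals.sum() ** 2 / (np.square(eigvals).sum() + 1e-12))

if __name__ == "__main__":
    # Random data standing in for ~10K sampled representations from a
    # hypothetical model with hidden dimension 2048.
    rng = np.random.default_rng(0)
    reps = rng.standard_normal((10_000, 2048)).astype(np.float32)
    print(f"effective rank ~ {effective_rank(reps):.1f}")
```

Tracking such a statistic across training checkpoints is one way the phase boundaries described above could, in principle, be located in multilingual or alternatively-trained models.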