Layer-wise and training-dynamics analysis of morpho-syntactic representations in BLOOM models

Analyze, across layers and over the course of pretraining, how morpho-syntactic representations are formed in the 176B-parameter BLOOM model and the 1.7B-parameter BLOOM-1B7 model to characterize representational development and property acquisition dynamics.

Background

The reported probing results cover only representations averaged over all layers and taken at the end of training. Prior work suggests that different layers and training stages encode different types of information, so BLOOM models may follow distinct developmental trajectories in how they represent morpho-syntactic features.

A systematic layer-wise and temporal analysis would clarify where and when specific linguistic properties emerge, providing insights into model interpretability and guiding architectural or training improvements.
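
A layer-wise analysis of this kind can be sketched as fitting one probe per layer and comparing held-out accuracy. The example below is a minimal illustration under stated assumptions, not the procedure used in the BLOOM paper: it uses a ridge-regression linear probe as a stand-in classifier, and the activations are synthetic (the helper `make_synthetic_hidden` is hypothetical and builds toy data in which label information grows with depth). With a real model, per-layer activations would have to be extracted separately, for example by requesting hidden states from the model's forward pass, and the same function could be run once per pretraining checkpoint to trace acquisition over time.

```python
import numpy as np


def fit_linear_probe(X_train, y_train, n_classes, l2=1e-3):
    """Fit a ridge-regression probe onto one-hot labels.

    Returns a weight matrix of shape (dim + 1, n_classes); the last row is the bias.
    """
    Xb = np.hstack([X_train, np.ones((X_train.shape[0], 1))])  # append bias column
    Y = np.eye(n_classes)[y_train]                              # one-hot targets
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))


def layerwise_probe(hidden, labels, train_frac=0.8, seed=0):
    """Probe every layer with the same train/test split.

    hidden: array of shape (n_layers, n_examples, dim)
    labels: integer array of shape (n_examples,)
    Returns a list with one held-out accuracy per layer.
    """
    n_layers, n_examples, _ = hidden.shape
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_examples)
    cut = int(train_frac * n_examples)
    train, test = order[:cut], order[cut:]
    n_classes = int(labels.max()) + 1
    accuracies = []
    for layer in range(n_layers):
        W = fit_linear_probe(hidden[layer][train], labels[train], n_classes)
        accuracies.append(probe_accuracy(W, hidden[layer][test], labels[test]))
    return accuracies


def make_synthetic_hidden(n_layers=4, n_examples=400, dim=16, seed=0):
    """Hypothetical toy data: a binary feature whose signal strengthens with depth."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_examples)
    direction = rng.normal(size=dim)
    hidden = rng.normal(size=(n_layers, n_examples, dim))
    for layer in range(n_layers):
        # Layer 0 carries no label signal; deeper layers carry more.
        hidden[layer] += layer * np.outer(2 * labels - 1, direction)
    return hidden, labels


if __name__ == "__main__":
    hidden, labels = make_synthetic_hidden()
    for layer, acc in enumerate(layerwise_probe(hidden, labels)):
        print(f"layer {layer}: held-out accuracy {acc:.2f}")
```

Using one shared split for all layers keeps the per-layer accuracies comparable. A control task or a random-label baseline would be a natural addition before interpreting any differences between layers or checkpoints.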

References

It should be noted that the following questions remain for further research: 4. Different layers and training dynamics. The analysis has focused on averaged representations of all layers and at the end of training. Analyzing different layers may reveal how morpho-syntactic representations are built during processing. Similarly, investigating how properties are acquired over the course of pre-training (Choshen et al., 2022; Zhang et al., 2021; Voloshina et al., 2022) is a viable direction for research.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2211.05100 - Workshop et al., 2022) in Section: Evaluation, Subsection: Multilingual Probing, Discussion