Layer-wise and training-dynamics analysis of morpho-syntactic representations in BLOOM models
Analyze how morpho-syntactic representations form across layers and over the course of pretraining in the 176B-parameter BLOOM model and the 1.7B-parameter BLOOM-1B7 model. The goal is to characterize how these representations develop through the layer stack and when individual morpho-syntactic properties are acquired during training.
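The difference between probing one averaged representation and probing each layer separately can be made concrete with a minimal sketch. It assumes that hidden states have already been extracted for every layer and pooled to one vector per example and layer (for instance via Hugging Face `transformers` with `output_hidden_states=True`). The toy vectors, the `Sing`/`Plur` labels and all function names are hypothetical. A nearest-centroid classifier is swapped in as the probe only to keep the sketch dependency-free; it is not necessarily the classifier used in the original analysis.

```python
from statistics import mean

# A (hypothetical) pooled hidden-state vector for one example at one layer.
Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [mean(column) for column in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_probe(xs: list[Vector], ys: list[str]) -> dict[str, Vector]:
    """Nearest-centroid probe: one centroid per label value."""
    return {
        label: centroid([x for x, y in zip(xs, ys) if y == label])
        for label in sorted(set(ys))
    }


def probe_accuracy(probe: dict[str, Vector], xs: list[Vector], ys: list[str]) -> float:
    """Fraction of examples whose nearest centroid carries the gold label.
    Ties go to the alphabetically first label."""
    predictions = [min(probe, key=lambda label: sq_dist(probe[label], x)) for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


def layerwise_scores(train, train_y, test, test_y) -> list[float]:
    """train[i][l] is the vector of training example i at layer l."""
    scores = []
    for layer in range(len(train[0])):
        probe = fit_probe([ex[layer] for ex in train], train_y)
        scores.append(probe_accuracy(probe, [ex[layer] for ex in test], test_y))
    return scores


def averaged_score(train, train_y, test, test_y) -> float:
    """Probe the mean over all layers, i.e. a single averaged representation."""
    probe = fit_probe([centroid(ex) for ex in train], train_y)
    return probe_accuracy(probe, [centroid(ex) for ex in test], test_y)


def toy_example(label: str, noise: float) -> list[Vector]:
    """Hypothetical 3-layer states: layer 0 carries no number information."""
    sign = 1.0 if label == "Sing" else -1.0
    return [[0.0, 0.0], [noise, 0.1 * sign], [sign, noise]]


train_y = ["Sing", "Sing", "Plur", "Plur"]
train = [toy_example(y, n) for y, n in zip(train_y, [0.2, -0.2, 0.2, -0.2])]
test_y = ["Sing", "Plur", "Sing", "Plur"]
test = [toy_example(y, n) for y, n in zip(test_y, [0.5, 0.5, -0.5, -0.5])]

print(layerwise_scores(train, train_y, test, test_y))  # [0.5, 1.0, 1.0]
print(averaged_score(train, train_y, test, test_y))    # 1.0
```

On this toy data the averaged probe scores perfectly even though layer 0 encodes nothing about number; only the per-layer scores show where the property is represented.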
References
The following questions remain for further research. 4. Different layers and training dynamics. The analysis has focused on representations averaged over all layers and taken at the end of training. Analyzing individual layers may reveal how morpho-syntactic representations are built up during processing. Similarly, investigating how these properties are acquired over the course of pretraining \citep{choshen-etal-2022-grammar,zhang-etal-2021-need,voloshina2022neural} is a viable direction for research.
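For the training-dynamics direction, one simple way to summarize probe results from a series of saved pretraining checkpoints is an acquisition point per layer. The sketch below assumes that probe accuracies have already been computed for each checkpoint and layer. The criterion (the earliest checkpoint that closes 90% of the gap between chance and final accuracy), the function names and all numbers are hypothetical choices for illustration, not taken from the cited works or from measured BLOOM results.

```python
from typing import Dict, Optional

# Probe accuracy per pretraining checkpoint: {training_step: accuracy}.
Curve = Dict[int, float]


def acquisition_step(curve: Curve, chance: float, fraction: float = 0.9) -> Optional[int]:
    """Earliest checkpoint at which the probe has closed `fraction` of the gap
    between chance accuracy and its final-checkpoint accuracy.
    This threshold is a hypothetical criterion, not a standard definition."""
    steps = sorted(curve)
    if not steps:
        return None
    final = curve[steps[-1]]
    target = chance + fraction * (final - chance)
    for step in steps:
        if curve[step] >= target:
            return step
    return None


def acquisition_by_layer(curves: Dict[int, Curve], chance: float,
                         fraction: float = 0.9) -> Dict[int, Optional[int]]:
    """Apply the same criterion to every layer's learning curve."""
    return {layer: acquisition_step(curve, chance, fraction)
            for layer, curve in sorted(curves.items())}


# Illustrative numbers only (not measured results): a binary property, so
# chance is 0.5; keys are hypothetical layer indices and training steps.
curves = {
    0: {1_000: 0.50, 10_000: 0.52, 100_000: 0.55},
    12: {1_000: 0.55, 10_000: 0.93, 100_000: 0.95},
}
print(acquisition_by_layer(curves, chance=0.5))  # {0: 100000, 12: 10000}
```

A fixed fractional threshold is sensitive to noise in individual checkpoints, so smoothing each curve before applying it is a reasonable variant.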