Generalization differences between BLOOM and BLOOM-1B7 on morpho-syntactic probing, especially for under-resourced languages

Investigate whether the 176B-parameter BLOOM model or the 1.7B-parameter BLOOM-1B7 model generalizes morpho-syntactic features better across under-resourced languages. BLOOM-1B7 leads on average in morpho-syntactic feature classification, but its probing results correlate more strongly with pretraining dataset size, which may indicate weaker generalization to under-resourced settings.

Background

The paper conducts a multilingual probing study across 17 languages and 38 morpho-syntactic features, comparing BLOOM (176B) and BLOOM-1B7 (1.7B). Probing uses <s>-pooled representations and logistic regression classifiers trained on Universal Dependencies (UD) datasets.
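
The probing setup described above can be illustrated with a minimal sketch: a logistic regression classifier is trained on fixed sentence vectors to predict one binary morpho-syntactic label. Everything here is an assumption for illustration only: the vectors are synthetic stand-ins for pooled representations, the names `train_probe`, `predict`, and `make_toy_data` are hypothetical, and the plain gradient-descent training and accuracy metric are simplifications, not the paper's pipeline.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a binary logistic-regression probe with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y  # gradient of the log loss w.r.t. z
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the probe assigns probability >= 0.5 to the feature."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0 else 0


def make_toy_data(n, rng):
    """Synthetic 3-d 'sentence vectors'; only dimension 0 carries the label."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        xs.append([rng.gauss(2.0 if y else -2.0, 1.0),
                   rng.gauss(0.0, 1.0),
                   rng.gauss(0.0, 1.0)])
        ys.append(y)
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_toy_data(80, rng)   # toy data, not UD sentences
test_x, test_y = make_toy_data(40, rng)

weights, bias = train_probe(train_x, train_y)
correct = sum(predict(weights, bias, x) == y for x, y in zip(test_x, test_y))
test_accuracy = correct / len(test_y)
print(f"probe accuracy on toy data: {test_accuracy:.2f}")
```

In a real probe the inputs would be frozen model representations, so the classifier's score reflects how linearly recoverable the feature is from those representations rather than what the classifier itself learns.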

Results show that BLOOM-1B7 outperforms BLOOM on average across languages for morpho-syntactic feature classification, while BLOOM appears more stable across languages. The authors point out that BLOOM-1B7's stronger correlation with pretraining dataset size might imply poorer generalization to under-resourced languages than the larger model achieves. This motivates a deeper investigation into which model truly generalizes better in under-resourced settings.
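
One way to make the correlation argument concrete is to compare, per model, how strongly per-language probing scores track pretraining data size. The sketch below is illustrative only: the choice of Pearson correlation on log-scaled sizes is an assumption, `model_a` and `model_b` are hypothetical models, and all numbers are made-up toy values rather than results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy per-language pretraining sizes (arbitrary units), smallest to largest.
dataset_sizes = [1e6, 1e7, 1e8, 1e9]
log_sizes = [math.log10(s) for s in dataset_sizes]

# Toy per-language probing scores for two hypothetical models.
scores_model_a = [0.50, 0.60, 0.70, 0.80]  # rises steadily with data size
scores_model_b = [0.70, 0.68, 0.72, 0.70]  # roughly flat across languages

r_a = pearson(log_sizes, scores_model_a)
r_b = pearson(log_sizes, scores_model_b)
print(f"model_a: r = {r_a:.3f}")
print(f"model_b: r = {r_b:.3f}")
```

Under this reading, a model whose scores correlate more strongly with data size (here `model_a`) depends more on how much data each language contributed, which is the pattern the authors associate with weaker generalization to under-resourced languages. A higher average score and a higher correlation can coexist, which is why the two models are hard to rank on this question.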

References

It should be noted that the following questions remain for further research: 1. Generalizing abilities. BLOOM-1B7 is leading in the average performance of morphosyntactic feature classification for the languages in the paper's probing results table. The BLOOM results are lower, which can be interpreted as a worse grammatical generalization over the aforecited languages. However, the BLOOM-1B7's probing correlation results with factors like pretraining dataset size are more prominent, which makes it potentially less generalizing on the under-resourced languages than the bigger version.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2211.05100 - Workshop et al., 2022) in Section: Evaluation, Subsection: Multilingual Probing, Discussion