Probing BLOOM models on languages not included in pretraining to enable typological analysis
Investigate the probing performance of the 176B-parameter BLOOM model and the 1.7B-parameter BLOOM-1B7 model on languages absent from the ROOTS pretraining corpus, both to support typological interpretation and to identify which linguistic features are most and least learnable across unseen languages.
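A minimal sketch of such a probing experiment is given below. The hidden states here are random placeholders standing in for sentence representations extracted from a BLOOM layer (e.g. via Hugging Face `transformers`), and the binary labels are synthetic stand-ins for one linguistic feature; shapes, feature names, and the probe choice are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder "hidden states": in a real run these would be representations
# from a chosen BLOOM layer for sentences in a language not covered by ROOTS
# (hypothetical shape: 1000 sentences x 64 dimensions).
X = rng.normal(size=(1000, 64))

# Synthetic binary labels standing in for one linguistic feature
# (e.g. whether the sentence's main verb is past tense).
w = rng.normal(size=64)
y = (X @ w > 0).astype(int)

# A linear probe: if a simple classifier recovers the feature from the
# representations, the feature is (linearly) encoded in that layer.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {accuracy_score(y_te, probe.predict(X_te)):.2f}")
```

Repeating this per feature and per language, and comparing probe accuracies across typologically related and unrelated unseen languages, yields the kind of typological interpretation the excerpt calls for.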
References
It should be noted that the following questions remain for further research. Multilingual abilities: a separate research interest implies considering languages that are not explicitly included in the pretraining corpus of the models. Expanding the set of languages for probing will allow for a typological interpretation and a deeper analysis of the most learnable and hard-to-learn linguistic features on a more considerable scope.
— BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
(2211.05100 - Workshop et al., 2022) in Section: Evaluation, Subsection: Multilingual Probing, Discussion