Focused probing of under-resourced Indic and Niger-Congo languages and comparison with high-resource languages
Investigate morpho-syntactic probing results for under-resourced languages of the Indic and Niger-Congo families that were sparsely represented in the ROOTS pretraining corpus, and compare these results with high-resource languages to derive linguistic insights about performance disparities.
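One way to frame the comparison is as a gap between the mean probing score of each resource group. The following is a minimal sketch only, assuming each language already has a single aggregate probing score. The language names, group labels, and scores are hypothetical placeholders, not results from the BLOOM paper, and the helpers `group_means` and `resource_gap` are illustrative names rather than part of any existing probing toolkit.

```python
from statistics import mean

# Hypothetical placeholder data (NOT results from the paper):
# one aggregate probing score per language, e.g. a macro-averaged F1.
scores = {
    "high_lang_a": 0.80,
    "high_lang_b": 0.70,
    "under_lang_a": 0.60,
    "under_lang_b": 0.50,
}

# Hypothetical assignment of each language to a resource group.
group_of = {
    "high_lang_a": "high",
    "high_lang_b": "high",
    "under_lang_a": "under",
    "under_lang_b": "under",
}


def group_means(scores, group_of):
    """Return the mean probing score for each resource group."""
    by_group = {}
    for lang, score in scores.items():
        by_group.setdefault(group_of[lang], []).append(score)
    return {group: mean(values) for group, values in by_group.items()}


def resource_gap(scores, group_of, high="high", low="under"):
    """Return mean(high-resource scores) minus mean(under-resourced scores)."""
    means = group_means(scores, group_of)
    return means[high] - means[low]


if __name__ == "__main__":
    print(group_means(scores, group_of))
    print(resource_gap(scores, group_of))
```

A single mean gap hides per-feature differences, so an actual study would likely break the scores down by morpho-syntactic category and by language family before drawing linguistic conclusions.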
References
It should be noted that the following questions remain for further research: 3. Under-resourced language evaluation. The under-resourced languages of the Indic and Niger-Congo families included in the pretraining corpus in smaller shares represent a separate subject for future probing. We also plan to investigate the results of high-resourced and under-resourced languages to reveal possible linguistic insights in these two groups.
— BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
(2211.05100 - Workshop et al., 2022) in Section: Evaluation, Subsection: Multilingual Probing, Discussion