Generalization of English-only findings about compression and performance
Ascertain the extent to which findings from English-only experiments on the relationship between tokenization compression and language model performance generalize to other languages, including typologically diverse and low-resource languages.
References
These experiments are limited to English, and it is thus unclear the extent to which they generalize to other languages.
— Explaining and Mitigating Crosslingual Tokenizer Inequities
(2510.21909 - Arnett et al., 24 Oct 2025) in Section 2.3 (Related Work — The Relationship Between Compression and Performance)