Cross-linguistic generalization of S3M linguistic encodings

Determine to what extent the linguistic information that self-supervised speech models (S3Ms) encode at multiple structural levels generalizes to languages other than English.

Background

Existing work predominantly evaluates models trained and tested on English, leaving it unclear whether representational findings at different structural levels hold cross-linguistically.

Clarifying cross-linguistic generalization is essential for evaluating the universality of self-supervised speech model representations and for guiding multilingual or language-specific pretraining strategies.

References

Additionally, since most studies focus on the encoding of English linguistic information in models pre-trained on English speech recordings [with some notable exceptions], it is an open question to what extent S3M encoding of linguistic information at various structural levels generalizes to other languages.

Tracking the emergence of linguistic structure in self-supervised models learning from speech (2604.02043 - Kloots et al., 2 Apr 2026) in Section 2.1 (Related work: Layerwise hierarchies and linguistic structure in S3Ms)