Inter-layer influence on lower-level encodings in HuBERT-I2

Investigate whether the encoding of lower-level linguistic units (phones, syllables, and word forms) in the final layers of HuBERT iteration 2 (HuBERT-I2) models benefits from higher-level linguistic information represented in earlier layers.

Background

HuBERT-I2 jointly encodes multiple linguistic levels in its higher layers. In the final layer, however, semantic and syntactic information drops off, while lower-level units remain strongly represented.

Neuroscientific evidence suggests hierarchical, interactive processing in human speech perception; determining whether similar top-down influences operate within HuBERT-I2 could illuminate mechanisms of representation and guide architectural choices.
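
One possible way to probe this question is an ablation with linear probes: remove the directions of a representation that linearly predict a higher-level label, then check whether a lower-level label is still decodable. The sketch below is only an illustration under stated assumptions, not the method of Kloots et al. It uses synthetic features in place of real HuBERT-I2 activations, a ridge-regression probe, and a single nullspace-projection step (in the spirit of iterative nullspace projection). All function names, the labels `phone` and `word`, and the data-generating process are hypothetical.

```python
import numpy as np

def fit_probe(X, Y, lam=1e-3):
    """Ridge-regression probe with intercept: returns weights (d, k) and bias (k,)."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, mu_y - mu_x @ W

def probe_accuracy(X_tr, y_tr, X_te, y_te, n_classes):
    """Train a linear probe on one-hot labels and return held-out accuracy."""
    W, b = fit_probe(X_tr, np.eye(n_classes)[y_tr])
    return float(((X_te @ W + b).argmax(axis=1) == y_te).mean())

def nullspace_projection(X_tr, y_tr, n_classes, tol=1e-8):
    """Projector that removes the directions a linear probe uses to predict y."""
    W, _ = fit_probe(X_tr, np.eye(n_classes)[y_tr])
    # Keep only numerically non-zero directions of the probe's column space.
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    basis = U[:, s > tol * s.max()]
    return np.eye(X_tr.shape[1]) - basis @ basis.T

def demo(seed=0, n=4000, d=8):
    """Synthetic stand-in for final-layer features (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    phone = rng.integers(0, 3, size=n)   # hypothetical lower-level label
    word = rng.integers(0, 2, size=n)    # hypothetical higher-level label
    X = rng.normal(size=(n, d))
    X[np.arange(n), phone] += 3.0        # phone identity lives in dims 0..2
    X[:, 3] += 3.0 * (2 * word - 1)      # higher-level label lives in dim 3
    tr, te = slice(0, n // 2), slice(n // 2, n)

    # Fit the projector on training data only, then ablate all features.
    P = nullspace_projection(X[tr], word[tr], 2)
    X_abl = X @ P
    return {
        "phone_before": probe_accuracy(X[tr], phone[tr], X[te], phone[te], 3),
        "phone_after": probe_accuracy(X_abl[tr], phone[tr], X_abl[te], phone[te], 3),
        "word_before": probe_accuracy(X[tr], word[tr], X[te], word[te], 2),
        "word_after": probe_accuracy(X_abl[tr], word[tr], X_abl[te], word[te], 2),
    }

if __name__ == "__main__":
    for name, acc in demo().items():
        print(f"{name}: {acc:.3f}")
```

In this toy setup the two labels are encoded independently, so the lower-level probe should be largely unaffected by the ablation. With real activations, a marked drop in lower-level decodability after removing higher-level information would be consistent with, though not proof of, the dependence the question asks about. A single linear projection removes only linearly decodable information.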

References

Whether the encoding of lower-level linguistic units in HuBERT-I2's final layers similarly benefits from higher-level information represented in earlier layers remains to be explored.

Tracking the emergence of linguistic structure in self-supervised models learning from speech (2604.02043 - Kloots et al., 2 Apr 2026) in Discussion and Conclusions