Ascertain whether SoundStorm’s non-autoregressive semantic-to-acoustic token conversion harms acoustic diversity
Ascertain whether replacing the autoregressive semantic-to-acoustic token conversion in AudioLM with SoundStorm’s non-autoregressive Transformer negatively affects the acoustic diversity of the generated audio.
References
However, it is unclear whether this change harms acoustic diversity of the generated audio.
— MAD Speech: Measures of Acoustic Diversity of Speech
(Futeral et al., 2024, arXiv:2404.10419), Section 7.1 (Semantic-to-Acoustic Token Conversion: SoundStorm vs. AudioLM)