Impact of subword vs. phoneme text inputs on PM-RoPE duration control
Determine how the choice of text input, phoneme sequences versus SentencePiece subword tokens, affects how effectively Progress-Monitoring Rotary Position Embedding (PM-RoPE) controls duration in encoder-decoder codec language models such as T5Gemma-TTS. Ideally, this would be settled by a controlled ablation that varies only the input representation and holds everything else fixed.
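One way such an ablation could be scored is sketched below. It assumes that duration control is measured as the relative gap between the requested and the generated audio duration, averaged per input condition. That metric, and the names `Trial`, `relative_duration_error`, and `mean_error_by_condition`, are illustrative assumptions rather than anything specified in the report.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One synthesized utterance in the ablation (hypothetical record layout)."""
    condition: str          # input representation, e.g. "phoneme" or "subword"
    target_seconds: float   # duration requested from the model
    actual_seconds: float   # duration of the generated audio


def relative_duration_error(trial: Trial) -> float:
    """Absolute deviation from the requested duration, as a fraction of it."""
    if trial.target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return abs(trial.actual_seconds - trial.target_seconds) / trial.target_seconds


def mean_error_by_condition(trials: list[Trial]) -> dict[str, float]:
    """Average the relative duration error separately for each input condition."""
    grouped: dict[str, list[float]] = {}
    for trial in trials:
        grouped.setdefault(trial.condition, []).append(relative_duration_error(trial))
    return {condition: mean(errors) for condition, errors in grouped.items()}
```

For the comparison to isolate the input representation, both conditions would need to share the same sentences, target durations, model size, and training budget, so that any gap in the per-condition averages can be attributed to the phoneme-vs-subword choice alone.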
References
The effect of this phoneme-vs-subword choice on PM-RoPE's duration control effectiveness has not been ablated in this work and remains an open question for future investigation.
— T5Gemma-TTS Technical Report
(2604.01760 - Arata et al., 2 Apr 2026) in Related Work, Section 2.2 (Encoder-Decoder Architectures in TTS)