Cause of PI-induced timing degradation under RoPE scaling
Determine whether the slowdown of periodic sounds and the temporal desynchronization observed when extending SoundReactor’s context window via Position Interpolation (PI) on Rotary Positional Embeddings (RoPE) are caused by PI’s rescaling of positions, which lowers the effective RoPE angular frequency. Characterize the mechanism by which RoPE frequency scaling affects timing in interleaved, frame-aligned audio–visual token sequences, and establish the conditions under which NTK-aware interpolation or sliding-window attention (SWA) preserves synchronization.
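To make the hypothesized mechanism concrete, the following minimal sketch compares per-dimension RoPE angular frequencies under PI and under the commonly used NTK-aware base rescaling. The head dimension, base, and scale factor are illustrative assumptions, not SoundReactor's actual settings, and the function names are hypothetical. Whether SoundReactor uses this exact NTK-aware variant is also an assumption.

```python
# Sketch only: DIM, BASE, and SCALE are assumed values for illustration,
# not taken from SoundReactor.

def rope_freqs(dim, base=10000.0):
    """Standard RoPE angular frequencies theta_i = base^(-2i/dim), i = 0..dim/2-1."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def pi_freqs(dim, scale, base=10000.0):
    """Position Interpolation: positions m -> m/scale, which is equivalent
    to dividing every angular frequency by the same factor."""
    return [f / scale for f in rope_freqs(dim, base)]

def ntk_freqs(dim, scale, base=10000.0):
    """NTK-aware rescaling (common variant, assumed here): enlarge the base
    by scale^(dim/(dim-2)) instead of compressing positions."""
    return rope_freqs(dim, base * scale ** (dim / (dim - 2)))

def freq_ratios(scaled, original):
    """Ratio of scaled to original frequency for each rotary pair."""
    return [s / o for s, o in zip(scaled, original)]

DIM, SCALE = 64, 2.0
original = rope_freqs(DIM)
pi_ratio = freq_ratios(pi_freqs(DIM, SCALE), original)
ntk_ratio = freq_ratios(ntk_freqs(DIM, SCALE), original)

# PI lowers every frequency by the same factor; NTK-aware rescaling leaves
# the highest-frequency pair unchanged and lowers only the slower pairs.
print("PI  ratio (fastest, slowest):", pi_ratio[0], pi_ratio[-1])
print("NTK ratio (fastest, slowest):", ntk_ratio[0], ntk_ratio[-1])
```

Under these assumptions, PI halves all frequencies, including the fastest ones that encode fine-grained relative offsets, whereas the NTK-aware variant keeps the fastest pair intact. This is consistent with the conjecture but does not by itself establish it; confirming the cause would still require controlled experiments on the model.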
References
In addition to quantitative evaluation in Section~\ref{ssec:main_result}, the spectrogram visualization in Figure~\ref{fig:longgen_spec_main} shows that PI slows periodic sounds (e.g., footsteps) and harms temporal synchronization, while NTK and SWA preserve timing. We conjecture that this stems from how RoPE frequencies are scaled.