More effective STFT phase estimation for T‑F–domain neural vocoders

Develop more effective techniques for estimating the short-time Fourier transform (STFT) phase in time–frequency–domain neural vocoders such as RNDVoC. The techniques should explicitly account for the periodicity of phase and its wrapping into the principal-value range (−π, +π], so as to improve phase prediction when the magnitude information is derived from mel-spectrogram inputs.

Background

The paper proposes RNDVoC, a time–frequency–domain neural vocoder based on range–null space decomposition. It reconstructs the target spectrum as the sum of a range-space component, obtained by projecting the mel spectrogram through the pseudo-inverse of the mel filterbank, and a learned null-space component that refines spectral details and estimates the phase.
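To make the decomposition concrete, here is a minimal NumPy sketch of range–null space reconstruction on a single linear-frequency magnitude frame. It is an illustration under stated assumptions, not the paper's implementation: `mel_filterbank` is a hypothetical HTK-style triangular filterbank written for this example (not librosa's or the paper's), `range_null_reconstruct` is a hypothetical helper name, and the vector `z` is a random stand-in for the null-space output that RNDVoC learns. The property it checks is the defining one: for any `z`, the estimate `x_hat = A^+ y + (I - A^+ A) z` reproduces the observed mel frame exactly, because `A A^+ A = A`.

```python
import numpy as np


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Hypothetical HTK-style triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    n_freqs = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2, n_freqs)  # STFT bin centre frequencies (Hz)

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_mels + 2 band edges, equally spaced on the mel scale
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_freqs))
    for m in range(n_mels):
        lo, ctr, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        # Triangle peaking at ctr, clipped to zero outside [lo, hi]
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def range_null_reconstruct(A: np.ndarray, y: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Hypothetical helper: x_hat = A^+ y + (I - A^+ A) z for one frame."""
    A_pinv = np.linalg.pinv(A)          # (F, M) Moore-Penrose pseudo-inverse
    range_part = A_pinv @ y             # minimum-norm spectrum consistent with y
    null_part = z - A_pinv @ (A @ z)    # projection of z onto the null space of A
    return range_part + null_part


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = mel_filterbank(sr=16000, n_fft=512, n_mels=40)  # (40, 257)
    x = rng.random(A.shape[1])   # stand-in for a true magnitude frame
    y = A @ x                    # observed mel frame
    z = rng.random(A.shape[1])   # stand-in for the learned null-space output
    x_hat = range_null_reconstruct(A, y, z)
    print(np.allclose(A @ x_hat, y))  # mel consistency holds for any z
```

Because the null-space term maps to zero under `A`, whatever the learned branch contributes is invisible to the mel observation: it can add detail the mel spectrogram discards but cannot contradict the input. The sketch covers magnitudes only and does not model the phase branch.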

Although the method introduces an omnidirectional phase loss and demonstrates strong empirical performance, the authors note that phase estimation remains challenging: the phase spectrum is periodic and wraps into the principal-value range (−π, +π], so how to estimate phase more effectively is still an open question in vocoder design.
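The effect of wrapping on a phase loss can be seen with a two-number example. The sketch below, assuming NumPy and using hypothetical helper names, maps angles into (−π, +π] and compares a naive L1 phase error with a wrap-aware ("anti-wrapping") L1 error similar in spirit to the anti-wrapping function used in earlier phase-prediction vocoders such as APNet. It is not RNDVoC's omnidirectional phase loss, whose exact form is not reproduced here.

```python
import numpy as np


def wrap_to_principal(phi: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map angles (radians) into the principal range (-pi, +pi]."""
    # np.mod with a positive divisor returns values in [0, 2*pi),
    # so pi - mod(...) lies in (-pi, pi]; -pi itself maps to +pi.
    return np.pi - np.mod(np.pi - phi, 2.0 * np.pi)


def naive_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Plain L1 on raw phase values; ignores the 2*pi periodicity."""
    return float(np.mean(np.abs(pred - target)))


def anti_wrapping_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """L1 on the wrapped phase difference: the shortest angular distance."""
    return float(np.mean(np.abs(wrap_to_principal(pred - target))))


if __name__ == "__main__":
    target = np.array([3.1])   # just below +pi
    pred = np.array([-3.1])    # just above -pi: nearly the same angle
    print(naive_l1(pred, target))
    print(anti_wrapping_l1(pred, target))
```

For a target of 3.1 rad and a prediction of −3.1 rad, the naive error is 6.2 rad even though the two angles are only 2π − 6.2 ≈ 0.083 rad apart on the unit circle; the wrapped error reports the latter. A prediction offset by exactly 2π likewise gets (numerically) zero wrapped error, which is the invariance a phase loss needs.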

References

Besides, as phase spectrum exhibits a periodic structure and causes the wrapping around the principal value range (-π, +π], it remains a challenging issue on how to estimate the phase more effectively.

Scalable Neural Vocoder from Range-Null Space Decomposition (2603.08574 - Li et al., 9 Mar 2026) in Section 1 (Introduction)