Extending the RND vocoder beyond mel-spectrogram conditioning

Investigate whether the range–null space decomposition (RND) framework for vocoding still applies when the conditioning features are acoustic features other than mel-spectrograms. Where the linear mel-filter degradation assumption does not hold, determine appropriate inverse formulations or modifications so that the interpretability and orthogonality benefits of RND are retained under alternative feature choices.

Background

RNDVoC relies on the fact that mel-spectrograms are obtained by a known linear mel-filtering of magnitude spectra, enabling a principled range-space projection via the pseudo-inverse and a complementary learned null-space component.
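
The decomposition described above can be illustrated with a small NumPy sketch. It is not the RNDVoC implementation: the filterbank `toy_mel_like_matrix`, the helper `rnd_split`, and the sizes (8 bands, 33 frequency bins) are hypothetical stand-ins chosen for illustration, and a random vector plays the role of the learned null-space component.

```python
import numpy as np

def toy_mel_like_matrix(n_mels, n_freq):
    """Hypothetical stand-in for a mel filterbank: triangular filters on a linear grid."""
    centers = np.linspace(0, n_freq - 1, n_mels + 2)
    bins = np.arange(n_freq)
    A = np.zeros((n_mels, n_freq))
    for m in range(n_mels):
        left, center, right = centers[m], centers[m + 1], centers[m + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        A[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return A

def rnd_split(A, x):
    """Split x into a range-space part (A^+ A x) and a null-space part ((I - A^+ A) x)."""
    A_pinv = np.linalg.pinv(A)
    x_range = A_pinv @ (A @ x)
    x_null = x - x_range
    return x_range, x_null

rng = np.random.default_rng(0)
A = toy_mel_like_matrix(n_mels=8, n_freq=33)
x = rng.random(33)   # toy magnitude-spectrum frame
y = A @ x            # linear "mel" degradation

x_range, x_null = rnd_split(A, x)

# The range-space part is recoverable from the observation y alone.
x_range_from_y = np.linalg.pinv(A) @ y

# Any null-space content (here a random vector standing in for a learned
# component) can be added without changing the degraded observation.
z = rng.random(33)
x_hat = x_range_from_y + (z - np.linalg.pinv(A) @ (A @ z))

print(np.allclose(A @ x_null, 0.0, atol=1e-8))  # null part is invisible to A
print(np.allclose(A @ x_hat, y))                # reconstruction stays consistent with y
print(abs(x_range @ x_null) < 1e-8)             # the two parts are orthogonal
```

The consistency check `A @ x_hat == y` relies only on the identity A A⁺ A = A, which is why the argument depends on the degradation being a known linear operator.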

The authors note that for other acoustic features the same linear-degradation assumption may not apply, raising the need to reformulate or adapt the RND-based inverse mapping to maintain consistency and interpretability.
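
One way to probe whether a candidate feature satisfies the linear-degradation assumption is a numerical superposition test, sketched below. The helper `satisfies_superposition`, the random matrix `A`, and the log-compressed feature are hypothetical examples, not features examined in the paper. The last check illustrates one special case, assumed here for illustration only: when the nonlinearity is an invertible element-wise map, undoing it first restores a linear observation.

```python
import numpy as np

EPS = 1e-5  # hypothetical floor used before taking the logarithm

def satisfies_superposition(feature_fn, dim, trials=20, tol=1e-8, seed=0):
    """Return True if feature_fn(a*u + b*v) == a*feature_fn(u) + b*feature_fn(v) on random inputs."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u, v = rng.random(dim), rng.random(dim)
        a, b = rng.random(2)
        lhs = feature_fn(a * u + b * v)
        rhs = a * feature_fn(u) + b * feature_fn(v)
        if not np.allclose(lhs, rhs, atol=tol):
            return False
    return True

rng = np.random.default_rng(1)
A = rng.random((8, 33))  # stand-in for a known linear analysis operator

def linear_feature(x):
    # Linear degradation: the setting in which the pseudo-inverse projection applies.
    return A @ x

def log_feature(x):
    # Log-compressed variant: no longer a linear function of x.
    return np.log(A @ x + EPS)

print(satisfies_superposition(linear_feature, 33))
print(satisfies_superposition(log_feature, 33))

# Inverting the element-wise log recovers the linear observation A @ x.
x = rng.random(33)
print(np.allclose(np.exp(log_feature(x)) - EPS, A @ x))
```

Features whose nonlinearity cannot be undone element-wise fail the first check and have no such shortcut, which is the case the open problem targets.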

References

This requires further investigation for more general acoustic features, where the linear degradation formulation may no longer hold.

— Scalable Neural Vocoder from Range-Null Space Decomposition (2603.08574 - Li et al., 9 Mar 2026) in Section 6 (Concluding Remarks)
