Extending the RND vocoder beyond mel-spectrogram conditioning

Investigate whether the range–null space decomposition (RND) framework for vocoding applies when the conditioning acoustic features are not mel-spectrograms. Where the linear mel-filter degradation assumption does not hold, determine appropriate inverse formulations or modifications so that the interpretability and orthogonality benefits of RND are retained under alternative feature choices.

Background

RNDVoC relies on the fact that mel-spectrograms are obtained by a known linear mel-filtering of magnitude spectra, enabling a principled range-space projection via the pseudo-inverse and a complementary learned null-space component.
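The decomposition described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the matrix `A` is a random stand-in for a mel filterbank, the vector `z` is a random stand-in for the output of the learned network, and the helper name `rnd_reconstruct` and all sizes are hypothetical. The sketch only shows the linear-algebra property that the framework builds on: the range-space part `A_pinv @ y` reproduces the conditioning feature exactly, and anything projected into the null space of `A` leaves that feature unchanged.

```python
import numpy as np


def rnd_reconstruct(A, y, z):
    """Split a reconstruction into range-space and null-space parts.

    A : (n_mels, n_freq) linear degradation (hypothetical mel-like filterbank)
    y : (n_mels,) observed feature, assumed to satisfy y = A @ x
    z : (n_freq,) arbitrary candidate (stand-in for a learned network output)
    """
    A_pinv = np.linalg.pinv(A)       # Moore-Penrose pseudo-inverse
    x_range = A_pinv @ y             # range-space part, fixed by y alone
    x_null = z - A_pinv @ (A @ z)    # (I - A^+ A) z, projection onto null(A)
    return x_range, x_null


# Toy setup: all sizes and values below are assumptions for illustration.
rng = np.random.default_rng(0)
n_mels, n_freq = 8, 32
A = rng.random((n_mels, n_freq))     # non-negative, full row rank almost surely
x_true = rng.random(n_freq)          # "magnitude spectrum" frame
y = A @ x_true                       # linear degradation to the feature domain

z = rng.standard_normal(n_freq)      # stand-in for the learned component
x_range, x_null = rnd_reconstruct(A, y, z)
x_hat = x_range + x_null

# Data consistency: the null-space part does not change the feature.
consistency_error = np.max(np.abs(A @ x_hat - y))
# Orthogonality: the range-space and null-space parts are orthogonal.
orthogonality = abs(x_range @ x_null)

print(f"max |A x_hat - y| = {consistency_error:.2e}")
print(f"|<x_range, x_null>| = {orthogonality:.2e}")
```

Both printed quantities should be at the level of floating-point round-off. Note that `x_hat` in this toy example is not guaranteed to be non-negative, so it is not necessarily a valid magnitude spectrum; the sketch demonstrates consistency and orthogonality only.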

The authors note that for other acoustic features the same linear-degradation assumption may not apply, raising the need to reformulate or adapt the RND-based inverse mapping to maintain consistency and interpretability.

References

This requires further investigation for more general acoustic features, where the linear degradation formulation may no longer hold.

Scalable Neural Vocoder from Range-Null Space Decomposition (2603.08574 - Li et al., 9 Mar 2026) in Section 6 (Concluding Remarks)