Generalizing MCDA to broader mel extraction pipelines beyond Fm and fmax

Generalize the multiple-condition-as-data-augmentation (MCDA) training strategy for RNDVoC so that it covers mel-spectrogram extraction pipelines that differ in more than the number of mel bands Fm and the upper-bound frequency fmax, for example in mel-filter formulation and normalization scheme. Determine how a single model can maintain robust, high-quality inference across all such conditions.

Background

The proposed MCDA strategy converts multi-condition adaptation at inference into data augmentation during training by projecting mel-spectrograms from different configurations into a common linear-scale domain, enabling scalable inference under varying Fm and fmax.
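The projection into a common linear-scale domain can be illustrated with a small NumPy sketch. It is not the authors' implementation: the HTK-style mel formula, the unnormalized triangular filters, the use of a Moore-Penrose pseudo-inverse as the projection, and the helper names `mel_filterbank` and `project_to_linear` are all assumptions made for illustration. The point is only that mel-spectrograms with different Fm and fmax end up with the same shape after projection.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style formula; an assumption for this sketch, other conventions exist.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Unnormalized triangular filterbank of shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_edges = np.linspace(float(hz_to_mel(fmin)), float(hz_to_mel(fmax)), n_mels + 2)
    edges = mel_to_hz(mel_edges)
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs[None, :] - lower) / (center - lower)
    falling = (upper - freqs[None, :]) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))

def project_to_linear(mel_spec, fb):
    """Map a (n_mels, T) mel-spectrogram to (n_freq, T) via the pseudo-inverse."""
    return np.linalg.pinv(fb) @ mel_spec

# Hypothetical conditions: (Fm, fmax) pairs applied to the same linear spectrogram.
sr, n_fft, frames = 22050, 2048, 4
rng = np.random.default_rng(0)
linear = rng.random((n_fft // 2 + 1, frames))
conditions = [(64, 7600.0), (80, 8000.0), (100, None)]

projected = []
for n_mels, fmax in conditions:
    fb = mel_filterbank(n_mels, n_fft, sr, fmax=fmax)
    mel = fb @ linear                      # condition-specific input
    projected.append((fb, mel, project_to_linear(mel, fb)))

shapes = [p.shape for _, _, p in projected]  # identical for every condition
```

Because every condition is mapped to the same `(n_fft // 2 + 1, frames)` shape, a single network can consume all of them, and sampling the condition per training example plays the role of data augmentation.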

However, mel extraction also varies in the mel-filter formula and in the normalization applied to the result. The authors note that extending MCDA to these broader factors, while preserving the benefits of the RND framework, remains to be explored.
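To make these additional factors concrete, the sketch below contrasts two widely used Hz-to-mel formulas (the HTK form and the Slaney form, which is linear below 1 kHz and logarithmic above) and two common normalization styles. The function names and the specific constants for clipping and dynamic range (`eps`, `top_db`) are illustrative assumptions; the source does not say which variants an extended MCDA would need to cover.

```python
import numpy as np

def hz_to_mel_htk(f):
    # HTK convention: purely logarithmic warping.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def hz_to_mel_slaney(f):
    # Slaney convention: linear below 1 kHz, logarithmic above.
    f = np.asarray(f, dtype=float)
    f_sp = 200.0 / 3.0                      # Hz per mel in the linear region
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp         # 15 mel at the 1 kHz break point
    logstep = np.log(6.4) / 27.0
    log_part = min_log_mel + np.log(np.maximum(f, min_log_hz) / min_log_hz) / logstep
    return np.where(f >= min_log_hz, log_part, f / f_sp)

def log_compress(mel, eps=1e-5):
    # Natural-log compression with a clipping floor (eps is an assumed value).
    return np.log(np.clip(mel, eps, None))

def db_normalize(mel, top_db=80.0):
    # Decibel scaling limited to a top_db range, then rescaled to [0, 1].
    db = 10.0 * np.log10(np.maximum(mel, 1e-10))
    db = np.maximum(db, db.max() - top_db)
    return (db - db.max() + top_db) / top_db

# The same magnitudes yield different network inputs under each scheme.
mel = np.array([[1e-6, 0.01, 1.0], [0.5, 2.0, 10.0]])
a = log_compress(mel)
b = db_normalize(mel)
```

Two pipelines that agree on Fm and fmax can therefore still disagree on filter placement and on the value range of the features, which is why these factors would have to be undone or augmented over before any projection to the linear-scale domain.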

References

Besides, the proposed MCDA strategy only considers two factors, i.e., Fm and fmax, and its generalization to more general mel-spectrogram extraction pipelines remains to be explored.

Scalable Neural Vocoder from Range-Null Space Decomposition (2603.08574 - Li et al., 9 Mar 2026) in Section 6 (Concluding Remarks)