Analysis of discriminator’s learned focus on inaudible spectral regions during training
Characterize why, late in training of the Transformer Audio AutoEncoder (TAAE), the multi-resolution STFT discriminator’s loss becomes dominated by extremely low-magnitude (inaudible) spectral regions. Then develop principled methods to counter this learned bias without degrading timbre or intelligibility.
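As a rough illustration of how such dominance could be measured, the sketch below computes the share of a log-magnitude spectral distance that comes from bins far below the reference peak. It is only a proxy under stated assumptions: the log-magnitude L1 distance stands in for whatever the learned discriminator actually responds to, and the names `stft_mag`, `low_energy_share`, and the `floor_db` threshold are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window, shaped (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))

def low_energy_share(ref, est, floor_db=-60.0, eps=1e-7, n_fft=512, hop=128):
    """Fraction of a log-magnitude L1 distance contributed by bins whose
    reference level lies below floor_db relative to the reference peak.

    Assumption: this distance is a stand-in for the discriminator's
    sensitivity, not the discriminator loss itself.
    """
    ref_mag = stft_mag(ref, n_fft, hop)
    est_mag = stft_mag(est, n_fft, hop)
    # Per-bin log-magnitude L1 distance.
    dist = np.abs(np.log(ref_mag + eps) - np.log(est_mag + eps))
    # Reference level in dB relative to its loudest bin.
    level_db = 20.0 * np.log10(ref_mag / ref_mag.max() + 1e-12)
    quiet = level_db < floor_db
    total = dist.sum()
    if total == 0.0:
        return 0.0  # identical signals: no distance to attribute
    return float(dist[quiet].sum() / total)

# Usage: a pure tone plus very faint broadband noise.
t = np.arange(16000) / 16000.0
ref = np.sin(2 * np.pi * 440.0 * t)
est = ref + 1e-4 * np.random.default_rng(0).standard_normal(ref.shape)
print(low_energy_share(ref, est))
```

In this toy case the perturbation is far too quiet to matter near the tone, yet almost all of the log-domain distance falls in the quiet bins. Tracking a statistic like this over training steps is one possible way to see when low-magnitude regions begin to dominate.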
References
A more involved analysis for addressing this issue is left to future work.
— Scaling Transformers for Low-Bitrate High-Quality Speech Coding
(2411.19842 - Parker et al., 29 Nov 2024) in Appendix: Systematic bias in loss functions, subsection Learned bias during training