Why discriminator sensitivity bias affects transformer-based codecs more than convolutional codecs
Investigate the cause of the systematic sensitivity bias observed in multi-resolution STFT discriminators, which induces periodic artifacts, and determine why this bias affects the predominantly transformer-based Transformer Audio AutoEncoder (TAAE) architecture more strongly than prior convolutional codec architectures (e.g., SoundStream/EnCodec-style CNNs).
References
A deeper examination of the reason why this bias effects a transformer-based architecture more than previous convolutional architectures is left to future work.
— Scaling Transformers for Low-Bitrate High-Quality Speech Coding
(2411.19842 - Parker et al., 29 Nov 2024) in Appendix: Systematic bias in loss functions