Model capacity explanation for DC-AE-f64 scaling benefits
Ascertain whether the observation that larger diffusion transformer models benefit more from DC-AE-f64 than smaller ones is explained by DC-AE-f64 having a larger latent channel count than SD-VAE-f8, which would demand greater model capacity to reach optimal performance.
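As a rough illustration of the channel-count argument, the sketch below compares latent shapes for the two autoencoders on a 512×512 input. It is a minimal example, assuming the commonly used configurations (SD-VAE-f8 with 4 latent channels, DC-AE-f64 with 128) and one diffusion-transformer token per latent position; actual channel counts and patch sizes depend on the specific checkpoints.

```python
# Hypothetical sketch: per-token channel load for SD-VAE-f8 vs DC-AE-f64.
# Channel counts assume the common f8c4 and f64c128 configurations.

def latent_shape(image_hw: int, downsample: int, channels: int) -> tuple:
    """Shape (C, H, W) of the latent for an image_hw x image_hw input."""
    side = image_hw // downsample
    return (channels, side, side)

for name, f, c in [("SD-VAE-f8", 8, 4), ("DC-AE-f64", 64, 128)]:
    ch, h, w = latent_shape(512, f, c)
    tokens = h * w  # assumes patch size 1, i.e. one token per latent position
    print(f"{name}: latent {ch}x{h}x{w}, {tokens} tokens, {ch} channels per token")
```

Under these assumptions, DC-AE-f64 packs 32× more channels into each token (128 vs. 4) while producing far fewer tokens, which is the per-token capacity pressure the conjecture points to.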
References
We conjecture it is because DC-AE-f64 has a larger latent channel number than SD-VAE-f8, thus needing more model capacity \citep{esser2024scaling}.
— Chen et al., "Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models," arXiv:2410.10733, 14 Oct 2024. Quoted from Section 4.2 (Latent Diffusion Models), ImageNet 512×512 paragraph.