Transferability of noise scheduling and loss re-weighting to high-dimensional semantic latents
Determine whether noise scheduling and loss re-weighting strategies previously proposed for diffusion models trained on pixel-space inputs or variational autoencoder (VAE) latents transfer effectively to the high-dimensional semantic token latents that pretrained representation encoders produce in Representation Autoencoders (RAEs). If they do not transfer as-is, identify the conditions under which they require modification.
References
Prior noise scheduling and loss re-weighting tricks are derived for image-based or VAE-based input, and it remains unclear if they transfer well to high-dimension semantic tokens.
— Diffusion Transformers with Representation Autoencoders
(2510.11690 - Zheng et al., 13 Oct 2025) in Section 4, "Taming Diffusion Transformers for RAEs" (hypotheses list under "DiT does not work out of the box")