Transferability of noise scheduling and loss re-weighting to high-dimensional semantic latents

Determine whether previously proposed noise scheduling and loss re-weighting strategies for diffusion models trained on pixel-space or variational autoencoder latents transfer effectively to high-dimensional semantic token latents produced by pretrained representation encoders in Representation Autoencoders, and ascertain any conditions under which such strategies require modification.

Background

In adapting diffusion transformers to operate in the high-dimensional latent spaces produced by frozen representation encoders, the paper identifies several challenges responsible for initial training failures. One of them is that prior noise scheduling and loss re-weighting methods were designed for pixel-space or low-dimensional VAE latents.

Because RAEs use high-dimensional semantic tokens, the applicability of these established scheduling and re-weighting techniques is uncertain. The authors highlight this uncertainty explicitly before introducing their dimension-dependent schedule shift as a solution, noting that the general transferability of earlier approaches to semantic latents has not been established.
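
To make the idea of a dimension-dependent schedule shift concrete, here is a minimal sketch. It assumes the shift takes the same form as the timestep shift used for resolution scaling in rectified-flow models, t' = αt / (1 + (α − 1)t) with α = sqrt(m / n), where m is the effective data dimension and n is a reference dimension. The function name, the default reference dimension of 4096, and the convention that t = 1 is pure noise are assumptions made for illustration; the text above does not specify them.

```python
import math


def shift_timestep(t: float, dim: int, base_dim: int = 4096) -> float:
    """Shift a timestep t in [0, 1] toward the noisy end as dim grows.

    Assumed form (not taken from the text above):
        t' = a * t / (1 + (a - 1) * t),  a = sqrt(dim / base_dim)
    Assumed convention: t = 0 is clean data, t = 1 is pure noise.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if dim <= 0 or base_dim <= 0:
        raise ValueError("dim and base_dim must be positive")
    alpha = math.sqrt(dim / base_dim)
    return alpha * t / (1.0 + (alpha - 1.0) * t)


# Hypothetical latent shape: 256 tokens, each of width 768.
effective_dim = 256 * 768
shifted = [shift_timestep(i / 10, effective_dim) for i in range(11)]
```

Under these assumptions the endpoints stay fixed at 0 and 1, a latent whose dimension equals the reference is left unchanged, and higher-dimensional latents have every interior timestep moved toward higher noise levels. Whether this particular form is the right one for semantic latents is exactly what the question above leaves open.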

References

Prior noise scheduling and loss re-weighting tricks are derived for image-based or VAE-based input, and it remains unclear if they transfer well to high-dimension semantic tokens.

Diffusion Transformers with Representation Autoencoders (2510.11690 - Zheng et al., 13 Oct 2025) in Section 4, "Taming Diffusion Transformers for RAEs" (hypotheses list under "DiT does not work out of the box")