Reconciling KL regularization with reconstruction fidelity in RecTok

Identify training or architectural strategies for RecTok’s variational autoencoder that mitigate the reconstruction degradation induced by KL regularization while preserving the latent-space smoothing and generation-quality gains that the regularization provides.

Background

RecTok employs a variational autoencoder with a KL loss to smooth the latent space, which the authors observe improves generation quality. However, this design choice introduces a trade-off: the KL term weakens reconstruction, so RecTok reconstructs worse than a deterministic autoencoder (AE) with the same architecture trained without the KL loss.
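
To make the trade-off concrete, the following is a minimal sketch of a generic KL-regularized autoencoder objective. It is not RecTok's actual loss: the mean-squared-error reconstruction term, the standard-normal prior, the diagonal-Gaussian posterior, and the names `gaussian_kl`, `vae_loss`, and `kl_weight` are all assumptions chosen for illustration.

```python
import math


def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.

    Assumes a diagonal-Gaussian posterior and a standard-normal prior.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_loss(x, x_hat, mu, logvar, kl_weight):
    """Return (total, recon, kl) for one sample.

    recon is the mean squared error between input x and reconstruction x_hat
    (an assumed choice of reconstruction term); kl_weight is a hypothetical
    scalar that scales the KL penalty.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    kl = gaussian_kl(mu, logvar)
    return recon + kl_weight * kl, recon, kl


if __name__ == "__main__":
    x, x_hat = [1.0, 2.0], [1.0, 0.0]
    mu, logvar = [1.0], [0.0]
    for w in (0.0, 1.0, 2.0):
        total, recon, kl = vae_loss(x, x_hat, mu, logvar, w)
        print(f"kl_weight={w}: total={total}, recon={recon}, kl={kl}")
```

With `kl_weight` set to zero the KL term vanishes and only the reconstruction term remains, which is loosely analogous to the AE baseline in the comparison above. Any positive weight makes the optimizer trade some reconstruction error for a posterior closer to the prior, which is the tension the open question asks how to resolve.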

The authors explicitly leave this KL–reconstruction conflict as an open question for future work.

References

Regarding reconstruction, while the KL loss smooths the latent space and improves generation quality, it inevitably weakens reconstruction ability, resulting in RecTok performing worse than an AE model with the same architecture. We leave these challenges as open questions for future work.

RecTok: Reconstruction Distillation along Rectified Flow (2512.13421 - Shi et al., 15 Dec 2025) in Supplementary, Section: Limitations