Open questions on mechanisms of hash-conditioning and distribution-level enforcement
Determine the mechanisms by which hash-conditioning improves diversity and originality in Transformers; develop a distribution-level formulation that enforces a noise-to-data mapping for hash-conditioning (analogous to the latent-to-data distributions of VAEs and GANs) rather than pointwise noise-to-output assignments; and ascertain whether such a formulation yields further gains without harming optimization or generalization.
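To make the object of study concrete, here is a minimal sketch of the input-side noise injection that hash-conditioning refers to. This is an illustrative assumption about the setup, not the paper's exact implementation: each training sequence is prefixed with a random hash string that carries no information about the target, so the model learns a deterministic mapping from (noise, prompt) to one of the many valid outputs, playing a role analogous to a GAN/VAE latent code. The `<hash>` delimiter and helper names below are hypothetical.

```python
import hashlib
import random

def random_hash(rng: random.Random, nbytes: int = 8) -> str:
    """Draw fresh random bits and render them as a short hex hash string."""
    seed = rng.getrandbits(64).to_bytes(8, "big")
    return hashlib.sha256(seed).hexdigest()[: 2 * nbytes]

def hash_conditioned_example(prompt: str, target: str, rng: random.Random) -> str:
    """Format one training sequence with a random noise prefix.

    The hash is independent of the target; it only supplies a latent to
    which the model can attach one of the many valid continuations,
    moving diversity to the input side instead of output-side sampling.
    """
    return f"<hash>{random_hash(rng)}</hash> {prompt} {target}"

rng = random.Random(0)
print(hash_conditioned_example("Name a novel word:", "flumph", rng))
```

At inference time, sampling a fresh hash per query would then yield distinct completions even under greedy decoding. The open question above is whether this pointwise scheme can be replaced by an objective that matches the induced noise-to-data *distribution* as a whole.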
References
This raises the open questions of why hash-conditioning works in the first place — surprisingly, without breaking optimization or generalization — whether there is a way to enforce it at the distribution level, and whether that can provide even greater improvements.
— Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
(2504.15266 - Nagarajan et al., 21 Apr 2025) in Appendix: Further discussion, Subsection: Style of noise-injection