Conjecture: Hash-conditioning enables latent plan-conditioned generation

Establish whether hash-conditioning in Transformers—training with random hash string prefixes and using novel hash prefixes at inference—induces a latent-plan-conditioned generation process in which a single random "leap of thought" z is selected before decoding and tokens are generated from the conditional distribution p(s | z), rather than by marginalizing over multiple latent plans via output-layer temperature sampling.

Background

The paper introduces hash-conditioning as an input-layer randomization method for Transformers, where random hash prefixes are used during training and novel hashes are used during inference. This approach is contrasted with temperature sampling from the output softmax, which requires marginalizing over diverse latent plans at the token level.
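To make the contrast concrete, here is a minimal sketch of the two inference-time procedures, assuming a causal language model already trained with random hash prefixes is available. The checkpoint path, prompt, and helper-function names below are placeholders; only the standard Hugging Face transformers calls are taken as given, and this is not the authors' released code.

```python
# Sketch: input-layer randomization (hash-conditioning) vs. output-layer
# randomization (temperature sampling) at inference time.
import secrets
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/hash-conditioned-model"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)
model.eval()

def generate_hash_conditioned(prompt: str) -> str:
    """Prefix the prompt with a novel random hash string and decode greedily:
    all randomness enters at the input layer, not the output softmax."""
    hash_prefix = secrets.token_hex(8)  # novel hash, unseen during training
    inputs = tokenizer(f"{hash_prefix} {prompt}", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=False, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)

def generate_temperature_sampled(prompt: str, temperature: float = 1.0) -> str:
    """Baseline: sample each token from the output softmax, which marginalizes
    over latent plans token by token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=True,
                             temperature=temperature, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)

prompt = "Write a short riddle about time."
print(generate_hash_conditioned(prompt))
print(generate_temperature_sampled(prompt))
```

Re-running generate_hash_conditioned with fresh hash prefixes is the setting in which the conjecture predicts diversity should arise from distinct latent plans rather than from per-token noise.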

The authors argue that creative tasks benefit from sampling a single latent leap of thought and then generating tokens conditioned on that plan, and they conjecture that hash-conditioning operationalizes this mechanism.
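The distinction can be written out schematically, using notation consistent with the conjecture statement above; the prior p(z) over latent plans and the decomposition over token positions are assumptions of this sketch, not notation from the paper.

```latex
% Plan-conditioned generation (conjectured to be induced by hash-conditioning):
% a single latent plan is sampled once, then tokens are decoded conditioned on it.
\[
z \sim p(z), \qquad s \sim p(s \mid z) = \prod_{t} p(s_t \mid s_{<t}, z)
\]
% Output-layer temperature sampling: each token is drawn from the marginal,
% which mixes over latent plans at every decoding step.
\[
s_t \sim p(s_t \mid s_{<t}) = \sum_{z} p(z \mid s_{<t})\, p(s_t \mid s_{<t}, z)
\]
```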

References

We conjecture that hash-conditioning enables this conditioned token generation.

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction (2504.15266 - Nagarajan et al., 21 Apr 2025) in Section 1 (Introduction)