Unknown optimization dynamics of hash-conditioning
Characterize how hash-conditioning affects the optimization dynamics of Transformer training. In particular, identify why, and under what conditions, varying the random hash prefixes does not disrupt optimization or generalization, and formalize the convergence behavior of hash-conditioned training.
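For concreteness, the data-side mechanics of the setup in question can be sketched as follows. This is a minimal illustration and not the authors' implementation: the helper names `random_hash_prefix` and `hash_condition`, the prefix length, the alphabet, and the separator are all hypothetical, and whether a prefix is fixed per example or redrawn on every pass is an assumption made here only to show what "varying" prefixes could mean.

```python
import random
import string


def random_hash_prefix(rng, length=10):
    """Return a random alphanumeric string standing in for a hash prefix.

    The length and alphabet are illustrative assumptions.
    """
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def hash_condition(example, rng, length=10, sep=" "):
    """Prepend a freshly drawn random prefix to one training sequence."""
    return random_hash_prefix(rng, length) + sep + example


rng = random.Random(0)  # seeded so the sketch is reproducible
corpus = ["a b c", "d e f"]  # toy stand-in for training sequences

# Assumption: prefixes are redrawn on each pass, so the same target
# sequence is paired with a different prefix every time it is seen.
pass_1 = [hash_condition(x, rng) for x in corpus]
pass_2 = [hash_condition(x, rng) for x in corpus]
```

Under this sketch the prefix carries no information about the sequence that follows it, which is what makes the open question non-trivial: the model is asked to fit targets conditioned on inputs that look like pure noise, yet training is reported not to be disrupted.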
References
Finally, there are also optimization aspects of how hash-conditioning works that we do not understand (see \S\ref{sec:noise}).
— Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
(2504.15266 - Nagarajan et al., 21 Apr 2025) in Section 6 (Discussion), Subsection: Intuition about hash-conditioning