Conjecture: Additional thought has little impact on most web text
Establish whether, for most chunks of general online text, additional thought (implemented as Quiet-STaR internal rationale tokens generated between observed tokens to explain future text) yields little to no improvement in a well-trained language model's predictions of subsequent tokens.
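A crude way to probe the conjecture is to compare a model's loss on a continuation with and without extra conditioning text standing in for a rationale. The sketch below is a minimal proxy, not the paper's method: Quiet-STaR mixes hidden rationale states into predictions through a learned mixing head, whereas here a plain-text rationale string is simply appended to the context. The model name, the example strings, and the `next_chunk_nll` helper are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: any causal LM stands in for the "well-trained" model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_chunk_nll(prefix: str, continuation: str, rationale: str = "") -> float:
    """Mean negative log-likelihood of `continuation` given `prefix`,
    optionally with a rationale string injected between them."""
    ctx_ids = tokenizer(prefix + rationale, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Score only the continuation: logits at position t predict token t+1.
    start = ctx_ids.shape[1]
    preds = logits[0, start - 1 : -1]
    targets = input_ids[0, start:]
    return torch.nn.functional.cross_entropy(preds, targets).item()

prefix = "The capital of France is a city known for"
continuation = " its museums and cafes."
rationale = " (thinking: the next words likely describe Paris's attractions)"

baseline = next_chunk_nll(prefix, continuation)
with_thought = next_chunk_nll(prefix, continuation, rationale)
# The conjecture predicts with_thought ≈ baseline for most web-text chunks.
print(f"NLL without rationale: {baseline:.3f}, with rationale: {with_thought:.3f}")
```

Under the conjecture, the two NLL values should be close for most sampled chunks of web text; large gaps would flag the minority of tokens where additional thought helps.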
References
Indeed, we conjecture that for most chunks of most online text, additional thought has little to no impact.
— Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
(Zelikman et al., arXiv:2403.09629, 14 Mar 2024), Section 6: Experiments and Results