Intention behind Tulu-3 learning the phrase “I hope it is correct”
Determine whether the phrase “I hope it is correct”, which appears in assistant responses in the tulu-3-sft-mixture supervised fine-tuning dataset, was meant to be learned by Tulu-3 (fine-tuned from Llama-3.1-8B). The question concerns contexts where prompts contain mathematics, lists, or LaTeX formatting: does the observed prompt–response correlation reflect an intended training objective, or is it an unintended artifact of dataset construction?
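As a rough illustration of the prompt–response correlation in question, the sketch below compares how often the phrase appears in responses to math-like prompts versus other prompts. It assumes each example is a list of chat messages with "role" and "content" keys; the function name phrase_rates and the regular expression used to flag math-like prompts are hypothetical choices for illustration, not taken from the paper. A measurement like this can only show that the correlation exists in the data; it cannot settle whether the behavior was intended.

```python
import re

PHRASE = "I hope it is correct"

# Hypothetical, crude heuristic for "math-like" prompts:
# a dollar sign, \( or \[ delimiters, or a couple of common LaTeX commands.
MATH_PATTERN = re.compile(r"\$|\\\(|\\\[|\\frac|\\begin\{")


def phrase_rates(conversations):
    """Return the fraction of conversations whose assistant text contains
    PHRASE, split by whether the user prompt looks math-like.

    `conversations` is assumed to be an iterable of message lists, where
    each message is a dict with "role" and "content" keys.
    """
    # math_like flag -> [conversations with phrase, total conversations]
    counts = {True: [0, 0], False: [0, 0]}
    for messages in conversations:
        prompt = " ".join(m["content"] for m in messages if m["role"] == "user")
        response = " ".join(
            m["content"] for m in messages if m["role"] == "assistant"
        )
        math_like = bool(MATH_PATTERN.search(prompt))
        counts[math_like][1] += 1
        if PHRASE.lower() in response.lower():
            counts[math_like][0] += 1
    return {
        flag: (hits / total if total else 0.0)
        for flag, (hits, total) in counts.items()
    }
```

A large gap between the two rates on the real dataset would be consistent with the correlation described above, though the heuristic for detecting math-like prompts would need to be refined before drawing conclusions.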
References
Examining the original dataset construction paper, this was indeed a formatting instruction given to the dataset-generating model, although whether it was intended that Tulu learn this behavior is unclear.
— Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
(2512.10092 - Jiang et al., 10 Dec 2025) in Section 6.2, Case Studies: Debugging Tulu-3’s post-training dataset