Cause of Llama 3 70B’s superior SDF alignment

Determine whether the highest implanted-fact alignment observed from Synthetic Document Finetuning (SDF) on Llama 3 70B reflects an intrinsic ease of belief implantation in that model, or method-specific overfitting arising from iterating the SDF pipeline against that particular model.

Background

Across multiple model families (Llama, Gemma, Qwen), the authors observe that larger models generally show equal or stronger false-fact alignment under SDF. Notably, Llama 3 70B exhibits the strongest implanted-fact alignment in their evaluations.

The authors explicitly note uncertainty about whether this superiority reflects intrinsic properties of Llama 3 70B or results from pipeline iteration and optimization on that particular model, highlighting the need to disentangle model-specific ease of implantation from methodological bias.

References

Notably, we developed our SDF pipeline by iterating against Llama 3 70B, which exhibits the highest implanted fact alignment according to our metrics. It is unclear whether this is due to it being easier to implant facts in this model generally or because we iterated our method against this particular model.

Believe It or Not: How Deeply do LLMs Believe Implanted Facts? (2510.17941 - Slocum et al., 20 Oct 2025) in Appendix, Section “Will SDF continue to work well on future models?”, Subsection “SDF is robust to increased model size”