SDF performance on highly capable future models

Determine whether Synthetic Document Finetuning (SDF) remains effective at implanting deep, robust beliefs in future, highly capable language models, and characterize how its performance scales with model capability and increased inference-time compute.

Background

Slocum et al. test SDF’s robustness to increased inference-time reasoning on a mid-sized reasoning model and find minimal impact on implanted beliefs. However, they acknowledge uncertainty about how the method will fare as models become substantially more capable.

They provide preliminary evidence in the appendix that SDF’s effectiveness increases with model size and persists even when models know about the technique, but emphasize that continued evaluation is needed for future, more capable systems.

References

While these scaling results are encouraging, it is unclear how SDF will perform on highly capable future systems. Appendix \ref{appendix:future_models} provides some evidence that SDF may scale favorably---effectiveness increases with model size and persists even when models know about the technique---though continued evaluation will be necessary.

— Believe It or Not: How Deeply do LLMs Believe Implanted Facts? (2510.17941 - Slocum et al., 20 Oct 2025) in Section 4.2, Subsubsection “SDF is robust to increased inference-time compute”
