Theoretical prediction of the ρ-dependent timing of truth-encoding emergence
Develop a theory that predicts, quantitatively and as a function of the true-attribute rate ρ in the synthetic training setup, the training time at which true and false contexts become linearly separable (the onset of linear truth encoding).
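To make the target quantity concrete, here is a minimal sketch of how an empirical onset time could be extracted and compared against a candidate theory. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names `onset_step` and `fit_power_law`, the accuracy threshold of 0.9 used to define the onset, and the power-law form onset ≈ c · ρ^α, which is only one hypothetical functional form a theory might predict.

```python
import math


def onset_step(steps, probe_acc, threshold=0.9):
    """Return the first training step at which linear-probe accuracy
    reaches `threshold`, or None if it never does.

    `threshold` is an assumed operational definition of the onset.
    """
    for step, acc in zip(steps, probe_acc):
        if acc >= threshold:
            return step
    return None


def fit_power_law(rhos, onsets):
    """Least-squares fit of log(onset) = log(c) + alpha * log(rho).

    Returns (c, alpha). The power-law form is a hypothetical candidate,
    not a result from the source.
    """
    xs = [math.log(r) for r in rhos]
    ys = [math.log(t) for t in onsets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha
```

In use, one would run `onset_step` on the probe-accuracy curve of each training run (one run per value of ρ) and pass the resulting onset steps to `fit_power_law`. A theory of the kind asked for here would predict the fitted relationship in advance rather than recover it after the fact.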
References
Developing a theory that precisely predicts this ρ-dependent timing is left to future work.
— Emergence of Linear Truth Encodings in Language Models
(2510.15804 - Ravfogel et al., 17 Oct 2025) in Appendix: Additional Experiments, subsection "Varying the true sentence rate, ρ"