
Theoretical prediction of the ρ-dependent timing of truth-encoding emergence

Develop a theory that precisely predicts how the training step at which true and false contexts become linearly separable depends on the true-attribute rate ρ in the synthetic training setup, yielding a quantitative relationship between ρ and the onset of linear truth encoding.


Background

Empirically, the paper observes that as the true-attribute rate ρ increases, the onset of linear separability (truth encoding) occurs later in training, yet truth encoding still emerges even for ρ as high as 0.999. This points to a systematic, ρ-dependent effect on the learning dynamics.
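Comparing onset across values of ρ first requires making "onset" measurable. The sketch below is one illustrative operationalization, not the paper's procedure: it assumes onset is the first saved checkpoint at which a linear probe separates true from false contexts with accuracy at or above a chosen threshold. A plain perceptron stands in for whatever linear probe one prefers, and the names `probe_accuracy`, `onset_step`, the checkpoint tuple layout, and the default threshold are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical checkpoint record: (training step, hidden states, is-true labels).
Checkpoint = Tuple[int, List[Vector], List[bool]]


def probe_accuracy(features: List[Vector], labels: List[bool],
                   epochs: int = 100, lr: float = 0.1) -> float:
    """Fit a perceptron (a linear probe) and return its training accuracy.

    Accuracy 1.0 means a separating hyperplane was found within `epochs`
    passes; on non-separable data the accuracy necessarily stays below 1.0.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    ys = [1.0 if label else -1.0 for label in labels]  # true -> +1, false -> -1
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(features, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # converged: (w, b) separates the two classes
            break
    correct = sum(
        1 for x, y in zip(features, ys)
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0
    )
    return correct / len(ys)


def onset_step(checkpoints: List[Checkpoint],
               threshold: float = 1.0) -> Optional[int]:
    """Return the earliest training step whose probe accuracy reaches threshold."""
    for step, features, labels in sorted(checkpoints, key=lambda c: c[0]):
        if probe_accuracy(features, labels) >= threshold:
            return step
    return None  # no linear truth encoding among these checkpoints


# Toy usage: at step 0 the classes form an XOR-like (non-separable) pattern;
# by step 100 they lie on opposite sides of a hyperplane.
xor = (0, [(0, 0), (1, 1), (0, 1), (1, 0)], [True, True, False, False])
sep = (100, [(1, 0), (2, 0), (-1, 0), (-2, 0)], [True, True, False, False])
print(onset_step([xor, sep]))  # 100
```

In a ρ sweep, one would compute `onset_step` for each run and compare the resulting (ρ, step) pairs against candidate laws. The probe and threshold are free choices that can shift the measured onset, so a predictive theory would need to fix such a definition explicitly.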

A formal predictive theory for the timing of emergence would complement the empirical findings and provide a principled understanding of how distributional properties (captured by ρ) control when the truth subspace forms during training.

References

Developing a theory that precisely predicts this ρ-dependent timing is left to future work.

Emergence of Linear Truth Encodings in Language Models (2510.15804 - Ravfogel et al., 17 Oct 2025) in Appendix: Additional Experiments, subsection "Varying the true sentence rate, ρ"