Cross-relation generalization in TCH-instantiated natural language training

Determine whether the linear truth encoding and the associated behavioral effects, learned when a transformer language model is trained on paired CounterFact examples from a single relation (e.g., WorksIn, SpeaksLanguage, BornIn), generalize to relations not seen during training. Characterize the extent and robustness of any such cross-relation transfer.
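
One way to make the representational half of this question measurable is a cross-relation probe-transfer matrix. The sketch below assumes that hidden states `h` (shape `[n, d]`) and binary truth labels `y` have already been extracted from one model for the paired examples of each relation, split into train and test sets. For simplicity it uses a difference-of-means linear probe; a logistic-regression probe would be a drop-in alternative. The function names and the dictionary layout are hypothetical, not taken from the paper.

```python
import numpy as np


def fit_mean_diff_probe(h: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """Difference-of-means probe: direction w and a threshold b at the midpoint of the class means."""
    mu_true = h[y == 1].mean(axis=0)
    mu_false = h[y == 0].mean(axis=0)
    w = mu_true - mu_false
    b = float(w @ (mu_true + mu_false) / 2)
    return w, b


def probe_accuracy(w: np.ndarray, b: float, h: np.ndarray, y: np.ndarray) -> float:
    """Fraction of examples whose projection onto w falls on the correct side of b."""
    preds = (h @ w > b).astype(int)
    return float((preds == y).mean())


def transfer_matrix(train: dict, test: dict) -> tuple[list[str], np.ndarray]:
    """acc[i, j]: probe fit on relation i's train split, scored on relation j's test split.

    `train` and `test` map a relation name to a tuple (hidden_states, labels).
    """
    names = sorted(train)
    acc = np.zeros((len(names), len(names)))
    for i, src in enumerate(names):
        w, b = fit_mean_diff_probe(*train[src])
        for j, tgt in enumerate(names):
            acc[i, j] = probe_accuracy(w, b, *test[tgt])
    return names, acc
```

The diagonal gives within-relation probe accuracy; high off-diagonal entries would be consistent with a truth encoding that transfers across relations, while near-chance entries would suggest relation-specific encodings. Entries well below chance would indicate a shared axis with an inconsistent sign.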

Background

To test the Truth Co-occurrence Hypothesis (TCH) in natural language, the paper constructs paired examples from the CounterFact dataset by concatenating two instances that share the same truth label, and trains a separate small transformer for each relation. The experiments report rapid memorization, followed by the emergence of a linear truth encoding and increased entropy on false sequences.
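
The pairing step can be sketched as follows. This is a minimal illustration, assuming each CounterFact fact has been reduced to a hypothetical `Fact` record holding a prompt, the true object, and one counterfactual object; the record layout, the period separator, and the 50/50 sampling of the shared label are assumptions, not the paper's exact preprocessing. The `predictive_entropy` helper shows the kind of quantity behind the "increased entropy on false sequences" observation: the Shannon entropy of a next-token distribution, computed here from a raw logit vector. Which token positions the paper measures it at is not reproduced here.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical reduced CounterFact record for a single relation."""
    prompt: str        # illustrative: "The mother tongue of Danielle Darrieux is"
    true_object: str   # illustrative: "French"
    false_object: str  # a counterfactual object, illustrative: "English"


def render(fact: Fact, truthful: bool) -> str:
    """Complete the prompt with either the true or the counterfactual object."""
    obj = fact.true_object if truthful else fact.false_object
    return f"{fact.prompt} {obj}."


def make_pairs(facts: list[Fact], n_pairs: int, seed: int = 0) -> list[tuple[str, bool]]:
    """Concatenate two distinct facts that share the same truth label."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        first, second = rng.sample(facts, 2)
        truthful = rng.random() < 0.5  # both halves true, or both halves false
        text = f"{render(first, truthful)} {render(second, truthful)}"
        pairs.append((text, truthful))
    return pairs


def predictive_entropy(logits: list[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits), computed with max-subtraction."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    # Skip probabilities that underflow to exactly zero (they contribute 0 to the sum).
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0)
```

Because both halves of every pair carry the same label, truth of the first statement is predictive of truth of the second, which is the co-occurrence signal the TCH relies on.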

However, the training is restricted to one relation at a time, leaving open whether the learned truth representation or behavior transfers across relations. Assessing cross-relation generalization would clarify the scope of the mechanism beyond single-relation settings and determine whether a unified truth subspace spans multiple factual relations.
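
A simple first check for a unified truth subspace is to compare the truth directions found separately for each relation. The sketch below assumes per-relation hidden states and binary truth labels are available (the dictionary layout and function names are hypothetical) and uses the normalized difference of class means as each relation's direction; it is one possible diagnostic, not the paper's method.

```python
import numpy as np


def truth_direction(h: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Unit-norm difference between mean hidden states of true (y=1) and false (y=0) examples."""
    w = h[y == 1].mean(axis=0) - h[y == 0].mean(axis=0)
    return w / np.linalg.norm(w)


def direction_similarity(data: dict) -> tuple[list[str], np.ndarray]:
    """Pairwise cosine similarity between per-relation truth directions.

    `data` maps a relation name to a tuple (hidden_states, labels).
    """
    names = sorted(data)
    dirs = np.stack([truth_direction(*data[r]) for r in names])
    return names, dirs @ dirs.T
```

Cosines near 1 across relations would be consistent with a shared direction, whereas near-zero cosines would point to relation-specific encodings. High similarity alone does not show that a probe transfers, since relation-specific offsets can still move the decision threshold, so this check complements, rather than replaces, a direct transfer evaluation.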

References

We leave the question of generalization between relations to a future work.

— Emergence of Linear Truth Encodings in Language Models (2510.15804 - Ravfogel et al., 17 Oct 2025) in Section 5.2, Subsubsection "Instantiating the TCH in Natural Language" (Setup)
