Systematicity of out-of-context learning in language models
Characterize the systematicity of out-of-context learning in large language models. Out-of-context learning is defined here as the phenomenon whereby gradient-descent training on distributional datasets that exhibit a behavior (e.g., positive-sentiment responses) leads the model to alter its natural-language descriptions to reflect that behavior. Establish when and how this generalization occurs and what regularities govern it.
References
Though the systematicity of this behavior is not yet understood, the ability to simply query a model in natural language for what it learned from a dataset could be useful.
— Neologism Learning for Controllability and Self-Verbalization
(2510.08506 - Hewitt et al., 9 Oct 2025) in Section 5 (Self-verbalization and machine-only synonyms), first paragraph