Systematicity of out-of-context learning in language models

Characterize the systematicity of out-of-context learning in large language models. Out-of-context learning is the phenomenon whereby gradient-descent training on distributional datasets that exhibit a behavior (e.g., positive-sentiment responses) changes how the model describes that behavior in natural language, so that its self-descriptions come to reflect the trained behavior. Establish when and how this generalization occurs and what regularities govern it.
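One possible way to make the definition precise is sketched below. This is an assumed formalization, not one taken from the paper. Let θ be the model's parameters before training, θ′ the parameters after gradient descent on a dataset 𝒟_b that exhibits behavior b, q a natural-language query about the model's behavior, d_b a description of b, and C a set of hypothetical control descriptions of unrelated behaviors. The shift Δ measures how much training changed the probability of a description, and S_b measures whether that change is specific to the trained behavior:

```latex
\Delta(d \mid q) \;=\; \log p_{\theta'}(d \mid q) \;-\; \log p_{\theta}(d \mid q),
\qquad
S_b(q) \;=\; \Delta(d_b \mid q) \;-\; \frac{1}{|C|} \sum_{c \in C} \Delta(c \mid q)
```

Under this framing, questions about systematicity become questions about which datasets, behaviors, queries, and description phrasings reliably yield Δ(d_b | q) > 0 together with a large S_b(q).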

Background

The paper discusses a form of generalization known as out-of-context learning, in which LLMs trained on datasets exhibiting a behavior (such as positive sentiment) subsequently assign different probabilities to natural-language descriptions of that behavior. This connects to the paper's central theme of neologism learning and self-verbalization, in which models can describe the meanings they associate with newly trained tokens.
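The change in description probabilities can be measured by scoring the same description under the model before and after training. The sketch below assumes a hypothetical interface in which a "model" is any function mapping a token context to a next-token distribution; in practice one would instead sum the LLM's per-token log-probabilities. The names `sequence_logprob`, `description_shift`, and `selectivity`, and the toy distributions, are all illustrative. Contrasting against control descriptions is one assumed way to probe whether the shift is specific to the trained behavior; it is not the paper's method.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a "model" maps a token context to a next-token distribution.
NextTokenDist = Callable[[Sequence[str]], Dict[str, float]]


def sequence_logprob(model: NextTokenDist, prompt: Sequence[str],
                     continuation: Sequence[str]) -> float:
    """Sum of log p(token | prompt + earlier continuation tokens) over the continuation."""
    context: List[str] = list(prompt)
    total = 0.0
    for token in continuation:
        total += math.log(model(context)[token])
        context.append(token)
    return total


def description_shift(base: NextTokenDist, tuned: NextTokenDist,
                      prompt: Sequence[str], description: Sequence[str]) -> float:
    """Change in log-probability of a description after training (tuned minus base)."""
    return (sequence_logprob(tuned, prompt, description)
            - sequence_logprob(base, prompt, description))


def selectivity(base: NextTokenDist, tuned: NextTokenDist, prompt: Sequence[str],
                target: Sequence[str], controls: List[Sequence[str]]) -> float:
    """Shift on the target description minus the mean shift on control descriptions."""
    target_shift = description_shift(base, tuned, prompt, target)
    control_shifts = [description_shift(base, tuned, prompt, c) for c in controls]
    return target_shift - sum(control_shifts) / len(control_shifts)


def make_unigram(dist: Dict[str, float]) -> NextTokenDist:
    """Toy stand-in model that ignores its context (illustration only)."""
    return lambda context: dist


# Made-up toy distributions standing in for a base model and a fine-tuned model.
base = make_unigram({"positive": 0.25, "negative": 0.25, "formal": 0.25, "casual": 0.25})
tuned = make_unigram({"positive": 0.5, "negative": 0.1, "formal": 0.2, "casual": 0.2})
PROMPT = ["my", "responses", "are", "usually"]

if __name__ == "__main__":
    print(round(description_shift(base, tuned, PROMPT, ["positive"]), 3))  # prints 0.693
    print(round(selectivity(base, tuned, PROMPT, ["positive"],
                            [["formal"], ["casual"]]), 3))  # prints 0.916
```

In this toy setup, the description of the trained behavior gains log 2 nats, while the control descriptions each lose log 1.25 nats. The resulting selectivity of log 2.5 indicates a shift that is specific to the trained behavior rather than a uniform change.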

In the self-verbalization section, the authors explicitly note that the systematicity of this behavior is not yet understood: the conditions under which out-of-context learning occurs, and the mechanisms that produce it, remain open. This gap motivates further work to formalize the phenomenon and to delineate its regularities empirically.

References

Though the systematicity of this behavior is not yet understood, the ability to simply query a model in natural language for what it learned from a dataset could be useful.

— Neologism Learning for Controllability and Self-Verbalization (2510.08506 - Hewitt et al., 9 Oct 2025) in Section 5 (Self-verbalization and machine-only synonyms), first paragraph