Mechanism of in‑context learning in Transformers

Characterize the mechanism by which Transformer-based Large Language Models perform in-context learning without parameter updates.

Background

The survey presents three complementary lines of inquiry into in-context learning (ICL): algorithmic constructions showing that Transformers can implement optimization steps in their forward pass, representation-based views that treat ICL as retrieval of latent task structure, and empirical studies that probe ICL behavior across tasks and model scales.
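To make the first line of inquiry concrete, the following sketch illustrates the widely cited construction in which a single linear (un-normalized) self-attention layer reproduces one gradient-descent step on an in-context linear-regression task. It is an illustrative example, not code from the surveyed paper; the dimensions, the step size eta, and the token layout (input x concatenated with label y) are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 32, 0.05           # input dim, number of demonstrations, GD step size

# In-context linear-regression task: y_i = w_true . x_i
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true
x_q = rng.normal(size=d)          # query input whose label is to be predicted

# Reference: one explicit gradient-descent step from w_0 = 0 on
# L(w) = 0.5 * sum_i (w . x_i - y_i)^2, then predict the query label.
w_1 = eta * (X.T @ y)             # w_0 - eta * grad L(w_0), with w_0 = 0
pred_gd = w_1 @ x_q

# Same computation expressed as linear self-attention.
# Tokens are e_i = [x_i ; y_i]; the query token carries a zero label slot.
E = np.hstack([X, y[:, None]])            # (n, d+1) demonstration tokens
e_q = np.concatenate([x_q, [0.0]])        # query token

# Key/query projections read out the x-part; the value projection writes
# eta * y_i into the label slot (a minimal instance of the construction).
P_x = np.zeros((d + 1, d + 1))
P_x[:d, :d] = np.eye(d)                   # W_K = W_Q: project onto x
W_V = np.zeros((d + 1, d + 1))
W_V[d, d] = eta                           # value: scale the label slot by eta

scores = (E @ P_x) @ (P_x @ e_q)          # <x_i, x_q>, no softmax
attn_out = (E @ W_V.T).T @ scores         # sum_i score_i * (W_V e_i)
pred_attn = attn_out[d]                   # label slot of the attention update

print(pred_gd, pred_attn)                 # the two predictions coincide
assert np.allclose(pred_gd, pred_attn)
```

Because the attention scores are inner products of the demonstration inputs with the query input, the weighted sum of labels equals eta * sum_i y_i (x_i . x_q), which is exactly the prediction of the one-step gradient-descent weights; this is the sense in which a frozen attention layer can "perform optimization" on the context.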

Despite these advances, the authors note that the foundational mechanism that lets a model with frozen weights adapt accurately to a task from a handful of demonstrations remains unresolved, motivating the search for a unified theoretical explanation.

References

Despite the strong performance of ICL capabilities, the mechanism of ICL remains an open question.

Beyond the Black Box: Theory and Mechanism of Large Language Models (2601.02907 - Gan et al., 6 Jan 2026), Subsubsection "In-Context Learning", Section 6: Inference Stage (Core Theories and Methods)