Papers
Topics
Authors
Recent
2000 character limit reached

Analyzing limits for in-context learning (2502.03503v2)

Published 5 Feb 2025 in stat.ML, cs.AI, and cs.LG

Abstract: We examine limits of in-context learning (ICL) in transformer models trained from scratch, focusing on function approximation tasks as a controlled setting to uncover fundamental behaviors. While we show empirically that transformer models can generalize, approximating unseen classes of polynomial (non linear) functions, they cannot generalize beyond certain values. We provide both empirical and mathematical arguments explaining that these limitations stem from architectural components, namely layer normalization and the attention scoring function, softmax. Together, our findings reveal structural constraints on ICL that are often masked in more complex NLP tasks but that need to be understood to improve robustness and interpretability in transformer-based models.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Paper to Video (Beta)

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.