When can transformers compositionally generalize in-context? (2407.12275v1)
Abstract: Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.
- Seijin Kobayashi
- Simon Schug
- Yassir Akram
- Florian Redhardt
- Johannes von Oswald
- Razvan Pascanu
- Guillaume Lajoie
- João Sacramento
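
To make the setup concrete, below is a minimal sketch of a modular multitask data-generating process of the kind described in the abstract: each task is built by composing one module from each of several slots, so the set of possible tasks grows combinatorially, only a subset of combinations is seen during training, and the held-out combinations probe compositional generalization in-context. The choice of random linear modules, the dimensions, and the train/test split here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a task is identified by a tuple of module indices, one per slot.
n_modules_per_slot = 4      # candidate modules available in each slot (assumption)
n_slots = 2                 # number of composed components per task (assumption)
dim = 8                     # input/output dimensionality (assumption)

# One bank of candidate modules per slot; random linear maps for illustration.
module_banks = [
    rng.standard_normal((n_modules_per_slot, dim, dim)) / np.sqrt(dim)
    for _ in range(n_slots)
]

def task_function(task_id):
    """Compose the selected module from each slot into a single linear task."""
    W = np.eye(dim)
    for slot, module_idx in enumerate(task_id):
        W = module_banks[slot][module_idx] @ W
    return W

def sample_in_context_sequence(task_id, n_context=16):
    """Draw an in-context dataset (x_1, y_1, ..., x_n, y_n) for one task."""
    W = task_function(task_id)
    X = rng.standard_normal((n_context, dim))
    Y = X @ W.T
    return X, Y

# Enumerate all module combinations; train on a subset, hold out the rest
# to test compositional generalization to unseen combinations.
all_tasks = [(i, j) for i in range(n_modules_per_slot)
                    for j in range(n_modules_per_slot)]
perm = rng.permutation(len(all_tasks))
train_tasks = [all_tasks[i] for i in perm[: len(all_tasks) // 2]]
test_tasks = [all_tasks[i] for i in perm[len(all_tasks) // 2:]]

X, Y = sample_in_context_sequence(train_tasks[0])
print(f"{len(train_tasks)} training tasks, {len(test_tasks)} held-out tasks")
print("context shapes:", X.shape, Y.shape)
```

In this framing, a transformer trained on sequences from `train_tasks` must infer which module combination generated the context and then execute it on query inputs; the paper's finding is that this only generalizes to `test_tasks` when an explicit bottleneck separates those two stages.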