Papers
Topics
Authors
Recent
2000 character limit reached

Small transformer architectures for task switching (2508.04461v1)

Published 6 Aug 2025 in cs.LG and cs.AI

Abstract: The rapid progress seen in terms of large-scale generative AI is largely based on the attention mechanism. It is conversely non-trivial to conceive small-scale applications for which attention-based architectures outperform traditional approaches, such as multi-layer perceptrons or recurrent networks. We examine this problem in the context of 'task switching'. In this framework models work on ongoing token sequences with the current task being determined by stochastically interspersed control tokens. We show that standard transformers cannot solve a basic task switching reference model based on finite domain arithmetics which contains subtasks dedicated to increment / addition / reverse copy / context (IARC). We show that transformers, long short-term memory recurrent networks (LSTM), and plain multi-layer perceptrons (MLPs) achieve similar, but only modest prediction accuracies. We enlarge our comparative study by including an extension of the standard transformer architecture to its non-translational invariant counterpart, the cisformer, and an alternative attention mechanism, extensive attention. A combination of the latter is found to be the only model able to achieve considerable performance levels, of around 95%. Our results indicate that the workings of attention can be understood better, and even improved, when comparing qualitatively different formulations in task-switching settings.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.