Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
153 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Associative Transformer (2309.12862v3)

Published 22 Sep 2023 in cs.LG, cs.CV, and cs.NE

Abstract: Emerging from the pairwise attention in conventional Transformers, there is a growing interest in sparse attention mechanisms that align more closely with localized, contextual learning in the biological brain. Existing studies such as the Coordination method employ iterative cross-attention mechanisms with a bottleneck to enable the sparse association of inputs. However, these methods are parameter inefficient and fail in more complex relational reasoning tasks. To this end, we propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches, improving parameter efficiency and performance in relational reasoning tasks. AiT leverages a learnable explicit memory, comprised of various specialized priors, with a bottleneck attention to facilitate the extraction of diverse localized features. Moreover, we propose a novel associative memory-enabled patch reconstruction with a Hopfield energy function. The extensive experiments in four image classification tasks with three different sizes of AiT demonstrate that AiT requires significantly fewer parameters and attention layers while outperforming Vision Transformers and a broad range of sparse Transformers. Additionally, AiT establishes new SOTA performance in the Sort-of-CLEVR dataset, outperforming the previous Coordination method.

Summary

We haven't generated a summary for this paper yet.

Youtube Logo Streamline Icon: https://streamlinehq.com