
On Understanding Attention-Based In-Context Learning for Categorical Data (2405.17248v2)

Published 27 May 2024 in stat.ML and cs.LG

Abstract: In-context learning with attention models is examined for data with categorical outcomes, with inference in such models viewed from the perspective of functional gradient descent (GD). We develop a network composed of attention blocks, each employing a self-attention layer followed by a cross-attention layer, with associated skip connections. This model can exactly perform multi-step functional GD for in-context inference with categorical observations. We provide a theoretical analysis of this setup, generalizing many prior assumptions in this line of work, including the class of attention mechanisms for which it is appropriate. We demonstrate the framework empirically on synthetic data, image classification, and language generation.
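
The abstract describes a stack of attention blocks, each pairing a self-attention layer over the in-context examples with a cross-attention layer from the query tokens to that context, joined by skip connections, so that stacking blocks emulates multiple functional-GD steps. The sketch below is a minimal PyTorch rendering of that block structure, not the authors' implementation; the class name `SelfThenCrossBlock`, the use of `nn.MultiheadAttention`, and all dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed structure, not the paper's code): each block applies
# self-attention over the context tokens, then cross-attention from the query
# tokens to the context, each followed by a skip connection.
import torch
import torch.nn as nn


class SelfThenCrossBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, context: torch.Tensor, queries: torch.Tensor):
        # Self-attention over the embedded (input, label) context pairs,
        # with a skip connection.
        ctx_update, _ = self.self_attn(context, context, context)
        context = context + ctx_update
        # Cross-attention: query tokens attend to the updated context,
        # again with a skip connection; stacking such blocks is what allows
        # the network to mimic multi-step functional GD.
        qry_update, _ = self.cross_attn(queries, context, context)
        queries = queries + qry_update
        return context, queries


if __name__ == "__main__":
    dim, n_ctx, n_qry = 16, 32, 4
    blocks = nn.ModuleList([SelfThenCrossBlock(dim) for _ in range(3)])
    context = torch.randn(1, n_ctx, dim)   # embedded in-context examples
    queries = torch.randn(1, n_qry, dim)   # embedded test inputs
    for blk in blocks:
        context, queries = blk(context, queries)
    print(queries.shape)  # torch.Size([1, 4, 16])
```

Each pass through a block plays the role of one inference step; how the attention weights and embeddings must be set for the updates to coincide exactly with functional GD on categorical (e.g., softmax) observations is the subject of the paper's theoretical analysis.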
