
On Understanding Attention-Based In-Context Learning for Categorical Data (2405.17248v2)

Published 27 May 2024 in stat.ML and cs.LG

Abstract: In-context learning with attention models is examined for data with categorical outcomes, with inference in such models viewed from the perspective of functional gradient descent (GD). We develop a network composed of attention blocks, each employing a self-attention layer followed by a cross-attention layer, with associated skip connections. This model can exactly perform multi-step functional GD for in-context inference with categorical observations. We provide a theoretical analysis of this setup, generalizing many prior assumptions in this line of work, including the class of attention mechanisms for which it is appropriate. We demonstrate the framework empirically on synthetic data, image classification, and language generation.
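
The abstract describes a stack of attention blocks, each pairing a self-attention layer over the in-context examples with a cross-attention layer from the query tokens to that context, joined by skip connections, so that stacking blocks emulates multiple functional-GD steps. The sketch below is a minimal PyTorch rendering of that block structure, not the authors' implementation; the class name `SelfThenCrossBlock`, the use of `nn.MultiheadAttention`, and all dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed structure, not the paper's code): each block applies
# self-attention over the context tokens, then cross-attention from the query
# tokens to the context, each followed by a skip connection.
import torch
import torch.nn as nn


class SelfThenCrossBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, context: torch.Tensor, queries: torch.Tensor):
        # Self-attention over the embedded (input, label) context pairs,
        # with a skip connection.
        ctx_update, _ = self.self_attn(context, context, context)
        context = context + ctx_update
        # Cross-attention: query tokens attend to the updated context,
        # again with a skip connection; stacking such blocks is what allows
        # the network to mimic multi-step functional GD.
        qry_update, _ = self.cross_attn(queries, context, context)
        queries = queries + qry_update
        return context, queries


if __name__ == "__main__":
    dim, n_ctx, n_qry = 16, 32, 4
    blocks = nn.ModuleList([SelfThenCrossBlock(dim) for _ in range(3)])
    context = torch.randn(1, n_ctx, dim)   # embedded in-context examples
    queries = torch.randn(1, n_qry, dim)   # embedded test inputs
    for blk in blocks:
        context, queries = blk(context, queries)
    print(queries.shape)  # torch.Size([1, 4, 16])
```

Each pass through a block plays the role of one inference step; how the attention weights and embeddings must be set for the updates to coincide exactly with functional GD on categorical (e.g., softmax) observations is the subject of the paper's theoretical analysis.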
