In-Context Exemplars as Clues to Retrieving from Large Associative Memory (2311.03498v2)

Published 6 Nov 2023 in cs.CL and cs.LG

Abstract: Recently, LLMs have made remarkable progress in natural language processing. One of their most distinctive capabilities is in-context learning (ICL), which enables LLMs to learn patterns from in-context exemplars without any parameter updates. The performance of ICL depends heavily on the exemplars used, yet how to choose them remains unclear because the mechanism of in-context learning is not well understood. In this paper, we present a novel perspective on ICL by conceptualizing it as contextual retrieval from a model of associative memory. We establish a theoretical framework for ICL based on Hopfield Networks, use it to examine how in-context exemplars influence ICL performance, and propose a more efficient active exemplar selection method. Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval, with potential implications for advancing the understanding of LLMs.
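
The framework rests on (modern) Hopfield networks as a model of associative memory, in which a partial cue is iteratively pulled toward the stored pattern it most resembles. The sketch below is a generic illustration of that retrieval dynamic using the attention-style update popularized by "Hopfield networks is all you need" (reference 37); it is not the paper's exact formulation, and the function names, beta value, and toy data are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def hopfield_retrieve(memories, query, beta=8.0, steps=3):
    """Modern-Hopfield-style retrieval: repeatedly update the cue via
    xi <- X^T softmax(beta * X xi), where rows of X are stored patterns.
    This is an illustrative sketch, not the paper's specific construction."""
    X = np.asarray(memories, dtype=float)   # (n_patterns, dim)
    xi = np.asarray(query, dtype=float)     # (dim,)
    for _ in range(steps):
        xi = X.T @ softmax(beta * (X @ xi))  # attention-like retrieval step
    return xi

# Toy usage: a noisy partial cue retrieves the closest stored pattern.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(5, 16))
cue = patterns[2] + 0.3 * rng.normal(size=16)
retrieved = hopfield_retrieve(patterns, cue)
print(np.argmax(patterns @ retrieved))  # expected to recover pattern index 2
```

Under this view, the in-context exemplars play the role of the cue that steers retrieval, which is the intuition the paper develops into its exemplar-selection analysis.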

Authors (1)
  1. Jiachen Zhao (10 papers)