Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers (2405.04620v5)

Published 7 May 2024 in hep-ph, cs.AI, cs.CL, cs.LG, and cs.NE

Abstract: In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates all possible transition paths leading to future token states, with temporal evolution governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are recurrently processed across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach through the Passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while exhibiting memory usage that scales linearly with sequence length. This contrasts with the non-linear memory growth typically observed in standard attention mechanisms. We expect that this quantum-inspired generalization of the Transformer architecture will open new avenues for enhancing both the efficiency and expressiveness of future Transformer models.
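The abstract describes condensing the contextual information of a sequence into fixed-size, memory-like segments that are processed recurrently across layers, yielding memory usage that scales linearly with sequence length rather than with the quadratic score matrix of standard attention. The paper's actual path-integral construction is not reproduced here; the following is only a minimal sketch, assuming a linear-attention-style compressive memory in the spirit of that description. The names (`condensed_attention`, `phi`, `segment_len`, `M`, `z`), the feature map, and the segment size are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' method): contrasts the O(n^2) score
# matrix of standard attention with a segment-wise recurrent memory whose
# size is fixed, so total memory grows only linearly with sequence length.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # Materializes an (n x n) score matrix: memory grows quadratically in n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def condensed_attention(Q, K, V, segment_len=64):
    # Processes the sequence segment by segment, folding each processed
    # segment into fixed-size summaries M (d x d) and z (d,). Only the
    # current segment plus the summaries are ever held in memory.
    d = Q.shape[-1]
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # simple positive feature map (assumption)
    M = np.zeros((d, d))
    z = np.zeros(d)
    outputs = []
    for s in range(0, Q.shape[0], segment_len):
        q, k, v = Q[s:s + segment_len], K[s:s + segment_len], V[s:s + segment_len]
        q_f, k_f = phi(q), phi(k)
        # Read from the condensed context of all previous segments.
        out = (q_f @ M) / (q_f @ z + 1e-6)[:, None]
        outputs.append(out)
        # Fold the current segment into the memory summaries.
        M += k_f.T @ v
        z += k_f.sum(axis=0)
    return np.concatenate(outputs, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 256, 32
    Q, K, V = rng.normal(size=(3, n, d))
    print(standard_attention(Q, K, V).shape)   # (256, 32), needs an n x n score matrix
    print(condensed_attention(Q, K, V).shape)  # (256, 32), carries only a d x d summary
```

In this sketch each segment reads only from the summaries of earlier segments, and within-segment attention is omitted for brevity; the point is the memory contrast, with `standard_attention` building an n x n matrix while the condensed variant carries a fixed-size summary forward across segments.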
