Quantum linear algebra is all you need for Transformer architectures (2402.16714v2)
Abstract: Generative machine learning methods such as large language models (LLMs) are revolutionizing the creation of text and images. While these models are powerful, they also consume a large amount of computational resources. The transformer is a key component of LLMs that aims to generate a suitable completion of a given partial sequence. In this work, we investigate transformer architectures under the lens of fault-tolerant quantum computing. The input model is one where trained weight matrices are given as block encodings, from which we construct the query, key, and value matrices of the transformer. We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function. In addition, we combine quantum subroutines to construct important building blocks of the transformer: the residual connection with layer normalization, and the feed-forward neural network. Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction. Based on common open-source LLMs, we provide insights into the behavior of important parameters determining the run time of the quantum algorithm. We discuss the potential and challenges of obtaining a quantum advantage.
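The abstract walks through the standard single-layer transformer pipeline that the quantum subroutines encode: query/key/value construction, row-wise softmax attention, a residual connection with layer normalization, and a feed-forward network. For orientation, below is a minimal classical NumPy sketch of that layer; all function names, shapes, and the choice of ReLU in the feed-forward block are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax_rows(M):
    # Row-wise softmax: the step targeted by the paper's new quantum subroutine.
    e = np.exp(M - M.max(axis=1, keepdims=True))  # subtract row max for stability
    return e / e.sum(axis=1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_layer(S, Wq, Wk, Wv, W1, W2):
    """One single-head transformer layer on a sequence S of shape (n_tokens, d).
    The trained weight matrices play the role of the block encodings
    assumed as input by the quantum algorithm."""
    Q, K, V = S @ Wq, S @ Wk, S @ Wv          # query, key, value matrices
    d = Q.shape[-1]
    A = softmax_rows(Q @ K.T / np.sqrt(d))    # self-attention matrix
    x = layer_norm(S + A @ V)                 # residual connection + layer norm
    ff = np.maximum(0.0, x @ W1) @ W2         # feed-forward network (ReLU here
                                              # for simplicity; GELU is common)
    return layer_norm(x + ff)                 # second residual + layer norm
```

In the quantum algorithm, the intermediate matrices above are manipulated as block encodings rather than computed explicitly, and the layer's output is prepared as an amplitude encoding whose measurement yields the prediction.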