
Are queries and keys always relevant? A case study on Transformer wave functions (2405.18874v1)

Published 29 May 2024 in cond-mat.dis-nn, cs.CL, and physics.comp-ph

Abstract: The dot-product attention mechanism, originally designed for NLP tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, in the specific domain of the parametrization of variational wave functions to approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations on the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark in the field of quantum many-body systems on the lattice. By comparing the performance of standard attention mechanisms with a simplified version that excludes queries and keys, relying solely on positions, we achieve competitive results while reducing computational cost and parameter usage. Furthermore, through the analysis of the attention maps generated by standard attention mechanisms, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insight into why queries and keys should, in principle, be omitted from the attention mechanism when studying large systems. Interestingly, the same arguments can be extended to the NLP domain, in the limit of long input sentences.
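To make the comparison in the abstract concrete, here is a minimal illustrative sketch (not taken from the paper's code) contrasting standard dot-product attention, whose weights depend on the input through queries and keys, with a simplified positional variant in which a learned position-position weight matrix replaces them. All names, shapes, and the random toy data below are assumptions chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(X, Wq, Wk, Wv):
    """Standard attention: the weight matrix depends on the input X
    through the queries Q and keys K."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (N, N), input-dependent
    return A @ V

def positional_attention(X, A_logits, Wv):
    """Simplified attention: a learned (N, N) matrix of positional logits
    replaces queries and keys, so the weights are input-independent."""
    A = softmax(A_logits)                        # (N, N), fixed after training
    return A @ (X @ Wv)

# Toy example: N lattice sites with embedding dimension d (illustrative sizes).
rng = np.random.default_rng(0)
N, d = 16, 8
X = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
A_logits = rng.normal(size=(N, N))

print(dot_product_attention(X, Wq, Wk, Wv).shape)   # (16, 8)
print(positional_attention(X, A_logits, Wv).shape)  # (16, 8)
```

In this sketch the positional variant drops the query and key projections entirely, which is the source of the reduced parameter count and computational cost mentioned in the abstract, and it corresponds to the observation that the trained attention maps become effectively input-independent.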

Authors (2)
  1. Riccardo Rende (10 papers)
  2. Luciano Loris Viteritti (8 papers)
Citations (2)