Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers (2404.04393v2)
Published 5 Apr 2024 in cs.LO, cs.CL, cs.FL, and cs.LG
Abstract: Deriving formal bounds on the expressivity of transformers and studying transformers that are constructed to implement known algorithms are both effective methods for better understanding the computational power of transformers. Towards both ends, we introduce the temporal counting logic $\textsf{K}_\text{t}[\#]$ alongside the RASP variant $\textsf{C-RASP}$. We show that the two are equivalent, and that together they are the best-known lower bound on the formal expressivity of future-masked soft attention transformers with unbounded input size. We prove this by showing that all $\textsf{K}_\text{t}[\#]$ formulas can be compiled into these transformers.
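To give a rough feel for what "counting" means here, the following is a minimal Python sketch, not the paper's construction. It assumes that a counting term tallies the positions up to and including the current one that satisfy a property, and that a formula may compare two such tallies. The names `prefix_count`, `uniform_causal_attention`, and the example property "more a's than b's so far" are hypothetical and chosen only for illustration. The second half relies on one well-known fact: future-masked soft attention with equal scores outputs the mean of the values in the prefix, so the sign of a prefix sum can be read from that mean.

```python
from typing import List


def prefix_count(flags: List[bool]) -> List[int]:
    """For each position i, count the positions j <= i where flags[j] holds."""
    counts, running = [], 0
    for flag in flags:
        running += int(flag)
        counts.append(running)
    return counts


def more_a_than_b(word: str) -> List[bool]:
    """Illustrative counting property: at each position, #a so far > #b so far."""
    count_a = prefix_count([c == "a" for c in word])
    count_b = prefix_count([c == "b" for c in word])
    return [ca > cb for ca, cb in zip(count_a, count_b)]


def uniform_causal_attention(values: List[float]) -> List[float]:
    """Future-masked attention with equal scores: softmax over positions
    0..i is uniform, so the output at i is the mean of values[0..i]."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        out.append(total / (i + 1))
    return out


def more_a_than_b_via_attention(word: str) -> List[bool]:
    """Same property, read off a prefix mean instead of explicit counts.
    Each 'a' contributes +1, each 'b' contributes -1, anything else 0;
    the mean is positive exactly when the a-count exceeds the b-count."""
    signed = [1.0 if c == "a" else -1.0 if c == "b" else 0.0 for c in word]
    return [m > 0 for m in uniform_causal_attention(signed)]


if __name__ == "__main__":
    word = "aabbb"
    print(more_a_than_b(word))
    print(more_a_than_b_via_attention(word))
```

Both functions agree on every prefix because dividing a sum by the positive prefix length does not change its sign. This only illustrates why prefix averages are enough to compare counts; the actual compilation of formulas into transformers is given in the paper itself.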