Adequacy of hard-attention theory as an approximation of softmax expressivity
Determine whether the expressivity results established for Unique Hard Attention Transformers (UHATs) and Average Hard Attention Transformers (AHATs)—both with and without position embeddings—provide a good approximation of the expressive power of SoftMax Attention Transformers, for example via tight upper and lower bounds relating the respective language classes.
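The contrast at stake can be made concrete. A minimal NumPy sketch of the three attention rules the problem compares is given below; the function names and the leftmost-maximum tie-breaking convention for unique hard attention are illustrative choices, not definitions taken from the cited paper.

```python
import numpy as np

def softmax_attention(scores, values):
    # Softmax attention: a convex mixture of ALL value vectors,
    # with exponentially decaying weight on lower-scoring positions.
    w = np.exp(scores - scores.max())  # subtract max for numerical stability
    w /= w.sum()
    return w @ values

def unique_hard_attention(scores, values):
    # UHAT: attend to exactly one maximal-score position
    # (here the leftmost maximum, a common tie-breaking convention).
    return values[int(np.argmax(scores))]

def average_hard_attention(scores, values):
    # AHAT: average uniformly over every position attaining the maximum score.
    mask = scores == scores.max()
    return values[mask].mean(axis=0)

scores = np.array([1.0, 3.0, 3.0, 0.5])
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 1.0],
                   [0.0, 0.0]])
```

On the tied scores above, UHAT returns only the second value vector, AHAT averages the second and third, and softmax blends all four; the open question is how far results proved for the first two rules transfer to the third.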
References
It is still unclear whether the theory of expressivity of UHATs and AHATs provides a good approximation of the theory of expressivity of softmax transformers.
— "The Role of Logic and Automata in Understanding Transformers" (arXiv:2509.24024, Lin et al., 28 Sep 2025), Section 6, "Limitations of UHATs and AHATs" (Limitation 1: Soft attention vs. Hard attention)