
Adequacy of hard-attention theory as an approximation of softmax expressivity

Determine whether the expressivity results established for Unique Hard Attention Transformers (UHATs) and Average Hard Attention Transformers (AHATs), both with and without position embeddings, provide a good approximation of the expressive power of SoftMax Attention Transformers, for example via tight upper and lower bounds relating the respective language classes.


Background

Although hard-attention models (UHATs and AHATs) have yielded substantial theoretical progress, real-world transformers employ softmax attention. The paper raises the concern that this creates a gap between the existing theories and practical models.

Investigating whether the hard-attention theories approximate softmax expressivity would validate (or refute) the relevance of current formal results to practical transformers. Partial evidence exists (e.g., uniform AHAT layers approximating softmax behavior), but a comprehensive approximation guarantee is unknown.
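To make the contrast concrete, the following minimal sketch compares unique hard attention (attend only to one maximal-score position), average hard attention (average over all maximal-score positions), and softmax attention on a single head. The scores, values, and dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def unique_hard_attention(scores, values):
    # UHAT: attend to a single position with maximal score (here, the leftmost one).
    return values[int(np.argmax(scores))]

def average_hard_attention(scores, values):
    # AHAT: average the values at all positions attaining the maximal score.
    mask = np.isclose(scores, np.max(scores))
    return values[mask].mean(axis=0)

def softmax_attention(scores, values, temperature=1.0):
    # SoftMax attention: every position receives a nonzero weight.
    weights = np.exp(scores / temperature)
    weights /= weights.sum()
    return weights @ values

# Toy example with a tie at the maximal score.
scores = np.array([2.0, 2.0, -1.0])
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [5.0, 5.0]])

print(unique_hard_attention(scores, values))   # [1. 0.]    picks one maximiser
print(average_hard_attention(scores, values))  # [0.5 0.5]  averages the maximisers
print(softmax_attention(scores, values))       # ~[0.61 0.61]; some weight leaks to the low-score position
```

As the attention scores are scaled up (equivalently, as the temperature goes to zero), the softmax weights concentrate on the maximising positions and the output approaches the average-hard result; this is one intuition behind the partial evidence that AHAT layers can track softmax behavior in certain regimes, though it does not by itself yield a general approximation guarantee.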

References

It is still unclear whether the theory of expressivity of UHATs and AHATs provides a good approximation of the theory of expressivity of softmax transformers.

The Role of Logic and Automata in Understanding Transformers (2509.24024 - Lin et al., 28 Sep 2025) in Section 6, Limitations of UHATs and AHATs (Limitation 1: Soft attention vs. Hard attention)