
Expressivity of Softmax Transformers and relation to UHAT

Characterize the class of formal languages recognized by softmax attention transformers with position embeddings, and determine whether this class subsumes the class of languages recognized by unique hard attention transformers (UHATs) with position embeddings.
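
A compact way to phrase the question (the notation $\mathcal{L}(\cdot)$ for a model's recognized language class and the labels $\mathrm{SMAT}_{\mathrm{pos}}$ / $\mathrm{UHAT}_{\mathrm{pos}}$ are introduced here for illustration only, not taken from the source):

\[
\mathcal{L}(\mathrm{SMAT}_{\mathrm{pos}}) \;=\; \{\, L \subseteq \Sigma^{*} \;:\; L \text{ is recognized by some softmax attention transformer with position embeddings} \,\}
\]
\[
\textbf{Open:}\quad \text{characterize } \mathcal{L}(\mathrm{SMAT}_{\mathrm{pos}}) \text{ and decide whether } \mathcal{L}(\mathrm{UHAT}_{\mathrm{pos}}) \subseteq \mathcal{L}(\mathrm{SMAT}_{\mathrm{pos}}).
\]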


Background

Practical transformers use softmax attention, but most current theory focuses on hard-attention variants (UHAT and AHAT) because they are more analytically tractable. The paper notes that softmax transformers can recognize PARITY, which UHATs cannot, so the class of languages recognized by softmax transformers is not contained in the class recognized by UHATs.
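
A minimal numerical sketch of the distinction this paragraph relies on (the function names and toy scores below are illustrative, not from the paper): softmax spreads attention weight over every position, whereas unique hard attention collapses all weight onto a single maximizing position.

```python
import numpy as np

# PARITY = { w in {0,1}* : w contains an even number of 1s } is the
# separating example cited above: softmax transformers can recognize it,
# UHATs cannot.

def softmax_attention(scores: np.ndarray) -> np.ndarray:
    """Soft attention: every position receives a nonzero weight."""
    e = np.exp(scores - scores.max())   # shift for numerical stability
    return e / e.sum()

def unique_hard_attention(scores: np.ndarray) -> np.ndarray:
    """Unique hard attention: all weight on one maximizing position
    (here ties are broken toward the leftmost maximum via argmax)."""
    w = np.zeros_like(scores)
    w[np.argmax(scores)] = 1.0
    return w

scores = np.array([2.0, 2.0, 1.0, 0.0])
print(softmax_attention(scores))       # ~[0.40, 0.40, 0.15, 0.05]
print(unique_hard_attention(scores))   # [1. 0. 0. 0.]
```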

The precise placement of softmax transformers within classical language classes, and their relationship to UHATs, remain unresolved. Establishing whether softmax attention subsumes UHATs, and more generally identifying the exact expressivity of softmax transformers, would significantly sharpen the theoretical map of transformer capabilities.

References

For example, we do not know where the expressivity of softmax transformers exactly lies (e.g. do they subsume UHATs?).

The Role of Logic and Automata in Understanding Transformers (arXiv:2509.24024, Lin et al., 28 Sep 2025), Section 6, Limitations of UHATs and AHATs (Limitation 1: Soft attention vs. Hard attention).