Logical Languages Accepted by Transformer Encoders with Hard Attention (2310.03817v1)

Published 5 Oct 2023 in cs.FL and cs.LG

Abstract: We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class ${\sf AC}^0$, i.e., accepted by a family of poly-sized and depth-bounded boolean circuits with unbounded fan-in. On the other hand, AHAT encoders can recognize languages outside ${\sf AC}^0$, but their expressive power still lies within the bigger circuit complexity class ${\sf TC}^0$, i.e., ${\sf AC}^0$-circuits extended by majority gates. We first show a negative result: there is an ${\sf AC}^0$-language that cannot be recognized by a UHAT encoder. On the positive side, we show that UHAT encoders can recognize a rich fragment of ${\sf AC}^0$-languages, namely, all languages definable in first-order logic with arbitrary unary numerical predicates. This logic includes, for example, all regular languages from ${\sf AC}^0$. We then show that AHAT encoders can recognize all languages of our logic even when it is enriched with counting terms. We apply these results to derive new results on the expressive power of UHAT and AHAT up to permutation of letters (a.k.a. Parikh images).
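The two attention mechanisms differ only in how ties among maximal attention scores are resolved, and the Parikh image mentioned at the end is simply the vector of letter counts of a word. The sketch below (not taken from the paper; the function names `unique_hard_attention`, `average_hard_attention`, and `parikh_image` are illustrative, and NumPy is assumed) shows one common reading of the two mechanisms: UHAT commits to a single maximal-score position, while AHAT averages the values at all maximal-score positions.

```python
import numpy as np

def unique_hard_attention(scores, values):
    """UHAT-style attention: return the value at the single position with the
    highest score (np.argmax breaks ties toward the leftmost position)."""
    idx = int(np.argmax(scores))
    return values[idx]

def average_hard_attention(scores, values):
    """AHAT-style attention: average the values at *all* positions whose
    score attains the maximum."""
    mask = scores == scores.max()
    return values[mask].mean(axis=0)

def parikh_image(word, alphabet):
    """Parikh image: the vector of letter counts, i.e., the word viewed
    only up to permutation of its letters."""
    return np.array([word.count(a) for a in alphabet])

# Toy illustration with two tied maximal scores at positions 1 and 2.
scores = np.array([0.2, 0.9, 0.9, 0.1])
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 2.0],
                   [3.0, 3.0]])

print(unique_hard_attention(scores, values))   # [0. 1.]   (leftmost maximum only)
print(average_hard_attention(scores, values))  # [1.  1.5] (mean over both maxima)
print(parikh_image("abba", "ab"))              # [2 2]
```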
