What Formal Languages Can Transformers Express? A Survey (2311.00208v3)

Published 1 Nov 2023 in cs.LG, cs.CL, cs.FL, and cs.LO

Abstract: As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help clarify the power of transformers relative to other models of computation, their fundamental capabilities and limits, and the impact of architectural choices. Work in this subarea has made considerable progress in recent years. Here, we undertake a comprehensive survey of this work, documenting the diverse assumptions that underlie different results and providing a unified framework for harmonizing seemingly contradictory findings.

Citations (26)

Summary

  • The paper unifies diverse theoretical findings, showing that transformers can simulate automata and, under specific configurations, achieve Turing completeness.
  • It demonstrates lower bound results showing that augmented transformers can recognize complex formal languages using mechanisms like masked attention and counter logic.
  • The survey outlines upper bound constraints by relating transformer expressivity to constant-depth circuit classes and first-order logics, guiding future research.

Analyzing the Theoretical Expressivity of Transformer Models for Formal Languages

The paper "Transformers as Recognizers of Formal Languages: A Survey on Expressivity" presents a comprehensive survey of theoretical work investigating the expressive capabilities of transformer models in the context of formal languages. This work is grounded in the burgeoning interest of understanding the limitations and capabilities of transformers beyond their empirical prowess in natural language processing.

Overview of the Paper

The paper explores foundational questions about the expressivity of transformer models relative to formal models such as automata, circuits, and formal logics. Key questions include how transformers compare with architectures such as RNNs, and how expressivity varies across transformer variants.

The survey covers a wide array of findings, organized into lower bound results (capabilities of transformers) and upper bound results (limitations of transformers). One of the paper's central themes is harmonizing these results under a unified framework, allowing for a consistent understanding despite the disparate assumptions made across studies.

Key Findings

The paper highlights several important points, sorted by the nature of the results:

Lower Bounds:

  • Expressivity of Transformers vs. Automata: Work such as that of Bhattamishra et al. has shown that soft-attention transformers can recognize languages like Shuffle-Dyck-k when assumptions such as masked attention and counter-logic-inspired operations are in place (see the counter sketch after this list).
  • Relation to Counter Machines: The survey ties these constructions to k-counter machines, linking simple automaton models to transformer design.
  • Turing Completeness: Transformer variants whose decoder may run for arbitrarily many computational steps can simulate Turing machines and are therefore Turing-complete.
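
The counter view above can be made concrete. Below is a minimal sketch, not taken from the paper, of a k-counter recognizer for Shuffle-Dyck-k: one counter per bracket type, each of which must never go negative and must end at zero. The bracket pairs and function name are illustrative.

```python
def is_shuffle_dyck(word, pairs=(("(", ")"), ("[", "]"))):
    """Recognize Shuffle-Dyck-k with one counter per bracket type.

    A string is accepted iff, for each bracket type, opens and closes
    balance out and no prefix closes more than it has opened.
    """
    counters = [0] * len(pairs)
    for symbol in word:
        for i, (open_b, close_b) in enumerate(pairs):
            if symbol == open_b:
                counters[i] += 1
            elif symbol == close_b:
                counters[i] -= 1
                if counters[i] < 0:       # a close with no matching open: reject
                    return False
    return all(c == 0 for c in counters)  # every bracket type balances


# "([)]" is a legal interleaving of "()" and "[]" (it is NOT in nested Dyck-2)
assert is_shuffle_dyck("([)]")
assert not is_shuffle_dyck("(](")
```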

Upper Bounds:

  • Circuit Complexity: Research demonstrates that transformer models, even with advanced attention mechanisms, stay within constant-depth circuit classes such as TC^0 under standard assumptions on numeric precision. This places transformers below problems believed to lie outside these classes, such as evaluating compositions of permutations or directed graph connectivity.
  • Logical Expressivity: A significant connection is drawn between the expressivity of transformers and first-order logics, particularly in how counting mechanisms correspond to majority quantifiers in logical systems (a toy illustration follows this list).
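
As a toy numerical illustration of the counting-to-majority link (an illustration of the general idea, not a construction from the paper): an attention head with uniform weights outputs the average of its values, which is enough to decide MAJORITY, the canonical language defined by a majority quantifier. The function name and input encoding below are assumptions of the sketch.

```python
import numpy as np


def uniform_attention_majority(bits):
    """Decide MAJORITY with a single uniform-attention 'head' (toy sketch)."""
    # Value of position t is 1.0 for symbol '1', else 0.0.
    values = np.array([1.0 if b == "1" else 0.0 for b in bits])
    # Softmax over equal attention scores yields uniform weights 1/n,
    # so the head's output is the average value, i.e. a normalized count.
    weights = np.full(len(values), 1.0 / len(values))
    fraction_of_ones = float(weights @ values)
    # A final threshold at 1/2 turns the count into the MAJORITY decision.
    return fraction_of_ones > 0.5


assert uniform_attention_majority("11010")       # three of five symbols are '1'
assert not uniform_attention_majority("10010")   # only two of five are '1'
```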

Implications and Future Directions

This survey's insights carry profound implications for both theoretical inquiry and practical applications of large-scale NLP models:

  • Practical Modeling: From a practical standpoint, insights into how transformers relate to circuit classes can inform the design of architectures better aligned with tasks, such as summarization, that require nuanced counting and logical operations.
  • Theoretical Models and Further Research: The paper opens a path to further development of theoretical models or variants (like adding new attention mechanisms) based on formal insights to push the expressivity boundaries of transformers.
  • Educational Tool: The survey itself may serve as an educational primer on the intersection of formal language theory and deep learning, particularly within machine learning curricula targeting students with overlapping interests in computational theory and practical AI design.

In conclusion, "What Formal Languages Can Transformers Express?" provides a thorough exploration of the theoretical landscape defining the expressivity of transformers. It identifies not just limitations but also strategic areas where those limitations can inform future research and model development. As AI and NLP models continue to evolve, anchoring these developments in solid theoretical ground reveals both their potential and the scope for further advancement.