Boolean RASP (B-RASP): Formal Transformer Model
- Boolean RASP (B-RASP) is a formal schema that defines transformer capabilities using straight-line Boolean operations and hard-attention lookups.
- It precisely characterizes star-free regular languages and FO-rational transductions by mapping logical formulas and automata constructs to transformer mechanisms.
- Extensions like B-RASP[pos] and S-RASP expand its expressiveness by incorporating position arithmetic and prefix-sum operations for advanced string transductions.
Boolean RASP (B-RASP) is a formal programming schema introduced to analytically relate the representational capabilities of masked hard-attention transformers and classical models in formal language theory. Positioned as a syntactic and operational intermediate between linear temporal logic (LTL), star-free regular languages, and transformer architectures, B-RASP enables precise characterization of what transformers with specific architectural restrictions can and cannot compute. The framework has been systematically developed and extended in two influential lines of research, each focusing on the recognition and transduction power of such neural models (Yang et al., 2023, Strobl et al., 2024).
1. Formal Definition and Syntax
A B-RASP program processes strings over a finite alphabet and manipulates Boolean or symbol vectors indexed by input positions. The construction is strictly straight-line, with no branching or looping: each subsequent vector depends only on previously defined vectors. There are two primary instruction families:
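- Position-wise Boolean Combinations: Given Boolean vectors $P_1, \ldots, P_k$, one can define a new vector $P(i) = \varphi(P_1(i), \ldots, P_k(i))$ by applying any Boolean formula $\varphi$. For alphabet symbols $a \in \Sigma$, initial vectors $Q_a$ are defined by $Q_a(i) = 1$ iff $w_i = a$.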
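- Hard-Attention Lookup: The key primitive simulates the masked hard attention mechanism in transformers. For position $i$, and for a Boolean mask $M(i,j)$ (restricting the eligible $j$, e.g., to the strict past $j < i$ or the strict future $j > i$), a Boolean score predicate $S(i,j)$, a value predicate $V(i,j)$, and default $D(i)$, B-RASP defines:
$$P(i) = V(i, j_i) \text{ if some } j \text{ satisfies } M(i,j) \wedge S(i,j), \text{ else } D(i),$$
where $j_i$ is the leftmost or rightmost such $j$, as fixed by the operation (Yang et al., 2023).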
- Compressed Symbol Output (Transductions, Extended): Beyond recognition, Strobl et al. (2024) introduce B-RASP programs with symbol output registers, whose values may be bounded-length substrings, thus enabling general order-preserving string-to-string transductions.
Semantically, the program is evaluated over an input string $w = w_1 \cdots w_n$ by instantiating each vector in program order; acceptance or output is then read from the designated final register.
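The two instruction families above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names `initial_vector`, `combine`, and `attend` are hypothetical, positions are 0-indexed, and predicates are passed as plain Python functions.

```python
from typing import Callable, List

def initial_vector(w: str, a: str) -> List[bool]:
    """Initial vector for symbol a: true exactly where the input symbol equals a."""
    return [c == a for c in w]

def combine(f: Callable[..., bool], *vectors: List[bool]) -> List[bool]:
    """Position-wise Boolean combination of previously defined vectors."""
    return [f(*vals) for vals in zip(*vectors)]

def attend(n, mask, score, value, default, rightmost=True):
    """Hard-attention lookup: at each i, pick the leftmost or rightmost j
    that passes both mask(i, j) and score(i, j); fall back to default(i)."""
    out = []
    for i in range(n):
        hits = [j for j in range(n) if mask(i, j) and score(i, j)]
        if hits:
            out.append(value(i, max(hits) if rightmost else min(hits)))
        else:
            out.append(default(i))
    return out

w = "abca"
q_a = initial_vector(w, "a")
# "Some a occurs strictly before position i": strict-past mask j < i, score on a at j.
seen_a = attend(len(w), lambda i, j: j < i, lambda i, j: q_a[j],
                lambda i, j: True, lambda i: False)
print(q_a)     # [True, False, False, True]
print(seen_a)  # [False, True, True, True]
```

Because every vector is computed once from earlier ones, a program is just a fixed sequence of `combine` and `attend` calls, which mirrors the straight-line restriction.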
2. Expressive Power: Recognition and Transduction
B-RASP’s expressiveness is exactly characterized in both the recognition and transduction settings by major theorems:
- Recognition (Star-Free, LTL-Definable Languages): B-RASP captures precisely the class of star-free regular languages (those expressible without use of the Kleene star in regular expressions), equivalently the languages accepted by counter-free finite automata, and exactly those definable in LTL. Every LTL formula over atomic predicates and Boolean/temporal operators translates to a B-RASP program computing the same predicate at every position, and vice versa (Yang et al., 2023).
- Transduction (First-Order Rational, FO-Rat): In the sequence-to-sequence setting, B-RASP programs (with compressed-output registers) compute exactly the first-order rational transductions, i.e., string-to-string mappings definable by order-preserving first-order transducers, as established in (Strobl et al., 2024). Each such mapping can be decomposed into a pair of aperiodic two-state sequential transducers and implemented in B-RASP with strictly composed Boolean state registers and hard-attention operations.
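The LTL direction of the recognition result can be illustrated with a single temporal operator. The sketch below assumes the strict "since" operator ($\varphi$ since $\psi$ holds at $i$ if $\psi$ held at some $j < i$ and $\varphi$ held at every position strictly between $j$ and $i$) and encodes it as one rightmost strict-past lookup with score $\psi(j) \vee \neg\varphi(j)$, value $\psi(j)$, and default false. The function names are hypothetical, and the reference implementation is included only for comparison.

```python
def since_lookup(phi, psi):
    """Strict 'phi since psi' as one rightmost strict-past hard-attention lookup."""
    out = []
    for i in range(len(phi)):
        # Eligible j: strictly before i, and either psi holds or phi fails there.
        hits = [j for j in range(i) if psi[j] or not phi[j]]
        # Value is psi at the rightmost hit; default is False when there is no hit.
        out.append(psi[hits[-1]] if hits else False)
    return out

def since_reference(phi, psi):
    """Direct reading of the strict-since semantics, used only as a cross-check."""
    return [
        any(psi[j] and all(phi[k] for k in range(j + 1, i)) for j in range(i))
        for i in range(len(phi))
    ]

phi = [True, True, False, True, True]
psi = [True, False, False, False, False]
print(since_lookup(phi, psi))     # [False, True, True, False, False]
print(since_reference(phi, psi))  # [False, True, True, False, False]
```

The rightmost hit is the nearest position where the "since" chain is either established (by $\psi$) or broken (by $\neg\varphi$), so reading $\psi$ there decides the formula.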
3. Relationship to Transformers and Temporal Logic
B-RASP is designed as an intermediate language whose straight-line, attention-style semantics aligns closely with masked hard-attention transformer circuits, and which is logically equivalent to specific fragments of temporal logic. This establishes robust equivalence relations:
| Model/Language | Equivalence Class |
|---|---|
| B-RASP | Star-free languages, Counter-free DFA, LTL |
| Strict-mask hard-attention transformers | B-RASP |
| B-RASP with compressed output | FO-rational (order-preserving) transductions |
| Masked average-hard-attention transformers | Can simulate S-RASP (and hence B-RASP transductions) |
Any B-RASP program of Boolean vectors can be simulated by a strict-mask hard-attention transformer with depth two, by encoding Boolean vectors in real activations and using carefully constructed self-attention score/value heads and shallow ReLU FFNs. Conversely, any such transformer (with or without finite-image position embeddings) can be emulated by a B-RASP program via bit-encoded activations and simulated Boolean-lookup sequences (Yang et al., 2023, Strobl et al., 2024).
4. Extensions: Position Arithmetic and Prefix Sum
Two principal extensions enhance B-RASP’s expressiveness:
- B-RASP[pos]: Introduces integer vectors, built-in position access $\mathrm{pos}(i) = i$, clipped register-wise addition/subtraction, comparisons, and attention predicates referencing positions. This extension can compute all first-order regular functions, a strict superset of FO-rational. For example, the copy-first-half function, which maps a string to its first half, is not FO-rational but can be implemented in B-RASP[pos].
- S-RASP: Augments B-RASP with prefix-sum operations, permitting additional arithmetic over sequences. S-RASP can compute all first-order polyregular functions, including examples such as squaring a string or producing marked squares of progressively longer prefixes.
These extensions respectively map to broader transduction classes and can be simulated by appropriately architected transformer variants (e.g., masked average-hard attention models).
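As a rough illustration of what the two extensions add, the sketch below mimics copy-first-half with a position vector and a comparison, and shows a prefix sum over a Boolean vector. It is plain Python rather than actual B-RASP[pos] or S-RASP syntax; the function names are hypothetical, positions are 1-indexed, and "first half" is assumed to mean the first $\lfloor n/2 \rfloor$ symbols.

```python
from itertools import accumulate

def copy_first_half(w: str) -> str:
    """Keep position p exactly when 2 * p <= n (assumed floor(n/2) convention)."""
    n = len(w)
    pos = list(range(1, n + 1))           # built-in position vector
    # Every position reads the input length from the rightmost position.
    length = [pos[-1]] * n if n else []
    # Position arithmetic (p + p) followed by a comparison against the length.
    keep = [p + p <= total for p, total in zip(pos, length)]
    return "".join(c for c, k in zip(w, keep) if k)

def prefix_sum(v):
    """Running total of a vector, the extra primitive assumed for S-RASP."""
    return list(accumulate(v))

print(copy_first_half("abcdef"))                   # abc
print(prefix_sum([c == "a" for c in "abaa"]))      # [1, 1, 2, 3]
```

The comparison `p + p <= total` is the kind of global length test that plain B-RASP cannot perform, since it has no integer registers.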
5. Illustrative Program Examples
B-RASP enables concise, compositional programs for a range of formal language tasks:
| Example | Description | B-RASP Sketch |
|---|---|---|
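| Ends-in-$a$ | Accepts strings ending in $a$ | $Y(i) = 1$ iff $w_i = a$; accept iff $Y(n) = 1$ at the last position $n$ |
| Exists-$a$ | Accepts if $a$ occurs anywhere | $E(i) = 1$ iff $w_j = a$ for some $j < i$ (one strict-past lookup), $Y(i) = E(i) \lor [w_i = a]$; accept iff $Y(n) = 1$ at the last position $n$ |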
| Dyck-1 Brackets (depth 2) | Recognize correctly nested pairs up to depth 2 | Uses matching lookup and local consistency checks |
| Rotate-right (transduction) | Moves the last symbol to the front | Rightmost/leftmost symbol lookups with default symbols for out-of-bounds positions (Strobl et al., 2024) |
For more involved languages (e.g., the Dyck-1 bracket language of depth 2), B-RASP programs utilize sequences of attention lookups and Boolean consistency checks across positions. Detailed intermediate vectors and registers, as well as tabulated step semantics, are provided in §4.2 of (Yang et al., 2023).
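The simpler rows of the table can be written out as runnable Python. This is a sketch under stated assumptions: `attend` is a hypothetical helper standing in for the hard-attention lookup, positions are 0-indexed, the empty string is rejected by both recognizers, and rotate-right is taken to mean moving the last symbol to the front.

```python
def attend(n, mask, score, value, default):
    """Rightmost hard-attention lookup with a per-position default."""
    out = []
    for i in range(n):
        hits = [j for j in range(n) if mask(i, j) and score(i, j)]
        out.append(value(i, max(hits)) if hits else default(i))
    return out

def ends_in(w: str, a: str) -> bool:
    y = [c == a for c in w]               # one initial vector, no lookup needed
    return bool(y) and y[-1]              # accept iff true at the last position

def exists(w: str, a: str) -> bool:
    n = len(w)
    q = [c == a for c in w]
    # e[i]: some a strictly before i (strict-past mask, score on a at j).
    e = attend(n, lambda i, j: j < i, lambda i, j: q[j],
               lambda i, j: True, lambda i: False)
    y = [e[i] or q[i] for i in range(n)]
    return bool(y) and y[-1]

def rotate_right(w: str) -> str:
    n = len(w)
    # Predecessor symbol, or None at the first position (out of bounds).
    prev = attend(n, lambda i, j: j < i, lambda i, j: True,
                  lambda i, j: w[j], lambda i: None)
    # Last symbol of the input; the last position defaults to its own symbol.
    last = attend(n, lambda i, j: j > i, lambda i, j: True,
                  lambda i, j: w[j], lambda i: w[i])
    return "".join(last[i] if prev[i] is None else prev[i] for i in range(n))

print(ends_in("ba", "a"), exists("bbab", "a"), rotate_right("abcd"))
# True True dabc
```

In `rotate_right` the lookups return symbols rather than Booleans, which corresponds to the symbol-output registers used for transductions.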
6. Masking, Position Embeddings, and Depth Hierarchies
- Masking: B-RASP supports “strict masking” (positions can attend only to the strict past or future, not to themselves), which ensures expressiveness matches the star-free languages. Relaxing to non-strict masks ($j \le i$ or $j \ge i$, the “usual” transformer masking) reduces expressiveness to the stutter-invariant star-free languages (languages unchanged when a symbol is repeated consecutively) (Yang et al., 2023).
- Position Embeddings: Assuming position embeddings with finite image, the equivalence between strict-mask hard-attention transformers, B-RASP with predicates on position embeddings, and LTL with added monadic predicates holds. Rational sinusoidal embeddings yield exactly the regular languages in $\mathsf{AC}^0$; arbitrary finite-image PEs correspond to $\mathrm{LTL}[\mathrm{Mon}]$ (LTL with all monadic numeric predicates).
- Depth and Hierarchy: The expressive power of B-RASP, LTL, and masked hard-attention transformers strictly increases with depth of attention/computation. For each $k$, there exists a star-free language expressible with depth $2k+1$ but not $2k$, reflecting a true hierarchy within star-free languages (Yang et al., 2023).
- Complexity: Translating arbitrary score predicates $S(i,j)$ into forms depending only on $j$ may incur exponential blow-up; however, value predicates can be made unary without such cost.
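The effect of the masking choice can be seen on a small example. The sketch below runs the same "rightmost eligible position" lookup under a strict mask ($j < i$) and a non-strict mask ($j \le i$), then uses it in a detector for two consecutive `a` symbols, a property that is not stutter-invariant since `ab` and `aab` disagree on it. The names are hypothetical, and the example only illustrates the difference; it is not a proof of inexpressibility.

```python
def rightmost_symbol(w: str, i: int, strict: bool):
    """Symbol at the rightmost eligible position for i, with a constant-true score."""
    j = i - 1 if strict else i            # strict: j < i; non-strict: j <= i
    return w[j] if j >= 0 else None       # None is the out-of-bounds default

def has_aa(w: str, strict: bool = True) -> bool:
    """True if some position holds an a and its looked-up symbol is also an a."""
    return any(
        w[i] == "a" and rightmost_symbol(w, i, strict) == "a"
        for i in range(len(w))
    )

print(has_aa("aab", strict=True))   # True:  the lookup reaches the predecessor
print(has_aa("ab", strict=True))    # False: no two consecutive a's
print(has_aa("ab", strict=False))   # True:  the lookup returns the position itself
```

With the non-strict mask the lookup always lands on the current position, so this particular program degenerates into a test for "contains an `a`" and can no longer tell `ab` from `aab`.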
7. Role as Theoretical Intermediary and Limitations
B-RASP provides a transparent, compositional, and mechanically simulatable language that sits between high-level logic specifications and low-level transformer circuits. This alignment enables:
- Direct translation of LTL formulas and automata-based constructions into B-RASP programs.
- Rigorous exhibition of transformer limitations and capabilities, particularly with respect to masking strategies, position encoding schemes, and network depth.
Notably, plain B-RASP does not support position arithmetic or general counting, and thus cannot express languages or transductions requiring these capabilities (e.g., global half-length tests), except when extended (B-RASP[pos], S-RASP) (Strobl et al., 2024).
The B-RASP model has become central in the formal analysis of deep sequence models, providing an effective bridge between symbolic automata theory and the operational semantics of contemporary neural architectures. For comprehensive program examples, normal-form lemmas, and proof details, see (Yang et al., 2023) for language recognition and (Strobl et al., 2024) for transduction results.